Regex to remove all special characters except periods - android

I need help with creating a regex that removes all special characters, including commas, but not periods. What I have tried to do is escape all the characters, symbols and punctuation I do not want. It is not working as intended.
replace("[-\\[\\]^/,'*:.!><~##\$%+=?|\"\\\\()]+".toRegex(), "")
I removed the period and tested that too. It did not work.
replace("[-\\[\\]^/,'*:!><~##\$%+=?|\"\\\\()]+".toRegex(), "")
For example, let's take the String "if {cat.is} in a hat, then I eat green eggs and ham!".
I want the result
if {cat.is} in a hat then I eat green eggs and ham (comma and exclamation symbol removed)
Note: I want to keep brackets, although braces are OK to omit.
Anyone have a solution for this?

You can use
"""[\p{P}\p{S}&&[^.]]+""".toRegex()
The [\p{P}\p{S}&&[^.]]+ pattern matches one or more (+) punctuation proper (\p{P}) or symbol (\p{S}) chars other than dots (&&[^.], a character class intersection with a negated class, which works as subtraction).
See a Kotlin demo:
println("a-b)h.".replace("""[\p{P}\p{S}&&[^.]]+""".toRegex(), ""))
// => abh.
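As a further check, here is a minimal sketch that runs the same pattern over the sentence from the question. It is written in Java on the assumption that Kotlin's Regex on the JVM and Android delegates to java.util.regex, so the pattern carries over unchanged. The class name KeepPeriods and the second pattern, which additionally spares square brackets, are illustrative additions and not part of the answer above.

```java
import java.util.regex.Pattern;

class KeepPeriods {
    // Punctuation or symbols, minus the period (pattern from the answer).
    static final Pattern NO_PUNCT_EXCEPT_DOT =
            Pattern.compile("[\\p{P}\\p{S}&&[^.]]+");

    // Assumed variation: also spare square brackets.
    static final Pattern ALSO_KEEP_BRACKETS =
            Pattern.compile("[\\p{P}\\p{S}&&[^.\\[\\]]]+");

    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "if {cat.is} in a hat, then I eat green eggs and ham!";
        System.out.println(strip(NO_PUNCT_EXCEPT_DOT, s));
        // prints: if cat.is in a hat then I eat green eggs and ham
        System.out.println(strip(ALSO_KEEP_BRACKETS, "a[0].b, c!"));
        // prints: a[0].b c
    }
}
```

Note that the braces are removed as well, which the question says is acceptable. Any other character that should survive can be added to the negated class in the same way as the brackets.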

Related

Kotlin Android allow only emojis and letters in a text

I've been trying to find a good way to keep only emojis and letters in a given text, but none of the articles I found worked for me.
I've tried to use regex, but it seems that I cannot make it work.
I've tried to use emoji4j but it seems that this library is working with emojis in this form ":)", which don't help me, because my emojis are groups of unicode characters.
The result I want is the following :
"This is. a text 👨‍👩‍👧‍👦,,1234" => "This is a text 👨‍👩‍👧‍👦"
"👨‍👩‍👧‍👦" => "👨‍👩‍👧‍👦"
"👨‍👩‍👧‍👦😃123abc👨‍👩‍👧‍👦" => "👨‍👩‍👧‍👦😃abc👨‍👩‍👧‍👦"
Here's the emoji regex : ?:[\u2700-\u27bf]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?)*|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|[\ud83c\udd70-\ud83c\udd71]|[\ud83c\udd7e-\ud83c\udd7f]|\ud83c\udd8e|[\ud83c\udd91-\ud83c\udd9a]|[\ud83c\udde6-\ud83c\uddff]|[\ud83c\ude01-\ud83c\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c\ude32-\ud83c\ude3a]|[\ud83c\ude50-\ud83c\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff] .
If I try something like :
val regex = "the_whole_regex_above | [^a-zA-Z]".toRegex()
myText.replace(regex,""), it won't replace anything, basically every character will pass
Basically I want to achieve pretty much the same thing as in this question, but using Kotlin.
You want to remove all punctuation, symbols (other than those used to form emojis) and digits.
To do that, you may use
myText = myText.replace("""[\p{N}\p{P}\p{S}&&[^\p{So}]]+""".toRegex(), "")
See the online Kotlin demo.
Details
[ - start of a character class that matches:
\p{N} - any Unicode digit
\p{P} - any Unicode punctuation proper
\p{S} - any Unicode symbol
&&[^\p{So}] - EXCEPT the symbols in the Symbol, other (\p{So}) Unicode category, which is where most emoji characters live
]+ - 1 or more occurrences.
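The breakdown above can be tried out with a small sketch like the following. It is written in Java on the assumption that the Kotlin pattern is passed unchanged to java.util.regex. The class name KeepEmoji and the helper cps, which builds the test string from code points so the source stays ASCII-only, are made up for this illustration.

```java
import java.util.regex.Pattern;

class KeepEmoji {
    // Digits, punctuation and symbols, minus the "Symbol, other" category.
    static final Pattern STRIP =
            Pattern.compile("[\\p{N}\\p{P}\\p{S}&&[^\\p{So}]]+");

    static String clean(String s) {
        return STRIP.matcher(s).replaceAll("");
    }

    // Hypothetical helper: builds a string from Unicode code points.
    static String cps(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // Man, woman, girl, boy joined by zero-width joiners (U+200D).
        String family = cps(0x1F468, 0x200D, 0x1F469, 0x200D,
                            0x1F467, 0x200D, 0x1F466);
        String out = clean("This is. a text " + family + ",,1234");
        System.out.println(out.equals("This is a text " + family));
        // prints: true
    }
}
```

The zero-width joiner is a format character, not a symbol, so the joined sequence survives intact. One caveat of this approach: keycap emoji start with an ordinary digit, which \p{N} would still remove.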

How to detect and remove a unicode-sequence emoji symbol from inputConnection?

Let's say I have an edittext field and I have to implement "backspace" functionality on it.
Deleting a simple letter character is fine, it works:
// isLetter() takes a char, so look at the single char before the cursor
if (Character.isLetter(inputConnection.getTextBeforeCursor(1, 0).charAt(0))) {
    inputConnection.deleteSurroundingText(1, 0);
}
The problem comes when the character is an emoji symbol.
Its length is expressed as 2 UTF-16 chars, for example:
Grinning face: 😀
Unicode codepoint: U+1F600
Java escape: \ud83d\ude00
In such a case, I would simply remove 2 chars.
However, there are cases where an emoji is formed by multiple codepoints, like:
Rainbow flag: 🏳️‍🌈
Unicode codepoint sequence: U+1F3F3 U+FE0F U+200D U+1F308
Java escape: \ud83c\udff3\ufe0f\u200d\ud83c\udf08
When I press backspace, only one java escaped char gets deleted, not whole emoji. For flag example, only this \udf08 last part would be deleted, presenting user with screwed up emoji symbol. Surrogate pair check doesn't get me out of the hole here, I would still have screwed up emoji.
How can I properly find out the correct amount of chars to remove, so I would delete 1 whole emoji when pressing backspace? (for the flag example, I would need to get the number 6, to remove it fully)
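One way to approach the count is to walk backwards from the cursor one code point at a time and keep going while the pieces are glued together. The sketch below is a deliberately simplified illustration in plain Java, not a full implementation of Unicode grapheme segmentation: it only knows about zero-width joiners, the two variation selectors and the skin tone modifiers, and it does not handle flag pairs or keycaps. The class and method names are invented for this example.

```java
class EmojiBackspace {
    private static final int ZWJ = 0x200D;

    private static boolean isVariationSelector(int cp) {
        return cp == 0xFE0E || cp == 0xFE0F;
    }

    private static boolean isSkinTone(int cp) {
        return cp >= 0x1F3FB && cp <= 0x1F3FF;
    }

    // Number of UTF-16 chars to delete at the end of text (simplified rules).
    static int charsToDelete(CharSequence text) {
        int pos = text.length();
        while (pos > 0) {
            // Step back over one code point (1 or 2 UTF-16 chars).
            int cp = Character.codePointBefore(text, pos);
            pos -= Character.charCount(cp);
            // Modifiers attach to the code point before them: keep going.
            if (isVariationSelector(cp) || isSkinTone(cp)) continue;
            // A joiner right before this code point glues it to the previous one.
            if (pos > 0 && Character.codePointBefore(text, pos) == ZWJ) {
                pos -= 1;
                continue;
            }
            break;
        }
        return text.length() - pos;
    }

    public static void main(String[] args) {
        String flag = "\ud83c\udff3\ufe0f\u200d\ud83c\udf08";
        System.out.println(charsToDelete(flag));           // prints: 6
        System.out.println(charsToDelete("a\ud83d\ude00")); // prints: 2
        System.out.println(charsToDelete("ab"));           // prints: 1
    }
}
```

In an input method, the returned value would be passed as the first argument of inputConnection.deleteSurroundingText, with the text taken from getTextBeforeCursor using a window large enough to hold the longest expected sequence. For complete coverage, a grapheme-aware BreakIterator (for instance the ICU one available on newer Android versions) is likely the more robust choice.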

android Regex issue

I had an issue with this regex:
(\{(([^\p{Space}][^\p{Punct}])+)\})
The problem is in the number of chars. If I type an even number of chars it works; with an odd number it does not. I tried replacing '+' with '?' or '*', but the result is still the same. How can I fix this?
I expect from this regex to block such strings: {%,$ #fd}. And allow this:
{F} or {F242fFSf23}.
Currently, it matches a {, then 1 or more repetitions of 2 chars, a non-space and then a non-punctuation, and then a }, hence you cannot use 1 char in between {...}.
To fix that, you need to put both character classes inside a single bracket expression:
\{[^\p{Punct}\p{Space}]+\}
or
\{[^\p{P}\p{S}\s]+\}
Details
\{ - a { char
[^\p{Punct}\p{Space}]+ - 1 or more repetitions (+) of any char that does not belong to the \p{Punct} (punctuation) or \p{Space} (whitespace) class.
\} - a }.
Note that if the contents between the braces can only include ASCII letters or digits (in regex, [A-Za-z0-9]+), you may even use a mere
\{[A-Za-z0-9]+\}
Disassembling your regex... the reason why it only accepts an even number in between is the following part:
([^\p{Space}][^\p{Punct}])+
This basically means: something which isn't a space, exactly 1 character and something which isn't a ~punct, exactly 1 character and this several times... so exactly 1 + exactly another 1 are exactly 2 characters... and this several times will always be even.
So what you probably rather want is the following:
[^\p{Space}\p{Punct}]+
for the part shown above... which will result in the following for your complete regex:
\{[^\p{Space}\p{Punct}]+}
that of course can be simplified even more. I leave that up to you.
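To see the even/odd behaviour described in both answers, the original and the fixed pattern can be run side by side. This is a small sketch in plain Java; the class name BraceCheck and the helper ok are made up for the illustration, and matches() is used on the assumption that the whole string must be one {...} token.

```java
import java.util.regex.Pattern;

class BraceCheck {
    // Original pattern: the inner group consumes characters two at a time.
    static final Pattern ORIGINAL =
            Pattern.compile("(\\{(([^\\p{Space}][^\\p{Punct}])+)\\})");

    // Fixed pattern: one class that excludes both punctuation and whitespace.
    static final Pattern FIXED =
            Pattern.compile("\\{[^\\p{Punct}\\p{Space}]+\\}");

    static boolean ok(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(ok(ORIGINAL, "{F}"));         // prints: false
        System.out.println(ok(ORIGINAL, "{Fa}"));        // prints: true
        System.out.println(ok(FIXED, "{F}"));            // prints: true
        System.out.println(ok(FIXED, "{F242fFSf23}"));   // prints: true
        System.out.println(ok(FIXED, "{%,$ #fd}"));      // prints: false
    }
}
```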

Check if www page is in text android

I want to check whether a web address appears in a text. For example, I have the page address www.taktik.com/trow and want to check whether the text contains such an address.
I use Matcher mW = Pattern.compile("[a-zA-Z0-9_.+-]+.[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+\\/[a-zA-Z0-9-.]+").matcher(question); but I don't get any results. How can I check if text xxx.xxxx.xxx/xxx is in my String?
Fixing your regex, the pattern may look like
[a-zA-Z0-9_.+-]+\\.[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+/[a-zA-Z0-9.-]+
Mind I escaped the first dot and placed the hyphen at the end of the last two character classes (in yours, you have 9-. that creates a range that matches more than you'd want).
I tried to shorten the pattern a bit, but it's difficult since \w also matches Unicode characters in Android. Here is a possible regex:
(?i)[A-Z0-9_+-]+(?:\\.[A-Z0-9-]+){2}/[A-Z0-9-]+
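A minimal sketch of using the fixed pattern to search inside a longer text follows. It assumes the goal is to find the address anywhere in the string, so it calls find(), which looks for a matching substring, and not matches(), which requires the whole input to match. The class name WwwCheck and the method firstPage are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WwwCheck {
    static final Pattern PAGE = Pattern.compile(
            "[a-zA-Z0-9_.+-]+\\.[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+/[a-zA-Z0-9.-]+");

    // Returns the first address-like substring, or null if there is none.
    static String firstPage(String text) {
        Matcher m = PAGE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstPage("Is www.taktik.com/trow online?"));
        // prints: www.taktik.com/trow
        System.out.println(firstPage("no address here"));
        // prints: null
    }
}
```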

Android Replace "..." with ellipsis character

Since AVD tools 16 I'm getting this warning:
Replace "..." with ellipsis character (…, &#8230;) ?
in my strings.xml
at this line
<string name="searching">Searching...</string>
How do I replace ...? Is it just literally &#8230;?
Could someone explain this encoding?
&#8230; is the Unicode character reference for "…", so just replace it. It's better to have it as one char/symbol than three dots.
To make things short, just put &#8230; in place of ...
Link to XML character Entities List
Look at Unicode column of HTML for row named hellip
If you're using Eclipse then you can always do the following:
Right click on the warning
Select "Quick Fix" (shortcut is Ctrl + 1 by default)
Select "Replace with suggested characters"
This should replace your three dots with the proper Unicode character for ellipsis.
Just a note: The latest version of ADT (21.1) sometimes won't do the replace operation properly, but earlier versions had no problem doing this.
This is the character: …
The solution to your problem is:
Go to Window -> Preferences -> Android -> Lint Error Checking
And search for "ellipsis". Change the warning level to "Info" or "Ignore".
This answer is indirectly related to this question:
In my case textView1.setTextView("done&#8230"); was showing some box/chinese character. Later, I checked into fileformat.info for what the value represents and I found this is a Han character.
So, what to do? I searched for "fileformat.info ellipse character" and then everything became clear to me once I saw its values are;
UTF-16 (hex) 0x2026 (2026)
UTF-16 (decimal) 8,230
So, several notations are available to represent the same character (e.g. decimal 10 is written as A in hexadecimal), and when you write a Unicode character it is very important to know how the receiving function decodes it. If it decodes a decimal value, you have to provide a decimal value; if it accepts hexadecimal, you have to provide hexadecimal.
In my case, setTextView() function accepts decimal encoded value but I was providing hexadecimal values so I was getting wrong character.
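The decimal versus hexadecimal mix-up is easy to verify. As a small sketch in plain Java (the class name EllipsisCodes and the helper withEllipsis are made up for this illustration): Java's backslash-u escapes take four hexadecimal digits, while an XML numeric reference such as &#8230; is decimal, and 8230 read as hexadecimal is a different character in the Han script.

```java
class EllipsisCodes {
    // U+2026 HORIZONTAL ELLIPSIS, written with a hexadecimal Java escape.
    static final String ELLIPSIS = "\u2026";

    // Replaces every run of three plain dots with the single ellipsis char.
    static String withEllipsis(String s) {
        return s.replace("...", ELLIPSIS);
    }

    public static void main(String[] args) {
        // Hexadecimal 2026 and decimal 8230 are the same number.
        System.out.println(0x2026 == 8230);                        // prints: true
        System.out.println((int) ELLIPSIS.charAt(0));              // prints: 8230
        // Code point 8230 in hexadecimal belongs to the Han script.
        System.out.println(Character.UnicodeScript.of(0x8230));    // prints: HAN
        // "Searching" is 9 chars, plus one ellipsis char.
        System.out.println(withEllipsis("Searching...").length()); // prints: 10
    }
}
```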
The quick fix shortcut in Android Studio is Alt + Enter by default.
Best not to ignore it as suggested by some, it seems to me. Use Android Studio to correct it (rather than actually typing in the character code), and the tool will replace the three dots with the three-dot unicode character. Won't be confusing to translators etc.
