Let's say I have an EditText field and I have to implement "backspace" functionality on it.
Deleting a simple letter character is fine; it works:
if (Character.isLetter(inputConnection.getTextBeforeCursor(1, 0).charAt(0))) {
    inputConnection.deleteSurroundingText(1, 0);
}
The problem comes when the character is an emoji symbol.
Its length is 2 UTF-16 chars; for example:
Grinning face: 😀
Unicode codepoint: U+1F600
Java escape: \ud83d\ude00
In such a case, I would simply remove 2 chars.
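For the surrogate-pair case, a minimal sketch (same inputConnection as above; the check relies on Character.isSurrogatePair) might look like this:
CharSequence before = inputConnection.getTextBeforeCursor(2, 0);
if (before != null && before.length() == 2
        && Character.isSurrogatePair(before.charAt(0), before.charAt(1))) {
    // Both UTF-16 code units belong to one code point, e.g. U+1F600.
    inputConnection.deleteSurroundingText(2, 0);
}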
However, there are cases where an emoji is formed by multiple codepoints, like:
Rainbow flag: 🏳️‍🌈
Unicode codepoint sequence: U+1F3F3 U+FE0F U+200D U+1F308
Java escape: \ud83c\udff3\ufe0f\u200d\ud83c\udf08
When I press backspace, only one UTF-16 char gets deleted, not the whole emoji. For the flag example, only the trailing \udf08 would be deleted, leaving the user with a broken emoji symbol. A surrogate pair check doesn't get me out of the hole here either; I would still end up with a broken emoji.
How can I properly find out the correct number of chars to remove, so that one whole emoji is deleted when backspace is pressed? (For the flag example, I would need to get the number 6 to remove it fully.)
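One possible direction, not from the original post: on API 24+, android.icu.text.BreakIterator segments text into grapheme clusters, and on recent platform versions it keeps most emoji sequences (including ZWJ sequences like the flag) together as one cluster. A hedged sketch:
// Sketch only: measure the last grapheme cluster before the cursor and
// delete that many UTF-16 chars. Assumes API 24+; 16 is an arbitrary
// "long enough" look-behind window.
CharSequence before = inputConnection.getTextBeforeCursor(16, 0);
if (before != null && before.length() > 0) {
    android.icu.text.BreakIterator it =
            android.icu.text.BreakIterator.getCharacterInstance();
    it.setText(before.toString());
    int end = it.last();       // boundary after the final cluster
    int start = it.previous(); // boundary before the final cluster
    int toDelete = (start == android.icu.text.BreakIterator.DONE) ? end : end - start;
    inputConnection.deleteSurroundingText(toDelete, 0);
}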
I need help with creating a regex that removes all special characters, including commas, but not periods. What I have tried to do is escape all the characters, symbols and punctuation I do not want. It is not working as intended.
replace("[-\\[\\]^/,'*:.!><~##\$%+=?|\"\\\\()]+".toRegex(), "")
I removed the period and tested that too. It did not work.
replace("[-\\[\\]^/,'*:!><~##\$%+=?|\"\\\\()]+".toRegex(), "")
For example, let's take the String "if {cat.is} in a hat, then I eat green eggs and ham!".
I want the result
if {cat.is} in a hat then I eat green eggs and ham (comma and exclamation symbol removed)
Note: I want to keep brackets, although braces are OK to omit.
Anyone have a solution for this?
You can use
"""[\p{P}\p{S}&&[^.]]+""".toRegex()
The [\p{P}\p{S}&&[^.]]+ pattern matches one or more (+) punctuation proper (\p{P}) or symbol (\p{S}) chars other than dots (&&[^.], using character class subtraction).
See a Kotlin demo:
println("a-b)h.".replace("""[\p{P}\p{S}&&[^.]]+""".toRegex(), ""))
// => abh.
My app needs to display numeric ranges in a TextView, such as "34-93". TalkBack reads this as "thirty-four minus ninety-three". I want it to read either "thirty-four dash ninety-three" or "thirty-four to ninety-three". I've tried inserting spaces before and after the dash, as well as using both a hyphen and an en-dash, to no avail. In general, though, I have little control over how this string is formatted.
It seems like it should read "dash" if there were some attribute to cause it to read punctuation literally, like accessibilitySpeechPunctuation in iOS.
I am working on an Android application in which I would like to compare some string values from EditText fields.
For example, in a first EditText, I start to enter "dav" and then select "David" from the keyboard suggestions. In a second EditText, I start to enter "dav", then select "David" from the keyboard suggestions, and then correct the content to "Dav".
Everything seems to be OK. If I retrieve the content of the EditText (with getEditableText().toString().trim()), the debugger tells me that "David" is a word composed of 5 characters and "Dav" a word composed of 3 characters.
If I now click on the EditText that contains "Dav" and select "David" from the keyboard suggestions, the debugger tells me that the word "David" is composed of 6 characters. The last character is "\u200B".
Why is this character automatically added, and how can I remove it in a generic way?
Thank you for your help.
\u200B is the Unicode zero-width space character. It seems to me it is being added by the keyboard you are using; if you change your keyboard, you probably won't see that behavior.
One way to handle that is replacing that character and dealing with the actual String:
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;

@Test
public void zero_space_character() {
    String david = "David\u200B";
    String theRealDavid = david.replace("\u200B", "");
    assertNotEquals(david, theRealDavid);
    assertEquals("David", theRealDavid);
}
It should be getText().toString().trim().
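If you want to strip it "in a generic way" at input time rather than when reading the text back, one option (a sketch, not part of the original answers; editText stands for the field in question) is an android.text.InputFilter:
// Removes U+200B from anything typed or pasted into the field.
// Returning null tells Android to keep the incoming text unchanged.
InputFilter stripZwsp = new InputFilter() {
    @Override
    public CharSequence filter(CharSequence source, int start, int end,
                               Spanned dest, int dstart, int dend) {
        String s = source.subSequence(start, end).toString();
        return s.indexOf('\u200B') >= 0 ? s.replace("\u200B", "") : null;
    }
};
editText.setFilters(new InputFilter[] { stripZwsp });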
I am making an application with Unity3D and building it for Android. When I type native Android emoji into an input field, I get an error on this line
(invalid utf-16 sequence at 1411555520 (missing surrogate tail)):
r.font.RequestCharactersInTexture(chars, size, style);
chars contains a string that contains the native Android emoji. How can I support native emoji? I use my own class for the input field.
Unfortunately, supporting emojis with Unity is hard. When I implemented this feature, it took about a month to finish it, with a custom text layout engine and string class. So, if this requirement is not particularly important, I would suggest axing this feature.
The reason behind this particular error is that Unity gets characters from the input string one by one and updates the visible string after every character. From a layman's point of view, this makes complete sense. However, it doesn't take into account how UTF-16 encoding, which is what C# uses, works.
UTF-16 encoding uses 16 bits per Unicode character. That is enough for almost all characters you would normally use. (And, as every developer knows, "almost all" is a red flag that will lay dormant for a long time and then explode and destroy everything you love.) But it so happens that emoji characters do not fit into a single 16-bit UTF-16 code unit, and use a special case, the surrogate pair:
A surrogate pair is a pair of UTF-16 code units that represents a single Unicode character. That means the two halves have no meaning on their own, and when you try to render a lone surrogate head or surrogate tail, you can expect to get an error like this, or something similar.
Essentially, what you need to implement is some kind of buffer that accepts C# UTF-16 characters one by one and passes them on to the rendering code only once it has verified that all surrogate pairs are closed.
Oh, and I almost forgot! Some emoji, like country flags, are represented by two Unicode code points, which means they can take up to four UTF-16 characters. Aren't text encodings fun?
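A minimal sketch of such a buffer, written here in Java because the UTF-16 surrogate rules are the same as in C# (onCharTyped and flushToRenderer are illustrative names, not Unity API):
StringBuilder pending = new StringBuilder();

void onCharTyped(char c) {
    pending.append(c);
    // A trailing high surrogate means the other half of the pair has not
    // arrived yet, so hold the text back until the pair is closed.
    if (!Character.isHighSurrogate(c)) {
        flushToRenderer(pending.toString());
        pending.setLength(0);
    }
}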
I discovered today that Android can't display a small handful of Japanese characters that I'm using in my Japanese-English dictionary app.
The problem comes when I attempt to display the character via TextView.setText(). All of the characters below show up as blank when I attempt to display them in a TextView. It doesn't appear to be an issue with encoding, though - I'm storing the characters in a SQLite database and have verified that Android can understand the characters. Casting the characters to (int) retrieves proper Unicode decimal escapes for all but one of the characters:
String component = cursor.getString(cursor.getColumnIndex("component"));
Log.i("CursorAdapterGridComponents", "Character Code: " + (int) component.charAt(0) + "(" + component + ")");
I had to use Character.codePointAt() to get the decimal escape for the one problematic character:
int codePoint = Character.codePointAt(component, 0);
I don't think I'm doing anything wrong, and since Strings are UTF-16 encoded by default, there should be nothing preventing them from displaying these characters.
Below are all of the decimal escapes for the seven problematic characters:
⺅ Character Code: 11909(⺅)
⺌ Character Code: 11916(⺌)
⺾ Character Code: 11966(⺾)
⻏ Character Code: 11983(⻏)
⻖ Character Code: 11990(⻖)
⺹ Character Code: 11961(⺹)
𠆢 Character Code: 131490(𠆢)
Plugging the first six values into http://unicode-table.com/en/ revealed their corresponding Unicode numbers, so I have no doubt that they're valid Unicode characters.
The seventh character could only be retrieved from a table of supplementary characters: http://www.fileformat.info/info/unicode/char/201a2/browsertest.htm. I could not use its 5-digit code point in setText() (as in "\u201a2") because, as I discovered earlier today, a Java \u escape cannot express code points past 0xFFFF. As a result, the string was evaluated as "\u201a" + "2". That still doesn't explain why the first six characters won't show up.
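As an aside (not something the original poster tried): a supplementary code point can still be placed in a String at runtime, even though a single \u escape cannot express it:
// Builds the two-char (surrogate pair) string for U+201A2.
String component = new String(Character.toChars(0x201A2));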
What are my options at this point? My first instinct is to just make graphics out of the problematic characters, but Android's highly variable DPI environment makes this a challenging proposition. Is using another font in my app an option? Aside from that, I really have no idea how to proceed.
Is using another font in my app an option?
Sure. Find a font that you are licensed to distribute with your app and that has these characters. Package the font in your assets/ directory. Create a Typeface object for that font, and apply it to the necessary widgets using setTypeface() on TextView.
Here is a sample application demonstrating applying a custom font to a TextView.
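The basic calls look roughly like this (the font file name and view id are placeholders, and the font must actually contain these glyphs):
// In an Activity: load a font bundled under assets/fonts/ and apply it.
Typeface tf = Typeface.createFromAsset(getAssets(), "fonts/cjk_font.ttf");
TextView tv = (TextView) findViewById(R.id.component_text);
tv.setTypeface(tf);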