I'm working on an Android project and have a problem with EditText when I type Vietnamese.
For example, when I type the word "thử" into an EditText and read the string back from it:
String text = edittext.getText().toString();
it always returns a String with 4 characters: "t", "h", "ư" and the accent character.
But if I create the String in code, like String text = "thử";, it contains only 3 characters: "t", "h" and "ử". So the two do not match when I compare them. I want the String to contain 3 characters, not 4.
I also thought about looping through all the characters and replacing them manually. But Vietnamese has 12 vowels and 6 accents, so I would have to check 72 cases. I don't think that is a good approach. Is there any way to get proper text from the EditText? Or a good way to replace the text manually?
UPDATE: I have found why the EditText always returns the odd String. It is caused by the phone's keyboard app. I am using an LG Magna with its default keyboard app, which always encodes base vowels and accents separately for everything I type. I have just installed another keyboard app, and it works like a charm. Now I have to find a way to make sure that the text is returned properly from any keyboard app.
Android uses UTF-8, so please make sure that you are typing your Vietnamese characters as UTF-8 and not in some other encoding such as Windows-1258.
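The 4-versus-3 character mismatch described in the question looks like a difference in Unicode normalization rather than in byte encoding: the keyboard appears to send "ư" followed by a combining hook above (U+0309), while the string literal holds the precomposed "ử" (U+1EED). A minimal sketch, assuming that is the cause, brings both sides to NFC with java.text.Normalizer before comparing. The class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper; not part of the original question or answer.
final class VietnameseText {
    private VietnameseText() {}

    // Compose base letters and combining marks into precomposed characters (NFC).
    static String toComposed(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }

    // Compare two strings after bringing both to the same normalization form.
    static boolean sameText(String a, String b) {
        return toComposed(a).equals(toComposed(b));
    }
}
```

With this in place, the text read from the widget could be passed through VietnameseText.toComposed(edittext.getText().toString()) before it is compared or stored, so the result should not depend on which keyboard app produced it.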
I've been trying to find a good way to keep only emojis and letters in a given text, but I haven't had success with any of the articles I found.
I've tried to use regex, but it seems that I cannot make it work.
I've tried to use emoji4j, but it seems that this library works with emojis in the form ":)", which doesn't help me, because my emojis are groups of Unicode characters.
The result I want is the following:
"This is. a text 👨‍👩‍👧‍👦,,1234" => "This is a text 👨‍👩‍👧‍👦"
"👨‍👩‍👧‍👦" => "👨‍👩‍👧‍👦"
"👨‍👩‍👧‍👦π123abc👨‍👩‍👧‍👦" => "👨‍👩‍👧‍👦πabc👨‍👩‍👧‍👦"
Here's the emoji regex : ?:[\u2700-\u27bf]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?(?:\u200d(?:[^\ud800-\udfff]|(?:[\ud83c\udde6-\ud83c\uddff]){2}|[\ud800\udc00-\uDBFF\uDFFF]|[\u2600-\u26FF])[\ufe0e\ufe0f]?(?:[\u0300-\u036f\ufe20-\ufe23\u20d0-\u20f0]|[\ud83c\udffb-\ud83c\udfff])?)*|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|[\ud83c\udd70-\ud83c\udd71]|[\ud83c\udd7e-\ud83c\udd7f]|\ud83c\udd8e|[\ud83c\udd91-\ud83c\udd9a]|[\ud83c\udde6-\ud83c\uddff]|[\ud83c\ude01-\ud83c\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c\ude32-\ud83c\ude3a]|[\ud83c\ude50-\ud83c\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff] .
If I try something like:
val regex = "the_whole_regex_above | [^a-zA-Z]".toRegex()
myText.replace(regex, "")
it doesn't replace anything; basically every character passes through.
Basically I want to achieve pretty much the same thing as in this question, but using Kotlin.
You want to remove all punctuation, symbols (other than those used to form emojis) and digits.
To do that, you may use
myText = myText.replace("""[\p{N}\p{P}\p{S}&&[^\p{So}]]+""".toRegex(), "")
See the online Kotlin demo.
Details
[ - start of a character class that matches:
\p{N} - any Unicode number (digits and other numeric characters)
\p{P} - any Unicode punctuation proper
\p{S} - any Unicode symbol
&&[^\p{So}] - EXCEPT the symbols that belong to the "Symbol, other" (So) Unicode category, which are the ones mostly used to form emojis
]+ - end of the character class, 1 or more occurrences.
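Kotlin's Regex on the JVM is backed by java.util.regex, so the same pattern can be tried from plain Java. The sketch below is only an illustration of the character class described above; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the pattern from the answer.
final class EmojiAndLetterFilter {
    // Numbers, punctuation and symbols, minus the "Symbol, other" (So) category.
    private static final Pattern UNWANTED =
            Pattern.compile("[\\p{N}\\p{P}\\p{S}&&[^\\p{So}]]+");

    private EmojiAndLetterFilter() {}

    // Remove every run of unwanted characters; letters, spaces and So symbols stay.
    static String clean(String text) {
        return UNWANTED.matcher(text).replaceAll("");
    }
}
```

The zero-width joiner (U+200D) that glues emoji sequences together is a format character, not a symbol, so it is not matched and multi-part emojis should stay intact.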
My app needs to display numeric ranges in a TextView, such as "34-93". TalkBack reads this as "thirty-four minus ninety-three". I want it to read either "thirty-four dash ninety-three" or "thirty-four to ninety-three". I've tried inserting spaces before and after the dash, as well as using both a hyphen and an en dash, to no avail. In general, though, I have little control over how this string is formatted.
It seems like it should read "dash" if there were some attribute to cause it to read punctuation literally, like accessibilitySpeechPunctuation in iOS.
I am working on an Android application in which I would like to compare some string values from EditText fields.
For example, in a first EditText, I start typing "dav" and then select "David" from the keyboard suggestions. In a second EditText, I start typing "dav", select "David" from the keyboard suggestions, and then correct the content to "Dav".
Everything seems to be OK. If I retrieve the content of the EditText (with getEditableText().toString().trim()), the debugger tells me that "David" is a word made up of 5 characters and "Dav" a word made up of 3 characters.
If I now click on the EditText that contains "Dav" and select "David" from the keyboard suggestions, the debugger tells me that the word "David" is made up of 6 characters. The last character is "\u200B".
Why is this character added automatically, and how can I remove it in a generic way?
Thank you for your help.
\u200B is the Unicode zero-width space character. It seems to me that it is being added by the keyboard you are using. If you change your keyboard, you may well not see that behavior.
One way to handle that is replacing that character and dealing with the actual String:
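// Assumes JUnit 4 is on the test classpath; the class name is only illustrative.
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;
import org.junit.Test;

public class ZeroWidthSpaceTest {
    @Test
    public void zero_space_character() {
        String david = "David\u200B";                       // text with the trailing zero-width space
        String theRealDavid = david.replace("\u200B", "");  // strip every U+200B
        assertNotEquals(david, theRealDavid);
        assertEquals("David", theRealDavid);
    }
}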
It should be getText().toString().trim().
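One caveat about relying on trim() alone: String.trim() only strips leading and trailing characters up to U+0020, so it leaves U+200B in place. A small sketch that combines both suggestions above, with a hypothetical helper name:

```java
// Hypothetical helper; combines the replace() and trim() suggestions.
final class ZeroWidthSpace {
    private static final String ZWSP = "\u200B";

    private ZeroWidthSpace() {}

    // Remove every zero-width space, then trim ordinary whitespace at both ends.
    static String strip(String input) {
        return input.replace(ZWSP, "").trim();
    }
}
```

It could be applied to the value read from the widget, for example ZeroWidthSpace.strip(editText.getText().toString()), before the strings are compared.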
I discovered today that Android can't display a small handful of Japanese characters that I'm using in my Japanese-English dictionary app.
The problem comes when I attempt to display the character via TextView.setText(). All of the characters below show up as blank when I attempt to display them in a TextView. It doesn't appear to be an issue with encoding, though - I'm storing the characters in a SQLite database and have verified that Android can understand the characters. Casting the characters to (int) retrieves proper Unicode decimal escapes for all but one of the characters:
String component = cursor.getString(cursor.getColumnIndex("component"));
Log.i("CursorAdapterGridComponents", "Character Code: " + (int) component.charAt(0) + "(" + component + ")");
I had to use Character.codePointAt() to get the decimal escape for the one problematic character:
int codePoint = Character.codePointAt(component, 0);
I don't think I'm doing anything wrong, and as Strings are UTF-16 encoded by default, there should be nothing preventing them from holding the characters.
Below are all of the decimal escapes for the seven problematic characters:
⺅ Character Code: 11909(⺅)
⺌ Character Code: 11916(⺌)
⺾ Character Code: 11966(⺾)
⻏ Character Code: 11983(⻏)
⻖ Character Code: 11990(⻖)
⺹ Character Code: 11961(⺹)
𠆢 Character Code: 131490(𠆢)
Plugging the first six values into http://unicode-table.com/en/ revealed their corresponding Unicode numbers, so I have no doubt that they're valid Unicode characters.
The seventh character could only be retrieved from a table of UTF-16 characters: http://www.fileformat.info/info/unicode/char/201a2/browsertest.htm. I could not use its 5-character Unicode number in setText() (as in "\u201a2") because, as I discovered earlier today, Android has no support for Unicode strings past 0xFFFF. As a result, the string was evaluated as "\u201a" + "2". That still doesn't explain why the first six characters won't show up.
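On the "\u201a2" point: a \u escape in Java source takes exactly four hex digits, which is why that literal is read as U+201A followed by "2". A character above U+FFFF can still be held in a String as a surrogate pair. The sketch below shows one way to build it from the code point; the class name is hypothetical, and whether the result is actually drawn still depends on the font.

```java
// Hypothetical helper for building strings from code points above U+FFFF.
final class SupplementaryChars {
    private SupplementaryChars() {}

    // Character.toChars returns one char for BMP code points and a surrogate pair otherwise.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```

For the seventh character, SupplementaryChars.fromCodePoint(0x201A2) gives the same string as the literal "\uD840\uDDA2", and codePointAt(0) on it returns 131490, matching the value logged in the question.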
What are my options at this point? My first instinct is to just make graphics out of the problematic characters, but Android's highly variable DPI environment makes this a challenging proposition. Is using another font in my app an option? Aside from that, I really have no idea how to proceed.
Is using another font in my app an option?
Sure. Find a font that you are licensed to distribute with your app and has these characters. Package the font in your assets/ directory. Create a Typeface object for that font face. Apply that font to necessary widgets using setTypeface() on TextView.
Here is a sample application demonstrating applying a custom font to a TextView.
I am trying to create a database for an android app including, in part, non-English words which require underlines and accents for proper spelling. I set my encoding for this package to utf-8, which allowed the accented characters to store and display properly. However, I cannot seem to get a single character underlined. It displays an empty box for an unrecognized character.
An example of my database helper to create the sqlite is as follows:
cv.put(ENGLISH, "to be alive");
cv.put(NATIVE, "okch_á_a or okchaha");
cv.put(PART_OF_SPEECH, "verb");
cv.put(AUDIO, "alive");
cv.put(VIDEO, "none");
cv.put(IMAGE_DEFAULT, "none");
cv.put(IMAGE_OPTIONAL, "none");
cv.put(IMAGE_TO_USE, "none");
db.insert("words", ENGLISH, cv);
That
_ a _
is the best I can come up with so far, but the a should actually be an underlined character.
I tried html tags like u and /u:
<u>a</u>
since that works with string arrays, but it displays as:
<u>a</u>
(the html is never interpreted).
I tried using:
"\u0332"
as explained at http://www.fileformat.info/info/unicode/char/332/index.htm , but that, too, is never interpreted, so it displays as:
a\u0332
I also tried:
& # 818 ;
and:
& # x332 ;
in a similar manner, with similar lack of results.
Any ideas?
You can store your string in HTML format and call .setText(Html.fromHtml(somestring)) on the TextView where you want to display it.
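As for the "\u0332" attempt in the question: a \u escape is only translated by the Java compiler when it appears in source code, so if the six characters backslash, u, 0, 3, 3, 2 reach the database as plain text, they are shown literally. A hedged alternative to HTML is to append the combining low line (U+0332) in code. The helper below is a hypothetical sketch, and whether the underline is drawn correctly depends on the font in use.

```java
// Hypothetical helper; underlines text with the combining low line U+0332.
final class Underline {
    private static final char COMBINING_LOW_LINE = '\u0332';

    private Underline() {}

    // Append U+0332 after every code point so each one carries a low line.
    static String underline(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            out.appendCodePoint(cp).append(COMBINING_LOW_LINE);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

The value stored for the word could then be built as "okch" + Underline.underline("a") + "a" instead of typing the escape into the data by hand.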