Using special characters with Search - android

I'm using SearchView (or SearchManager?) to search the database for hits. It works fine, but the problem is if you search for words with special characters (č, ž, š - all supported by the keyboard), the search returns nothing, even though the word exists in the database.
For example: Word in database ("Računalnik"); search string ("Rač") - returns 0, search string ("Rac") - returns 0.
Is there a way to change search encoding, or to handle these searches some other way?

I think you have to use unicode for those special characters.

Related

How to detect and remove a unicode-sequence emoji symbol from inputConnection?

Let's say I have an edittext field and I have to implement "backspace" functionality on it.
Deleting a simple letter character is fine, it works:
CharSequence before = inputConnection.getTextBeforeCursor(1, 0);
if (before != null && before.length() == 1 && Character.isLetter(before.charAt(0))) {
    inputConnection.deleteSurroundingText(1, 0);
}
The problem comes when the character is an emoji symbol.
Its length is 2 UTF-16 chars, for example:
Grinning face: 😀
Unicode codepoint: U+1F600
Java escape: \ud83d\ude00
In such a case, I would simply remove 2 chars.
However, there are cases where an emoji is formed by multiple codepoints, like:
Rainbow flag: 🏳️‍🌈
Unicode codepoint sequence: U+1F3F3 U+FE0F U+200D U+1F308
Java escape: \ud83c\udff3\ufe0f\u200d\ud83c\udf08
When I press backspace, only one Java char gets deleted, not the whole emoji. In the flag example, only the last part, \udf08, would be deleted, presenting the user with a screwed-up emoji symbol. A surrogate pair check doesn't get me out of the hole here; I would still have a screwed-up emoji.
How can I properly find out the correct amount of chars to remove, so I would delete 1 whole emoji when pressing backspace? (for the flag example, I would need to get the number 6, to remove it fully)
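One way to approach this, offered as a sketch rather than a complete solution, is to walk backwards from the cursor one code point at a time and keep going while the code points are glued together by a zero width joiner (U+200D), a variation selector, a combining mark, or a skin tone modifier. The class name EmojiBackspace and its helper methods are hypothetical. The sketch does not handle every case in the Unicode grapheme cluster rules (for example, country flags built from two regional indicator symbols), so on API 24 and later android.icu.text.BreakIterator is probably the more thorough choice.

```java
final class EmojiBackspace {
    private static final int ZERO_WIDTH_JOINER = 0x200D;

    /** True for code points that attach to the code point before them. */
    private static boolean attachesToPrevious(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK      // includes variation selectors such as U+FE0F
                || type == Character.ENCLOSING_MARK    // e.g. U+20E3 in keycap emoji
                || (cp >= 0x1F3FB && cp <= 0x1F3FF);   // skin tone modifiers
    }

    /** Returns how many UTF-16 chars one backspace should remove from the end of the text. */
    static int charsToDelete(CharSequence before) {
        int end = before.length();
        if (end == 0) {
            return 0;
        }
        int i = end;
        // Always consume the last code point (1 or 2 chars).
        int cp = Character.codePointBefore(before, i);
        i -= Character.charCount(cp);
        while (i > 0) {
            int prev = Character.codePointBefore(before, i);
            if (attachesToPrevious(cp)) {
                // The code point just consumed modifies the one before it, so take that too.
                cp = prev;
                i -= Character.charCount(prev);
            } else if (prev == ZERO_WIDTH_JOINER) {
                // A joiner links the consumed code point to whatever precedes the joiner.
                i -= 1;
                if (i == 0) {
                    break;
                }
                cp = Character.codePointBefore(before, i);
                i -= Character.charCount(cp);
            } else {
                break;
            }
        }
        return end - i;
    }
}
```

To use it, the text passed in would be a window before the cursor that is long enough to hold a whole sequence (for instance inputConnection.getTextBeforeCursor(16, 0), where 16 is an arbitrary assumption), and the result would be passed to inputConnection.deleteSurroundingText(count, 0).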

TalkBack - how to read string with literal punctuation?

My app needs to display numeric ranges in a TextView, such as "34-93". TalkBack is reading this as "thirty-four minus ninety-three". I want it to read either "thirty-four dash ninety-three" or "thirty-four to ninety-three". I've tried inserting spaces before and after the dash, as well as using both a hyphen and en-dash, to no avail. In general, though, I have little control over how this string is formatted.
It seems like it should read "dash" if there were some attribute to cause it to read punctuation literally, like accessibilitySpeechPunctuation in iOS.
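A possible workaround, assuming the displayed string itself cannot be changed, is to leave the text alone and give the view a separate spoken form through setContentDescription, which TalkBack reads in place of the visible text. The sketch below only builds that spoken string; the class name SpokenRange and the choice of the word "to" are assumptions.

```java
import java.util.regex.Pattern;

final class SpokenRange {
    // Two numbers separated by a hyphen-minus, an en dash, or a minus sign, with optional spaces.
    private static final Pattern RANGE =
            Pattern.compile("(\\d+)\\s*[-\u2013\u2212]\\s*(\\d+)");

    /** Rewrites "34-93" as "34 to 93"; text without a numeric range is returned unchanged. */
    static String toSpokenText(String displayed) {
        return RANGE.matcher(displayed).replaceAll("$1 to $2");
    }
}
```

Usage would look like textView.setContentDescription(SpokenRange.toSpokenText(text)). Note that the pattern also matches things that are not ranges, such as the "2020-01" in a date, so it only fits strings known to contain ranges.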

Android EditText return Vietnamese wrong format

I'm working on an Android project and facing a problem with EditText when I type Vietnamese.
For example, when I type the word "thử" into an EditText and get the string from it:
String text = edittext.getText().toString();
It always returns a String object with 4 characters: "t", "h", "ư" and the accent character.
But if I create a String object in code like: String text = "thử"; it only contains 3 characters: "t", "h" and "ử". So they do not match when I compare them. I want the String object to contain 3 characters, not 4.
I also thought about looping through all the characters and replacing them manually. But Vietnamese has 12 vowels and 6 accents, so I would have to check 72 cases. I don't think that is a good way. Is there any way to get proper text from EditText? Or any good way to replace the text manually?
UPDATE:
I have found why the EditText always returns a weird String. It is caused by the phone's keyboard app. I am using an LG Magna with its default keyboard app. That app always encodes base vowels and accents separately for everything I input. I installed another keyboard app, and then it works like a charm.
Now I have to find a way to make sure that the text is always returned properly from any keyboard app.
Android uses UTF-8, so please make sure that you're typing your Vietnamese characters as UTF-8 and not in some code page such as Windows-1258.
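The 4-character and 3-character strings described in the question are the decomposed and precomposed Unicode forms of the same text, so one option that does not depend on the keyboard app is to normalize the input to NFC before comparing it. A minimal sketch using java.text.Normalizer follows; the class name ComposedText is hypothetical.

```java
import java.text.Normalizer;

final class ComposedText {
    /** Combines base letters and following accent marks into precomposed characters where possible. */
    static String compose(CharSequence input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }
}
```

With this, String text = ComposedText.compose(edittext.getText()); should give "thử" as 3 chars whether the keyboard sent "ư" plus a separate accent or the single precomposed "ử". Normalizing both sides of a comparison is the safer habit, since a string from another source may also be decomposed.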

SQLite unicode slavic accented words Android

I'm trying to filter out accented words if user searches for them in local database. But I have problems, namely with slavic letters ČŠŽ. In my SQLite database I have a field "title" with value: "Želodček"
If I try to select LOWER(title) I always get back the same value "Želodček" whilst other words are correctly lower cased. Only if the word begins with ČŽŠ then it doesn't get lower cased. This only persists with words which have leading accented letters.
Database records
Stomach
Želodček
Uppercase with UPPER()
STOMACH
ŽELODčEK
Lowercase with LOWER()
stomach
Želodček
I've already tried setting localization with setLocale() with no luck. I also tried different collations like NOCASE, UNICODE, LOCALIZED but nothing worked. I'm wondering why LOWER() leaves the first letter uppercase, and why UPPER() leaves the other accented letters lowercase.
I've solved the problem with LIKE searches, where I replace uppercase accented letters with their lowercase counterparts. But I have a problem with full text (FTS3) searching because I can't use the same trick with MATCH.
-- works but it's a hack
SELECT title FROM articles WHERE REPLACE(LOWER(title),'Ž','ž') LIKE '%želodček%'
-- can't seem to get it work
SELECT title FROM articles WHERE title MATCH 'želodček' COLLATE NOCASE
Is there any solution to this or is there a bigger problem?
Update:
No optimal solution yet.
Un-optimal solution 1:
I decided to deal with the problem directly by changing data in the select query. While this doesn't work for all cases (and I would have to cover all accents) it suits my case for now. So I'm posting it:
-- LIKE query
SELECT title FROM articles WHERE (REPLACE(REPLACE(REPLACE(LOWER(title),'Č','č'),'Š','š'),'Ž','ž') LIKE ? COLLATE NOCASE)
-- MATCH query (FTS)
-- In this case I programmatically replace searched word with 2 word variation (one that starts with lowercase and one that starts with uppercase) ie: title='želodček OR Želodček'
SELECT title FROM articles WHERE title MATCH ? COLLATE UNICODE
Un-optimal solution 2:
As suggested by user CL., I tried inserting the title in normalized form (this didn't work for me because the normalized form was basically the original Unicode form). I took it further and now insert the title stripped of accents (basically the ASCII form). This is maybe a better general solution than solution 1, since there I only cover some accents.
But there are downsides:
data doubles (one unicode title and one ASCII title). Which can be a problem if you have a lot of data.
some characters are not supported (for example, Chinese characters will be gone after normalization and stripping)
ambiguity which you get by stripping accents (ie. two words "zelo" and "želo" have different meanings but will both turn up when searching).
Here's the Java code for it:
import java.text.Normalizer;

// Produces an ASCII-only version of the Unicode title, to be inserted into a separate column
String titleAsciiName = Normalizer.normalize(title, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
LIKE never uses a custom collation.
FTS can use a custom tokenizer, but you have to check whether unicode61 is available in all Android versions you want to support.
The Android database API does not allow creating custom implementations of LIKE or of an FTS tokenizer.
You might want to store a normalized version of your strings in the database.
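As a sketch of what such a normalized version could look like: the helper below lowercases in Java (SQLite's built-in LOWER() only handles ASCII by default), decomposes to NFD, and then removes only the combining marks, so characters without a decomposition, such as Chinese ones, are kept. The class name SearchKey and the idea of a separate column (say, title_key) are assumptions, not part of the original answer.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SearchKey {
    /** Lowercases the text and strips accents by removing Unicode combining marks. */
    static String fold(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

The same function would be applied to the stored value at insert time and to the user's query at search time, for example by binding "%" + SearchKey.fold(query) + "%" to a LIKE ? on the folded column. The ambiguity mentioned in the question remains: "zelo" and "želo" fold to the same key.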

is it possible to store a character as underlined within a sqlite database?

I am trying to create a database for an android app including, in part, non-English words which require underlines and accents for proper spelling. I set my encoding for this package to utf-8, which allowed the accented characters to store and display properly. However, I cannot seem to get a single character underlined. It displays an empty box for an unrecognized character.
An example of my database helper to create the sqlite is as follows:
cv.put(ENGLISH, "to be alive");
cv.put(NATIVE, "okch_á_a or okchaha");
cv.put(PART_OF_SPEECH, "verb");
cv.put(AUDIO, "alive");
cv.put(VIDEO, "none");
cv.put(IMAGE_DEFAULT, "none");
cv.put(IMAGE_OPTIONAL, "none");
cv.put(IMAGE_TO_USE, "none");
db.insert("words", ENGLISH, cv);
That
_ a _
is the best I can come up with so far, but the a should actually be an underlined character.
I tried html tags like u and /u:
<u>a</u>
since that works with string arrays, but it displays as:
<u>a</u>
(the html is never interpreted).
I tried using:
"\u0332"
as explained at http://www.fileformat.info/info/unicode/char/332/index.htm , but that, too, is never interpreted, so it displays as:
a\u0332
I also tried:
& # 818 ;
and:
& # x332 ;
in a similar manner, with similar lack of results.
Any ideas?
You can store your string in HTML format and call .setText(Html.fromHtml(somestring)) on the TextView where you want to display it.
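An alternative that keeps the stored value as plain text is the combining low line U+0332, which the question already tried. In a Java string literal, "\u0332" is turned into the actual character by the compiler, so seeing a literal a\u0332 on screen suggests the escape never went through the compiler (for instance, the backslash was doubled or the text came from another source); that is a guess, not something the question confirms. A small sketch, with the hypothetical helper name CombiningUnderline:

```java
final class CombiningUnderline {
    private static final char COMBINING_LOW_LINE = '\u0332';

    /** Appends U+0332 after every code point so each one is drawn with a low line under it. */
    static String underline(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.appendCodePoint(cp).append(COMBINING_LOW_LINE));
        return sb.toString();
    }
}
```

For the example row this could be used as cv.put(NATIVE, "okch" + CombiningUnderline.underline("á") + "a or okchaha");. Whether the line is actually drawn depends on the font, so an empty box may still appear on devices whose font lacks the glyph.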
