This question has actually been asked several times, but I couldn't find an answer.
I have a set of SQLite tables that are read-only: I can't change their structure or redefine collation rules. The tables contain international characters (Russian, Chinese, etc.).
I would like to do a case-insensitive selection like:
select name from names_table where upper(name) glob "*"+constraint.toUpperCase()+"*"
It works only when name is in the Latin/ASCII range; for international characters it doesn't work.
SQLite's manual reads:
The upper(X) function returns a copy of input string X in which all
lower-case ASCII characters are converted to their upper-case
equivalent.
So the question is: how can I resolve this and convert international characters to upper/lower case?
This is a known problem in SQLite. You can redefine the built-in functions via the Android NDK, but that is not a simple approach. Look at this question.
Note that the indexes of your tables will not be used (for a UDF), so the query can be very slow.
Instead, you can store the data you search on in another column in ASCII form.
For example:
"insert into names_table (name, name_ind) values ('"+name+"',"+"'"+toAsciiEquivalent(name)+"')"
name   name_ind
----------------
Имя    imya
Name   name
ыыы    yyy
and search on the name_ind column:
select name from names_table where name_ind glob "*"+toAsciiEquivalent(constraint)+"*"
This solution requires more space for data, but it is simple and fast.
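If an ASCII transliteration is not at hand, a variation on the same extra-column idea is to fold case in Java, whose String.toLowerCase handles non-ASCII letters, both before storing and before searching. This is only a sketch under assumptions: NameIndex and its methods are hypothetical names, and the folded value is assumed to go into the name_ind column from the example above.

```java
import java.util.Locale;

// Hypothetical helper: builds the value stored in, and searched against,
// the extra name_ind column.
final class NameIndex {
    private NameIndex() {}

    // Java's toLowerCase is Unicode-aware (Cyrillic, accented Latin, ...),
    // unlike SQLite's built-in lower()/upper(), which only touch ASCII.
    static String fold(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Builds a GLOB pattern for a "contains" search on the folded column.
    // GLOB metacharacters (*, ?, [) inside the constraint are NOT escaped here.
    static String containsPattern(String constraint) {
        return "*" + fold(constraint) + "*";
    }
}
```

The pattern would then be passed as a bound argument, for example to: select name from names_table where name_ind glob ?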
Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads. Or, developers can write their own overloads based on their own Unicode-aware comparison routines already contained within their project.
Reference: http://www.sqlite.org/faq.html
I am trying to list all the files under "/proc" on my Android device and keep only those whose names consist solely of digits, such as '123' or '435'. I tried to filter them with a regular expression. I tried the three expressions below, but all of them sometimes fail:
^[0-9]+$
[0-9]+
\d+
I wonder how the three expressions can match a name such as "14971" but not "15003"?
I think boober Bunz is right, that the file extension is the difference.
All three of your expressions match both
"14971"
and
"15003"
The best way is to strip the extensions off the file names and then use the most restrictive expression you need: ^[0-9]+$
Or, if you want to leave the extension on, this would most likely work for you:
"^[0-9]+[.][^.]*$"
That is: start of string, one or more digits, a mandatory ".", then any number of non-"." characters, end of string. This would not match:
"123.123.txt"
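The two patterns from this answer can be checked with a small Java sketch. ProcNameFilter and its method names are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Pattern;

// Hypothetical helper for checking names found under /proc.
final class ProcNameFilter {
    // Digits only, nothing else.
    private static final Pattern DIGITS_ONLY =
            Pattern.compile("^[0-9]+$");
    // Digits, exactly one dot, then an extension that contains no further dot.
    private static final Pattern DIGITS_WITH_EXTENSION =
            Pattern.compile("^[0-9]+[.][^.]*$");

    private ProcNameFilter() {}

    static boolean isDigitsOnly(String name) {
        return DIGITS_ONLY.matcher(name).matches();
    }

    static boolean isDigitsWithExtension(String name) {
        return DIGITS_WITH_EXTENSION.matcher(name).matches();
    }
}
```

With this, isDigitsOnly accepts both "14971" and "15003" and rejects "123.txt", while isDigitsWithExtension accepts "123.txt" and rejects "123.123.txt".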
I'm trying to match accented words when a user searches for them in a local database. But I have problems, namely with the Slavic letters ČŠŽ. In my SQLite database I have a field "title" with the value "Želodček".
If I try to select LOWER(title) I always get back the same value "Želodček" whilst other words are correctly lower cased. Only if the word begins with ČŽŠ then it doesn't get lower cased. This only persists with words which have leading accented letters.
Database records
Stomach
Želodček
Uppercase with UPPER()
STOMACH
ŽELODčEK
Lowercase with LOWER()
stomach
Želodček
I've already tried setting the locale with setLocale(), with no luck. I also tried different collations such as NOCASE, UNICODE and LOCALIZED, but nothing worked. I'm wondering why the first letter is not lowercased by LOWER(), and why the other accented letters stay lowercase after UPPER().
I've solved the problem with LIKE searches where I replace accented words with their lower cased counterpart. But I have problem with full text(FTS3) searching because I can't use the same trick with MATCH.
-- works but it's a hack
SELECT title FROM articles WHERE REPLACE(LOWER(title),'Ž','ž') LIKE '%želodček%'
-- can't seem to get it work
SELECT title FROM articles WHERE title MATCH 'želodček' COLLATE NOCASE
Is there any solution to this or is there a bigger problem?
Update:
No optimal solution yet.
Un-optimal solution 1:
I decided to deal with the problem directly by changing data in the select query. While this doesn't work for all cases (and I would have to cover all accents) it suits my case for now. So I'm posting it:
-- LIKE query
SELECT title FROM articles WHERE (REPLACE(REPLACE(REPLACE(LOWER(title),'Č','č'),'Š','š'),'Ž','ž') LIKE ? COLLATE NOCASE)
-- MATCH query (FTS)
-- In this case I programmatically replace searched word with 2 word variation (one that starts with lowercase and one that starts with uppercase) ie: title='želodček OR Želodček'
SELECT title FROM articles WHERE title MATCH ? COLLATE UNICODE
Un-optimal solution 2:
As suggested by user CL., insert the title in normalized form (this alone didn't work for me because the normalized form was basically the original Unicode form). I took it further and insert the title stripped of accents (basically its ASCII form). This is probably a better general solution than the first one, since the first only covers some accents.
But there are downsides:
data doubles (one unicode title and one ASCII title). Which can be a problem if you have a lot of data.
some characters are not supported (like chinese characters will be gone after normalization and stripping)
ambiguity which you get by stripping accents (ie. two words "zelo" and "želo" have different meanings but will both turn up when searching).
Here's the Java code for it:
import java.text.Normalizer;

// Gets you the ASCII version of the Unicode title, which you insert into a separate column
String titleAsciiName = Normalizer.normalize(title, Normalizer.Form.NFD)
.replaceAll("[^\\p{ASCII}]", "");
LIKE never uses a custom collation.
FTS can use a custom tokenizer, but you have to check whether unicode61 is available in all Android versions you want to support.
The Android database API does not allow you to create custom implementations of LIKE or of an FTS tokenizer.
You might want to store a normalized version of your strings in the database.
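One possible way to build such a normalized value in Java is sketched below. It combines the accent stripping shown in the question with a Unicode-aware lowercase, and it assumes the result is stored in a separate column and that the search term goes through the same function. SearchKey and asciiFold are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: produces the search key stored next to the original title.
final class SearchKey {
    private SearchKey() {}

    static String asciiFold(String title) {
        // NFD splits an accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(title, Normalizer.Form.NFD);
        // Dropping every non-ASCII char removes the marks, and also removes
        // any character that has no ASCII base letter at all (e.g. Chinese).
        String ascii = decomposed.replaceAll("[^\\p{ASCII}]", "");
        return ascii.toLowerCase(Locale.ROOT);
    }
}
```

Applied to the title from the question, asciiFold returns "zelodcek", so both LIKE and MATCH can then run against plain lowercase ASCII. The downsides listed in the question (doubled data, lost characters, ambiguity) still apply.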
On Android Jellybean 4.1 is it possible to escape a LIKE wildcard character and still have the like use the index?
Use:
where field like 'xyz\_abc'
as opposed to:
where field like 'xyz_abc'
Does escaping wildcards work on Android? And will it still use the index if the wildcard is escaped?
What I am currently doing is:
where field like 'xyz_abc' and lower(field) = lower('xyz_abc')
Which is horribly inefficient due to the wildcard character.
Thanks
You need to use the ESCAPE clause:
where field like 'xyz\_abc' escape '\'
See the section The LIKE and GLOB operators in the SQLite Documentation.
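When the search text comes from user input, the wildcards have to be escaped before the value is bound. A minimal sketch, assuming '\' is the character named in the ESCAPE clause; LikeEscaper and escapeLike are hypothetical names, not part of the Android API.

```java
// Hypothetical helper: escapes LIKE wildcards so they match literally
// when the query uses: ... LIKE ? ESCAPE '\'
final class LikeEscaper {
    private LikeEscaper() {}

    static String escapeLike(String literal) {
        return literal
                .replace("\\", "\\\\") // escape the escape character first
                .replace("%", "\\%")   // '%' would otherwise match any sequence
                .replace("_", "\\_");  // '_' would otherwise match any single char
    }
}
```

The escaped value is then passed as a bound argument to a query such as: where field like ? escape '\'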
Could somebody tell me what is better in terms of performance?
Is it better to save two strings in string.xml, like 'abc' and 'abc:',
or should I save only the first one and append ':' when needed in the Java code?
This is difficult to answer, because it depends on what your strings represent and what you need to append. Localization is also an issue, for example...
Dog // English
Chien // French
Hund // German
Using string resources allows you to create different resource files depending on the locale of the device and Android will automatically use the right localized string resource file. If all you need to do is append a single character such as : then you'll double every string for every language.
If you choose to only save the basic strings and append the character using code, then the code will be universal and you'll simply need to append the character to whatever localized word - potentially a lot more efficient.
From both a storage and a performance perspective, you should save only "abc":
reading extra data from disk takes far longer than a few quick in-memory operations.
storing the same data twice is bad practice in general.
If you have to concatenate multiple strings, you should use StringBuilder - http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuilder.html
It is much faster than repeated '+' or '.concat()' calls, for example inside a loop.
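As an illustration of the StringBuilder suggestion, here is a small sketch that appends the suffix in code instead of storing a second resource. LabelJoiner and join are made-up names, and the labels are assumed to have been loaded from the string resources already.

```java
// Hypothetical helper: appends a suffix to each label and joins the results.
final class LabelJoiner {
    private LabelJoiner() {}

    static String join(String[] labels, char suffix) {
        StringBuilder sb = new StringBuilder();
        for (String label : labels) {
            if (sb.length() > 0) {
                sb.append(' '); // separate entries with a single space
            }
            sb.append(label).append(suffix);
        }
        return sb.toString();
    }
}
```

For a single 'abc' + ':' the difference is negligible; the builder matters when many pieces are appended in a loop.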
How can I change the font on android to allow to show special characters like "'" or "à"?
Actually the strings that contains these characters are stored in the sqlite database.
When you load the text into your TextView, will this work for you?
textView.setText(new String(textFromDatabase, "UTF-8"));
This uses the String constructor to set the charset name. You can change "UTF-8" to a different Character encoding -- Also, look at the javadoc for String.
String(byte[] bytes, String charsetName) -
Constructs a new String by decoding the specified array of bytes using the specified charset.
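To see why the charset matters, the sketch below decodes the same bytes twice. It assumes the text arrives as a byte array and uses the Charset overload of the constructor with java.nio.charset.StandardCharsets, which needs Java 7 or a reasonably recent Android API level. TextDecoder is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes raw bytes with an explicit charset.
final class TextDecoder {
    private TextDecoder() {}

    static String decode(byte[] raw, Charset charset) {
        // Unlike String(byte[], String), this overload throws no checked exception.
        return new String(raw, charset);
    }
}
```

The two bytes 0xC3 0xA0 decode to the single character "à" as UTF-8, but to two unrelated characters as ISO-8859-1, which is the typical symptom of a charset mismatch.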
The Droid font supports "'", "à" and many other characters. I use them all the time (Portuguese).
Actually, I'm quite sure it supports all of Basic Latin, the Latin-1 Supplement and the first extended Latin range. It also supports many others, such as Hebrew, although I'm not sure whether that changed between SDK versions.
You can also download the Unicode Map app in the Market to check which characters are available in your particular device. I also store unicode text in sqlite all the time, and still I don't have any problems.
One thing to consider: check that the encoding you are setting matches the encoding of your source. It may be a text file or a URL. An example:
BufferedReader b = new BufferedReader(new InputStreamReader(url.openStream(), MY_ENCODING));
Are you sure it's not a problem somewhere?
You should use '' instead of ' when storing it in the SQLite database.
For example, if you want to store 5 o'clock in the database, you have to write it as 5 o''clock. Take a look here for more information about it.
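The doubling rule can be written as a one-line helper. This is only a sketch with made-up names (SqlQuote, quoteLiteral); where the API offers bound parameters ('?' placeholders), those are generally the safer choice because no manual escaping is needed.

```java
// Hypothetical helper: turns a Java string into a quoted SQL string literal.
final class SqlQuote {
    private SqlQuote() {}

    static String quoteLiteral(String value) {
        // Each single quote inside the value is doubled, then the whole
        // value is wrapped in single quotes.
        return "'" + value.replace("'", "''") + "'";
    }
}
```

For the value 5 o'clock this yields '5 o''clock', which SQLite reads back as 5 o'clock.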
By default Android SQLite uses UTF-8.
I had this problem because when I populated the database on the first launch I used a txt file with another charset.