Filter ignoring accents (Android)

I am new to Android and I'm working on a query in SQLite.
My problem is with accented characters in strings, e.g.:
ÁÁÁ
ááá
ÀÀÀ
ààà
aaa
AAA
If I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;
It returns:
AAA
aaa (it ignores the others)
But if I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;
It returns:
ààà (it ignores the title "ÀÀÀ")
I want to select strings in an SQLite DB while ignoring both accents and case. Please help.

Generally, string comparisons in SQL are controlled by the COLLATE rules of the column or expression. Besides SQLite's built-in BINARY (the default), NOCASE and RTRIM, Android predefines only two collation sequences: LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions (sqlite3_create_collation()) is unfortunately not exposed through the Java API.
To work around this:
Add another column to your table, for example MOVIE_NAME_ASCII
Store values in this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normalization Form D (NFD) and then removing the non-ASCII code points, since NFD represents accented characters roughly as plain ASCII letters followed by combining accent marks (characters with no decomposition, such as ø or ß, are dropped entirely):
// requires: import java.text.Normalizer;
String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
Do your text searches on this ASCII-normalized column (normalizing the search term the same way), but display data from the original Unicode column.
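A minimal sketch of the extra-column approach, assuming Java 8+ and the hypothetical helper names `AccentFolding.fold` and `AccentFolding.containsPattern` (not part of any library). It differs slightly from the snippet above: it removes only combining marks (`\p{M}`) instead of every non-ASCII code point, so letters without a decomposition such as ø survive, and it lower-cases with `Locale.ROOT` so the stored column and the search term are folded identically. The pattern helper escapes `\`, `%` and `_` so the user's text can be bound to `LIKE ? ESCAPE '\'` without acting as wildcards.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper for the MOVIE_NAME_ASCII-style column approach. */
final class AccentFolding {
    private AccentFolding() {}

    /** Decomposes to NFD, drops combining marks (accents), then lower-cases. */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /**
     * Builds a "contains" pattern for: ... WHERE MOVIE_NAME_ASCII LIKE ? ESCAPE '\'
     * The backslash is escaped first, then the LIKE wildcards % and _.
     */
    static String containsPattern(String searchText) {
        String escaped = fold(searchText)
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }
}
```

On insert you would store `AccentFolding.fold(name)` in the extra column; on search, something like `db.rawQuery("SELECT * FROM TB_MOVIE WHERE MOVIE_NAME_ASCII LIKE ? ESCAPE '\\' ORDER BY MOVIE_NAME", new String[] { AccentFolding.containsPattern(term) })`, so that searching for "à" or "a" both become `%a%` and match every row in the question.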

In Android's SQLite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (those collations only affect ORDER BY and comparison operators such as =). However, there is a solution that doesn't require extra columns in your table. As @asat explains in this answer, you can use GLOB with a pattern that replaces each letter with all the available alternatives of that letter. In Java:
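// requires: import java.util.Locale;
public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase(Locale.ROOT) // ROOT avoids locale surprises such as the Turkish dotless i
            // escape the GLOB metacharacters first, so the brackets added below are left intact
            .replace("[", "[[]")
            .replace("*", "[*]")
            .replace("?", "[?]")
            .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
            .replaceAll("[eéèëê]", "\\[eéèëê\\]")
            .replaceAll("[iíìî]", "\\[iíìî\\]")
            .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
            .replaceAll("[uúùüû]", "\\[uúùüû\\]");
}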
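And then bind the pattern as a query argument instead of splicing it into the SQL:
SELECT * FROM table WHERE lower(column) GLOB ?
with "*" + addTildeOptions(searchText) + "*" as the argument (for example via SQLiteDatabase.rawQuery(sql, selectionArgs)).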
This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.
Note that GLOB is always case-sensitive and ignores COLLATE NOCASE, which is why everything is converted to lower case both in the function and in the query. Note also that SQLite's built-in lower() only folds ASCII letters, so upper-case accented letters stored in the column (such as the À in "ÀÀÀ") stay upper-case and will not match the lower-case classes above unless you add their upper-case variants to each class.
The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.
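A sketch of a variant of `addTildeOptions` that also covers the upper-case accented letters `lower()` leaves untouched, assuming the hypothetical class name `AccentGlob`. It builds the pattern character by character rather than chaining `replaceAll`, so one replacement can never rewrite the output of another, normalizes the input to NFC so a decomposed "u" + combining acute is looked up as "ú", and wraps the result in `*` for a "contains" search.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical variant of addTildeOptions that also covers upper-case accented letters. */
final class AccentGlob {
    private AccentGlob() {}

    // Each group becomes one GLOB character class. Upper-case accented letters are
    // listed because SQLite's lower() leaves non-ASCII letters unchanged.
    private static final String[] GROUPS = {
        "aáàäâãÁÀÄÂÃ",
        "eéèëêÉÈËÊ",
        "iíìîÍÌÎ",
        "oóòöôõÓÒÖÔÕ",
        "uúùüûÚÙÜÛ",
    };

    private static final Map<Character, String> CLASS_FOR = new HashMap<>();

    static {
        for (String group : GROUPS) {
            String cls = "[" + group + "]";
            for (char c : group.toCharArray()) {
                CLASS_FOR.put(c, cls);
            }
        }
    }

    /** Pattern for: ... WHERE lower(MOVIE_NAME) GLOB ? */
    static String containsPattern(String searchText) {
        // NFC first, so "u" + combining acute becomes the single char "ú" before lookup.
        String text = Normalizer.normalize(searchText, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder("*");
        for (char c : text.toCharArray()) {
            String cls = CLASS_FOR.get(c);
            if (cls != null) {
                sb.append(cls);
            } else if (c == '*' || c == '?' || c == '[') {
                sb.append('[').append(c).append(']'); // one-character class = literal match
            } else {
                sb.append(c);
            }
        }
        return sb.append('*').toString();
    }
}
```

It would be bound with something like `db.rawQuery("SELECT * FROM TB_MOVIE WHERE lower(MOVIE_NAME) GLOB ? ORDER BY MOVIE_NAME", new String[] { AccentGlob.containsPattern(term) })`; searching for "a" then yields a class containing À, so "ÀÀÀ" is matched even though `lower()` does not change it.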

You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode).
Explained in Russian here:
http://habrahabr.ru/post/122408/
The process of compiling the SQLite source with ICU is explained here:
How to compile sqlite with ICU?
Unfortunately you will end up with different APKs for different CPUs.

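You need to look at these not as accented characters but as entirely different characters; you might as well be looking for a, b or c. That being said, I would try using a regex with a character class that lists every variant. It would look something like:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;
Note that stock SQLite does not define the regexp() function that REGEXP calls, so unless your build registers one, this query fails with a "no such function" error. The same character class also works with the built-in GLOB operator:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME GLOB '*[aAàÀ]*' ORDER BY MOVIE_NAME;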

Related

ORMLITE Query for french text ignoring accent on Android [duplicate]

android sqlite select like national characters independent [duplicate]

How to ignore accent in SQLite query (Android)

Accented Search in sqlite (android)

I have a column where some of the values contain accented letters,
e.g. Grambú.
My requirement is that when I search for "Grambu" I should get "Grambú" in the results as well.
For this I tried using COLLATE NOCASE on that column,
but that didn't work.
When I searched the web for solutions, I found many people suggesting normalizing the accented characters
into another column as the only option.
Is there an easier solution to this problem?
COLLATE NOCASE only folds the 26 ASCII letters (A-Z).
Set the database's locale to one that supports your accented characters using setLocale() and use COLLATE LOCALIZED.
You may also try using COLLATE UNICODE.
But beware of this bug: SQLite UNICODE sort broken in ICS - no longer case-insensitive.
Check the documentation for mention of these two collators in Android.
Also check out this online collation demo tool.
http://www.sqlite.org/lang_expr.html
(A bug: SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.)
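If the candidate rows are few enough to filter in Java, `java.text.Collator` at `PRIMARY` strength gives the accent- and case-insensitive comparison that SQLite's built-in collations do not provide for LIKE. A minimal sketch, assuming the hypothetical class `AccentInsensitiveFilter`; its `contains` check is naive and assumes precomposed (NFC) text in which one accented letter is one `char`.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical in-memory filter; not part of any Android or SQLite API. */
final class AccentInsensitiveFilter {
    private final Collator collator;

    AccentInsensitiveFilter(Locale locale) {
        collator = Collator.getInstance(locale);
        // PRIMARY: only base letters matter; accent (secondary) and case (tertiary) differences are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed letters such as "á" before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
    }

    boolean equalsIgnoringAccents(String a, String b) {
        return collator.compare(a, b) == 0;
    }

    /** Naive "contains": tests every window of term.length() chars (assumes NFC text). */
    boolean contains(String text, String term) {
        int n = term.length();
        for (int i = 0; i + n <= text.length(); i++) {
            if (collator.compare(text.substring(i, i + n), term) == 0) {
                return true;
            }
        }
        return false;
    }

    /** Keeps the names that contain term, ignoring accents and case. */
    List<String> filter(List<String> names, String term) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (contains(name, term)) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

With this, `equalsIgnoringAccents("Grambú", "grambu")` should be true. The trade-off is that every row has to be loaded and scanned in Java, so it suits small result sets rather than replacing an indexed column.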
In Android's SQLite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE. However, there is a solution that doesn't require extra columns in your table. As @asat explains in this answer, you can use GLOB with a pattern that replaces each letter with all the available alternatives of that letter. In Java:
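// requires: import java.util.Locale;
public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase(Locale.ROOT) // ROOT avoids locale surprises such as the Turkish dotless i
            // escape the GLOB metacharacters first, so the brackets added below are left intact
            .replace("[", "[[]")
            .replace("*", "[*]")
            .replace("?", "[?]")
            .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
            .replaceAll("[eéèëê]", "\\[eéèëê\\]")
            .replaceAll("[iíìî]", "\\[iíìî\\]")
            .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
            .replaceAll("[uúùüû]", "\\[uúùüû\\]");
}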
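And then bind the pattern as a query argument instead of splicing it into the SQL:
SELECT * FROM table WHERE lower(column) GLOB ?
with "*" + addTildeOptions(searchText) + "*" as the argument (for example via SQLiteDatabase.rawQuery(sql, selectionArgs)).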
This way, a user searching for either Grambu or Grambú will get the search converted into gramb[uúùüû] (lower case, because of toLowerCase()), returning both results.
Note that GLOB is always case-sensitive and ignores COLLATE NOCASE, which is why everything is converted to lower case both in the function and in the query. Note also that SQLite's built-in lower() only folds ASCII letters, so upper-case accented letters stored in the column (such as Ú) stay upper-case and will not match the lower-case classes above unless you add their upper-case variants to each class.
The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.

What characters cannot be used for values in SQLite databases?

I'm making an Android app that uses an SQLite database. But I found out that if I type characters like a single quote (') (also when it's used in the primary key), the data isn't saved/retrieved correctly.
Is the problem on my side, or is this true? If it's true, are there any more characters like that?
Thanks.
@bdares and @mu, thanks for the tips, but can you please tell me how to use placeholders and/or prepared statements in SQLite?
I have always used direct String concatenation before, but now that it appears to be bad practice, I would like to use prepared statements and/or placeholders.
Possibly you'll have problems with characters like ASCII STOP and such non-printing characters, but if you use prepared statements and parameter binding, you won't have any trouble even with characters like '.
If you don't want to use parameter binding and prepared statements, you can replace every ' in your input with '' (two single quotes) and you'll be fine; SQLite, like standard SQL, does not use a backslash as an escape character.
SQL uses ' to mark where a string literal starts and stops. If your input contains this character unescaped, the literal ends early and the rest of your input is parsed as SQL commands. This is not a good thing security-wise (it is how SQL injection works). It also keeps you from storing that character at all unless you "escape" it by doubling it: 'O''Brien' is read as the string O'Brien, and the parser keeps treating the following characters as part of the string until it meets a single, undoubled '. Backslashes have no special meaning inside SQLite string literals.
Prepared statements typically look like this:
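// JDBC API; requires: import java.sql.PreparedStatement;
// `sqlite` is an open java.sql.Connection and myDate is a java.sql.Date
String sql = "INSERT INTO MYTABLE (NAME, EMP_NO, DATE_HIRED) VALUES (?, ?, ?)";
try (PreparedStatement ps = sqlite.prepareStatement(sql)) { // closes the statement automatically
    ps.setString(1, myString); // placeholders are numbered from 1
    ps.setInt(2, myInt);
    ps.setDate(3, myDate);
    ps.executeUpdate();
}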
Unfortunately, I don't know exactly what library you'd be using to access sqlite from Android, so I can't give you more details at this time.
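If binding really isn't an option, SQLite escapes a single quote inside a string literal by doubling it ('') rather than with a backslash. A minimal sketch of that rule, using the hypothetical helper name `SqlLiterals.quote`; on Android, `android.database.DatabaseUtils.sqlEscapeString(String)` does the same job, and `?` placeholders remain the safer choice.

```java
/** Hypothetical helper; prefer ? placeholders whenever you can. */
final class SqlLiterals {
    private SqlLiterals() {}

    /** Wraps s in single quotes, doubling any embedded single quote (SQL-standard escaping). */
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }
}
```

For example, `"INSERT INTO my_table (some_column) VALUES(" + SqlLiterals.quote(input) + ")"` produces `VALUES('O''Brien')` for the input O'Brien; backslashes are passed through unchanged because SQLite gives them no special meaning.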
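In standard SQL, string literals are enclosed in single quotes and double quotes are meant for identifiers, but SQLite also accepts a double-quoted string as a literal when it doesn't match a column name (a legacy fallback). So if you need to INSERT a string containing ('), for example, you can use double quotes (") to wrap the string: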
INSERT INTO my_table (some_column) VALUES("'a string'");
Or the other way around:
INSERT INTO my_table (some_column) VALUES('"a string"');
(Of course, you will need to escape any (") in your Java code.)
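An alternative on Android is an SQLiteStatement (a prepared statement obtained from SQLiteDatabase.compileStatement()), whose bindString() method binds a value to a ? placeholder.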
As for the "characters allowed": SQLite internally stores strings (type TEXT) as UTF-8 or UTF-16, and Android's build uses the default, UTF-8. Therefore, you can store any string you like.
SQLite supports the data types TEXT (similar to String in Java), INTEGER (similar to long in Java) and REAL (similar to double in Java), plus BLOB for raw bytes and NULL. All other types must be converted into one of these before saving them in the database. SQLite itself does not validate whether the values written to a column actually match its declared type; for example, you can write an integer into a string column.
