Custom tokenchars for apostrophes in SQLite / FTS3 / FTS4 / Android? - android

My Android app contains a SQLite table (Full Text Search / FTS3) which contains data such as:
Green's Product
In the app, when a user searches for Green's Product, the match is found. But when the user searches for Greens Product (i.e., without the apostrophe), there is no match.
I need the product to be returned regardless of whether the user types the apostrophe or not.
So I have been reading the documentation about FTS3 tokenizers, which suggests that I need to specify the apostrophe in the tokenchars argument.
So I have tried re-creating my table like this:
CREATE VIRTUAL TABLE products USING fts3 (product_name TEXT, tokenize=simple "tokenchars='");
but there is still no match when a user searches for Greens Product.
I've also tried:
CREATE VIRTUAL TABLE products USING fts3 (product_name TEXT, tokenize=simple "tokenchars=''"); (double apostrophe)
and
CREATE VIRTUAL TABLE products USING fts4 (product_name TEXT, tokenize=simple "tokenchars='"); (fts4)
and
CREATE VIRTUAL TABLE products USING fts4 (product_name TEXT, tokenize=simple "tokenchars=''"); (double apostrophe and fts4)
but none of them seem to be working for me.
I'm not sure what to try next. And I'm also not sure if I may be coming up against some limitation here due to Android's implementation of SQLite.
Does anyone have any suggestions?

If you add the apostrophe to tokenchars, then it will become part of the word. This means that you must specify it to find it.
You actually want to ignore apostrophes, without breaking tokens. There appears to be no built-in option for this; you could either write a custom tokenizer, or just remove such characters before inserting/searching strings.
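The "remove such characters before inserting/searching" option could look roughly like the sketch below. It uses Python's sqlite3 module only so that it can be run as-is; on Android the same SQL would go through SQLiteDatabase. The search_name column and the normalize helper are assumptions made for the illustration, not part of the original schema.

```python
import sqlite3

APOSTROPHES = ("'", "\u2019")  # ASCII and typographic apostrophe


def normalize(text):
    """Drop apostrophes so that "Green's" and "Greens" become the same string."""
    for ch in APOSTROPHES:
        text = text.replace(ch, "")
    return text


def build_index(names):
    db = sqlite3.connect(":memory:")
    # product_name keeps the original text for display;
    # search_name (hypothetical) holds the normalized copy that is searched.
    db.execute("CREATE VIRTUAL TABLE products USING fts4(product_name, search_name)")
    db.executemany(
        "INSERT INTO products(product_name, search_name) VALUES (?, ?)",
        [(name, normalize(name)) for name in names],
    )
    return db


def search(db, user_input):
    # Normalize the query the same way as the stored text.
    rows = db.execute(
        "SELECT product_name FROM products WHERE search_name MATCH ?",
        (normalize(user_input),),
    )
    return [row[0] for row in rows]
```

Because both the stored text and the query go through the same normalize step, it does not matter whether the user types the apostrophe; the original spelling is still available in product_name for display.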

Related

Sqlite Query to MATCH Multiple Words in FTS4 Table Android

I am trying to do this query in android FTS4 table and this works perfectly:
SELECT * from table WHERE table MATCH 'description: paint* OR alias: paint*'
I need to match multiple words in multiple columns like this:
SELECT * from table WHERE table MATCH 'description: seal* AND paint* OR alias: seal* OR paint*'
This doesn't work in android but works in any DB browser.
I have tried many combinations such as below, they all work in the browser but not in android.
SELECT * from table WHERE table MATCH '(description: seal* AND paint*) OR (alias: seal* OR paint*)'
SELECT * from table WHERE table MATCH 'description: (seal* AND paint*) OR alias: (seal* OR paint*)'
The documentation of sqlite3 doesn't specify any solution for multiple columns with multiple words.
Also in this question the query works in the current android environment as mentioned above in my first line of code. Maybe it didn't work in the past but now it works. My problem is regarding multiple values with OR/AND as described not multiple columns.
Is there any way to achieve this thing in Android?
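One possible workaround, sketched under the assumption that the Android build of SQLite only supports the standard FTS query syntax (no parentheses, no explicit AND): apply the column filter to every term, rely on the implicit AND between terms, and combine the two halves with UNION instead of a top-level OR. The table name items and the sample rows are made up for the example, and Python's sqlite3 module is used only to make it runnable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE items USING fts4(description, alias)")
db.executemany(
    "INSERT INTO items(docid, description, alias) VALUES (?, ?, ?)",
    [
        (1, "sealant and paint primer", "coating"),
        (2, "sealing tape", "tape"),
        (3, "brush set", "painting tools"),
        (4, "ladder", "steps"),
    ],
)

# First SELECT: description contains seal* AND paint* (implicit AND).
# Second SELECT: alias contains seal* OR paint*.
QUERY = """
SELECT docid FROM items WHERE items MATCH 'description:seal* description:paint*'
UNION
SELECT docid FROM items WHERE items MATCH 'alias:seal* OR alias:paint*'
"""


def matching_ids(connection):
    return sorted(row[0] for row in connection.execute(QUERY))
```

Each MATCH expression stays within what both the standard and the enhanced query syntax accept, so the statement should not depend on how the library was compiled.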

What is the advantage of FTS over custom solution?

I have a biggish database (~32 MB) which has lots of text in 4 languages, including Arabic and Urdu. I need to search this text in the most efficient way (speed & size).
I am considering FTS, and trying to find out how to implement it. Right now I am reading http://www.sqlite.org/fts3.html#section_1_2 about it.
It seems to me, an FTS table is just like a normal table used to index all the different words. So my questions are:
1) If to populate FTS I have to do all the inserts myself, then why not make my own indexed word table, what is the difference?
Answer : Yes there are many advantages, many built in functions that help. For example with ranking etc, searching of stems and the transparent nature of how it all works in android makes the FTS approach more appealing.
2) On the google docs I read its a virtual in memory table, now this would be massive right... but it doesnt mention this on the SQLite website. So which is it?
3) Is there an easy way to generate all the different words from my columns?
4) Will the FTS handle arabic words properly?
FTS allows for fast searching of words; normal indexes only allow searching for entire values or for the beginning of a value.
If your table has only one word in each field, using FTS does not make sense.
FTS is a virtual table, but not an in-memory table.
You can get individual terms from the full-text index with the fts4aux table.
The default tokenizer works only with ASCII text.
You have to test whether the ICU or UNICODE61 tokenizers work with your data.
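As a rough illustration of the fts4aux and tokenizer points, the sketch below indexes a few strings with the unicode61 tokenizer and reads the distinct terms back. It assumes the SQLite build includes FTS4 and unicode61 (older Android versions may not); the table names docs and terms and the helper list_terms are invented for the example, and Python's sqlite3 module is used only to make it runnable.

```python
import sqlite3


def list_terms(texts, tokenizer="unicode61"):
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=%s)" % tokenizer
    )
    db.executemany("INSERT INTO docs(body) VALUES (?)", [(t,) for t in texts])
    # fts4aux exposes the terms stored in the full-text index of "docs".
    db.execute("CREATE VIRTUAL TABLE terms USING fts4aux(docs)")
    # col = '*' selects the per-term summary rows across all columns.
    rows = db.execute("SELECT term FROM terms WHERE col = '*' ORDER BY term")
    return [row[0] for row in rows]
```

Running the same helper on a sample of the Arabic and Urdu text is a quick way to check whether a given tokenizer splits it into the words you expect.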
1) If to populate FTS I have to do all the inserts myself, then why
not make my own indexed word table, what is the difference?
Using your own indexed word table, you would have to parse the words in each sentence. You would then need one table for sentences and another for words, and you would have to do all of this efficiently.
2) On the google docs I read its a virtual in memory table, now this
would be massive right... but it doesnt mention this on the SQLite
website. So which is it?
I don't understand your question. Data is handled via the virtual table extension, but the backing storage is in the database itself (FTS4 creates 5 tables for each virtual table). Check this:
sqlite> CREATE VIRTUAL TABLE docs USING fts4();
sqlite> .schema
CREATE VIRTUAL TABLE docs USING fts4();
CREATE TABLE 'docs_content'(docid INTEGER PRIMARY KEY, 'content');
CREATE TABLE 'docs_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE 'docs_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE TABLE 'docs_docsize'(docid INTEGER PRIMARY KEY, size BLOB);
CREATE TABLE 'docs_stat'(id INTEGER PRIMARY KEY, value BLOB);
sqlite>
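The same check can be done from code by listing the backing tables in sqlite_master. This is only a sketch using Python's sqlite3 module; the helper name shadow_tables is made up, and the exact set of tables may vary with the SQLite version.

```python
import sqlite3


def shadow_tables(fts_name="docs"):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE %s USING fts4(content)" % fts_name)
    rows = db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    # Keep only the real tables that back the virtual table, e.g. docs_content.
    prefix = fts_name + "_"
    return sorted(row[0] for row in rows if row[0].startswith(prefix))
```

These are ordinary tables stored in the database file, which is why the index costs file size rather than memory.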
3) Is there an easy way to generate all the different words from my
columns?
For sure. But that's not easy. That's what FTS does.
4) Will the FTS handle arabic words properly?
I'm not sure. Does Arabic follow the ICU word-boundary rules? From the tokenizer documentation:
The ICU tokenizer implementation is very simple. It splits the input
text according to the ICU rules for finding word boundaries and
discards any tokens that consist entirely of white-space. This may be
suitable for some applications in some locales, but not all. If more
complex processing is required, for example to implement stemming or
discard punctuation, this can be done by creating a tokenizer
implementation that uses the ICU tokenizer as part of its
implementation.

Search data from sqlite3 database in android

I have a SQLite3 database in Android whose data are sentences like "good afternoon" or "have a nice day". I want a search box that searches through them, and I use something like this:
Cursor cursor = sqliteDB.rawQuery("SELECT id FROM category WHERE sentences LIKE '"+ s.toString().toLowerCase()+ "%' LIMIT 10", null);
But it only shows "good afternoon" as a result if the user starts typing from the beginning ("g", "go", "goo", etc.). How can I retrieve "good afternoon" if the user searches for "a", "af" or "afternoon"?
I mean I want to show the "good afternoon" result when the user searches from the middle of a value in the SQLite3 database, not only from the beginning.
thanks!
Just put the percent sign in front of your query string: LIKE '%afternoon%'. However, your approach has two flaws:
1) It is susceptible to SQL injection attacks because you just insert unfiltered user input into your SQL query string. Use the query parameter syntax instead by re-writing your query as follows:
SELECT id FROM category WHERE sentences LIKE ? LIMIT 10. Add the user input string as selection argument to your query method call.
2) It will be dead slow the bigger your database grows because LIKE queries are not optimized for quick string matching and lookups.
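The parameterized LIKE query from the first point can be sketched as follows. Python's sqlite3 module is used here only so the example can be run; on Android the pattern would be passed as a selection argument to rawQuery. The helpers make_db and search_like are hypothetical, and escaping % and _ is an extra precaution that the original query does not have.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, sentences TEXT)")
    db.executemany(
        "INSERT INTO category(sentences) VALUES (?)",
        [("good afternoon",), ("have a nice day",)],
    )
    return db


def search_like(db, user_input, limit=10):
    # Escape LIKE wildcards typed by the user so they are matched literally.
    escaped = (
        user_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    # The user input only ever travels as a bound parameter, never as SQL text.
    rows = db.execute(
        "SELECT id FROM category WHERE sentences LIKE ? ESCAPE '\\' LIMIT ?",
        ("%" + escaped + "%", limit),
    )
    return [row[0] for row in rows]
```

With the percent signs added around the bound value, a search for "after" finds "good afternoon", and input such as a stray quote or % cannot change the meaning of the query.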
In order to solve number 2 you should use SQLite's FTS3 extension which greatly speeds up any text-related searches. Instead of LIKE you would be using the MATCH operator that uses a different query syntax:
SELECT id FROM category WHERE sentences MATCH 'afternoon' LIMIT 10
As you can see the MATCH operator does not need percent signs. It just tries to find any occurrence of a word in the whole text that is being searched (in your case the sentences column). Read through the documentation of FTS3 I've linked to. The MATCH query syntax provides some more pretty handy and powerful options for finding text in your database table which are pretty similar to early search engine query syntax such as:
MATCH 'afternoon OR evening'
The only (minor) downside to the FTS3 extension is that it blows up the database file size by creating additional search index tables and meta-data. But I think it's well worth it for this use case.
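A minimal sketch of the MATCH approach, assuming an FTS4 table with the same category/sentences names as in the question. Note that FTS matches whole words or word prefixes (with a trailing *), not arbitrary substrings, so "af" can find "afternoon" but "fternoon" cannot. The helpers below are made up for the illustration, and Python's sqlite3 module is used only so it can be run.

```python
import re
import sqlite3


def make_fts_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE category USING fts4(sentences)")
    db.executemany(
        "INSERT INTO category(sentences) VALUES (?)",
        [("good afternoon",), ("have a nice day",)],
    )
    return db


def to_prefix_query(user_input):
    # Keep only ASCII letters and digits, lowercase them (so OR/NEAR typed by
    # the user are not treated as operators) and make every word a prefix term.
    words = re.findall(r"[a-z0-9]+", user_input.lower())
    return " ".join(word + "*" for word in words)


def search_fts(db, user_input, limit=10):
    query = to_prefix_query(user_input)
    if not query:
        return []
    rows = db.execute(
        "SELECT docid FROM category WHERE sentences MATCH ? LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

If true mid-word matching is required, the LIKE '%...%' form is still needed for those queries.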

Sqlite advanced search

I'm developing an android application for a store, which provides many functions. One of these is a function that allows the customer to search a product with some criteria (price,size,type... like in the picture ).
I guess I should work with SqliteDatabase , but I have no idea how I can make this multi-criteria search interface , so the user can query the database.
It's simple. After setting up your database, you can use SQL queries with JOINs and WHERE criteria to get the proper data, then use the Cursor class to iterate through the results.
see:
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
http://www.w3schools.com/sql/sql_where.asp
http://www.w3schools.com/sql/sql_and_or.asp
http://www.w3schools.com/sql/sql_join_inner.asp
Before handling it, you should design your database properly.
Suppose you have this list of tables:
1) product_type(_id <primary key, used to link to the other tables>, type_id <integer>, type_name <text>)
2) product_price(_id <primary key, used to link to the other tables>, type_id <integer>, price <double>)
3) product_type_size(_id <primary key, used to link to the other tables>, type_id <integer>, size <text>)
Then create views as required, for example:
product_search (joining your tables properly)
Finally, run your queries against the view and match the search criteria with its values.
Go ahead--
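A multi-criteria search over such a view can be sketched by building the WHERE clause only from the criteria the user actually filled in. The column names follow the tables above, but the exact shape of product_search and the helpers build_search and run_search are assumptions for the illustration; Python's sqlite3 module stands in for Android's query API.

```python
import sqlite3


def build_search(type_name=None, max_price=None, size=None):
    """Return (sql, args) for the criteria that were supplied."""
    clauses, args = [], []
    if type_name is not None:
        clauses.append("type_name = ?")
        args.append(type_name)
    if max_price is not None:
        clauses.append("price <= ?")
        args.append(max_price)
    if size is not None:
        clauses.append("size = ?")
        args.append(size)
    sql = "SELECT _id FROM product_search"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY _id", args


def run_search(db, **criteria):
    sql, args = build_search(**criteria)
    return [row[0] for row in db.execute(sql, args)]
```

Every value is passed as a bound parameter, so the form fields can be fed in directly; an empty form simply returns all rows.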

unable to retrieve special characters from sqlite fts3

I am having some problems with special characters in my scenario.
I have a sqlite db created using fts3.
When I use SELECT col_1, col_2, offsets(table) FROM table WHERE table MATCH 'h*' LIMIT 50;
I am able to get words which start with h.
but when I am using
SELECT col_1, col_2, offsets(table) FROM table WHERE table MATCH '#*' LIMIT 50;
I am not getting strings which start with #.
Where am I going wrong? Any pointer regarding approach would be great.
I think the behavior you described happens because SQLite FTS3 uses a tokenizer called "simple" by default. The character # gets discarded because it is not an alphanumeric character and its Unicode code point is not greater than 127. My interpretation of this is that FTS is not for searching special characters; it is for searching natural text.
The fix I suggest is not to use FTS for this kind of query but to use the LIKE operator. Or you could look for other available tokenizers, or write your own in C.
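As one example of "other tokenizers", the unicode61 tokenizer accepts a tokenchars option that makes extra characters part of a token. The sketch below assumes a SQLite build that includes unicode61 (which may not be the case on older Android versions); the table name notes and the helper names are invented, and Python's sqlite3 module is used only so it can be run.

```python
import sqlite3


def make_tag_index(rows):
    db = sqlite3.connect(":memory:")
    # tokenchars=# tells unicode61 to treat '#' as part of a word.
    db.execute(
        'CREATE VIRTUAL TABLE notes USING fts4(col_1, tokenize=unicode61 "tokenchars=#")'
    )
    db.executemany("INSERT INTO notes(col_1) VALUES (?)", [(r,) for r in rows])
    return db


def rows_with_hash_words(db):
    # Prefix query: any token that begins with '#'.
    rows = db.execute("SELECT col_1 FROM notes WHERE notes MATCH '#*'")
    return sorted(row[0] for row in rows)
```

Note that this finds rows containing any word that starts with #, not only rows whose whole string starts with #; for the latter, LIKE '#%' remains the simpler tool.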
