SQLite query to MATCH multiple words in an FTS4 table on Android

I am trying to run this query on an FTS4 table in Android, and it works perfectly:
SELECT * from table WHERE table MATCH 'description: paint* OR alias: paint*'
I need to match multiple words in multiple columns like this:
SELECT * from table WHERE table MATCH 'description: seal* AND paint* OR alias: seal* OR paint*'
This doesn't work in android but works in any DB browser.
I have tried many combinations such as below, they all work in the browser but not in android.
SELECT * from table WHERE table MATCH '(description: seal* AND paint*) OR (alias: seal* OR paint*)'
SELECT * from table WHERE table MATCH 'description: (seal* AND paint*) OR alias: (seal* OR paint*)'
The SQLite documentation doesn't describe any solution for matching multiple words across multiple columns.
Also, the query from this question works in the current Android environment, as shown by my first query above. Maybe it didn't work in the past, but it works now. My problem concerns multiple values combined with OR/AND as described, not multiple columns.
Is there any way to achieve this on Android?

Related

Custom tokenchars for apostrophes in SQLite / FTS3 / FTS4 / Android?

My Android app contains a SQLite table (Full Text Search / FTS3) which contains data such as:
Green's Product
In the app, when a user searches for Green's Product, the match is found. But when the user searches for Greens Product (i.e., without the apostrophe), there is no match.
I need the product to be returned regardless of whether the user types the apostrophe or not.
So I have been reading the documentation about FTS3 tokenizers, which suggests that I need to specify the apostrophe in the tokenchars argument.
So I have tried re-creating my table like this:
CREATE VIRTUAL TABLE products USING fts3 (product_name TEXT, tokenize=simple "tokenchars='");
but there is still no match when a user searches for Greens Product.
I've also tried:
CREATE VIRTUAL TABLE products USING fts3 (product_name TEXT, tokenize=simple "tokenchars=''"); (double apostrophe)
and
CREATE VIRTUAL TABLE products USING fts4 (product_name TEXT, tokenize=simple "tokenchars='"); (fts4)
and
CREATE VIRTUAL TABLE products USING fts4 (product_name TEXT, tokenize=simple "tokenchars=''"); (double apostrophe and fts4)
but none of them seem to be working for me.
I'm not sure what to try next. And I'm also not sure if I may be coming up against some limitation here due to Android's implementation of SQLite.
Does anyone have any suggestions?
If you add the apostrophe to tokenchars, then it becomes part of the word. This means that you must specify it in the search term to find that word.
You actually want to ignore apostrophes, without breaking tokens. There appears to be no built-in option for this; you could either write a custom tokenizer, or just remove such characters before inserting/searching strings.
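The second option (removing such characters before inserting and searching) could look roughly like the sketch below. It is written in Python only to keep the example short; the helper name normalize_term is hypothetical, and the same function would have to be applied both to the text stored in the FTS table and to the user's search string.

```python
# Characters to drop: the ASCII apostrophe and the typographic one (assumption:
# these are the only variants that occur in the product names).
APOSTROPHES = "'\u2019"


def normalize_term(text: str) -> str:
    """Remove apostrophes so "Green's" and "Greens" yield the same token."""
    return text.translate({ord(ch): None for ch in APOSTROPHES})


stored = normalize_term("Green's Product")   # value to insert into the FTS table
searched = normalize_term("Greens Product")  # value to pass to MATCH
```

With this approach the original spelling would need to be kept in a separate column if it has to be displayed unchanged.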

SQLite Check if Table is FTS4

I'm developing an Android app that uses a SQLite database with FTS4 tables.
In the app there's an option to import a database from external storage. This database needs to be checked to confirm that it has all the correct tables and columns. I already have the code to do that; however, I don't know how to check whether the tables are "normal" or FTS4. If they are not FTS4, queries that use MATCH on them will fail later on.
The only way I can think of to check if the tables are FTS4 is to do a random query with MATCH and if it gets an error it's because they are not.
Is there a better way to do this like with just a command?
Using MATCH on a plain table results in an error message only if the table has at least one row.
FTS tables have a hidden column with the same name as the table. So you could try a query like SELECT MyTable FROM MyTable.
You could check whether the shadow tables (MyTable_content, MyTable_segdir, etc.) exist.
You could check the CREATE TABLE statement in the system table: SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'MyTable';
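The last suggestion (reading the stored CREATE statement) might be wrapped in a small helper like the one below. This is only a sketch using Python's sqlite3 module; the function name is_fts_table is made up for illustration, and the text match on "USING fts3/fts4" is a heuristic rather than an official API.

```python
import sqlite3


def is_fts_table(conn: sqlite3.Connection, table: str) -> bool:
    """Heuristic: True if the stored CREATE statement declares an FTS3/FTS4 table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None or row[0] is None:
        return False  # no such table
    # Collapse whitespace and case so "USING   FTS4" is recognised as well.
    sql = " ".join(row[0].lower().split())
    return "create virtual table" in sql and (
        "using fts3" in sql or "using fts4" in sql
    )
```

The same query can be issued from Android with rawQuery; only the string check would need to be ported.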

Combine MATCH with OR clause in the WHERE statement

I want to perform a query in which the WHERE clause has the following condition:
one MATCH condition over a column in a FTS3 table
OR
another not MATCH condition over a column in a non FTS table.
Example:
Say that I have two tables
books_fts, which is a table with a content column for full text search.
books_tags, which is non FTS table with tags.
I want to search for all the books that either contain 'Dijkstra' in their content or are tagged with the word 'algorithm'. So I run this query:
SELECT * from books_fts
LEFT OUTER JOIN books_tags ON books_fts.fk_id = books_tags.id
WHERE (books_fts MATCH 'content:%Dijkstra*')
OR (books_tags.tag = 'algorithm')
I think the query is right, and if I run it with either one of the OR conditions on its own, it works.
However, when running it with both conditions I get the following error:
unable to use function MATCH in the requested context
It seems that I cannot combine a MATCH condition with a non-MATCH condition in the WHERE clause, even if each of them applies to a different table (one FTS and the other non-FTS).
Is this true? I cannot find information on it.
NOTE: if the conditions are joined with AND instead of OR, the query is valid.
Thanks.
It seems to be a known issue in SQLite:
http://sqlite.1065341.n5.nabble.com/FTS3-bug-with-MATCH-plus-OR-td50714.html
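One way to sidestep the error is to keep MATCH as the only condition of its own SELECT and combine the two result sets with UNION. The sketch below shows this with Python's sqlite3 module. It is not taken from the linked thread, the schema is simplified, and the names search_books and book_id are hypothetical (the tags table here points at the FTS rowid). It also assumes an SQLite build with FTS3 enabled.

```python
import sqlite3


def search_books(conn: sqlite3.Connection, fts_query: str, tag: str):
    """Books whose content matches fts_query, or that carry the given tag."""
    sql = """
        SELECT rowid, content
        FROM books_fts
        WHERE books_fts MATCH ?          -- MATCH stands alone in this branch
        UNION
        SELECT f.rowid, f.content
        FROM books_fts AS f
        JOIN books_tags AS t ON t.book_id = f.rowid
        WHERE t.tag = ?                  -- plain comparison in the other branch
        ORDER BY 1
    """
    return conn.execute(sql, (fts_query, tag)).fetchall()
```

UNION also removes duplicates, so a book that both matches the text query and carries the tag is returned once.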

New to Android - Is _id a must for databases?

I have an already functioning app running on iOS whose database uses a composite primary key. For discussion's sake, let's say "CID" and "RID" make up that composite PK, resulting in something that looks like:
CID-RID
F6uuDTEU1c-1
F6uuDTEU1c-2
F6uuDTEU1c-3
However, there are conditions under which the CID column is altered, resetting the RID column. For example:
CID-RID
...
F6uuDTEU1c-4
F6uuDTEU1c-5
WQq6JnyrDI-1
WQq6JnyrDI-2
WQq6JnyrDI-3
...etc
These databases are to be shared cross-platform (iOS and Android), and going back and editing the current iOS structure is not an option. What issues am I going to run into on Android if I don't have an _id column as my PK?
I found this here on SO - which seems to state that the db itself does not have to have the _id column, only that ...
"The result set for the cursor must contain _id, not the cursor itself."
... but I could be reading this all wrong. Any input/help is much appreciated.
PS: I already looked at a few (what I thought were) similar questions here, here, and here.
You are free to have any database schema you want. Android doesn't impose any additional restrictions there.
The Cursor needs an _id column only if you use a CursorAdapter. Any app can be written without CursorAdapter; it's just there to provide some convenience. SQLite tables (other than WITHOUT ROWID tables) always have a rowid column, and an INTEGER PRIMARY KEY column, if the table has one, is an alias for it. You can always select it as the _id, e.g. SELECT rowid AS _id ... if needed.
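As a rough illustration of the SELECT rowid AS _id idea with a composite primary key, here is a sketch using Python's sqlite3 module. The table name records and the helper fetch_with_id are invented for the example; on Android the same SELECT would be passed to rawQuery.

```python
import sqlite3


def fetch_with_id(conn: sqlite3.Connection):
    """Return rows with the implicit rowid exposed under the name _id."""
    return conn.execute(
        "SELECT rowid AS _id, cid, rid FROM records ORDER BY rowid"
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Composite primary key as in the question; the table still has an implicit rowid.
conn.execute(
    "CREATE TABLE records ("
    "cid TEXT NOT NULL, rid INTEGER NOT NULL, PRIMARY KEY (cid, rid))"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("F6uuDTEU1c", 1), ("F6uuDTEU1c", 2), ("WQq6JnyrDI", 1)],
)
rows = fetch_with_id(conn)
```

The schema itself stays untouched, so the database file remains identical to the one used on iOS.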

What is the advantage of FTS over custom solution?

I have a biggish database (~32 MB) which has lots of text in 4 languages, including Arabic and Urdu. I need to search this text in the most efficient way (speed and size).
I am considering FTS, and trying to find out how to implement it. Right now I am reading http://www.sqlite.org/fts3.html#section_1_2 about it.
It seems to me, an FTS table is just like a normal table used to index all the different words. So my questions are:
1) If to populate FTS I have to do all the inserts myself, then why not make my own indexed word table, what is the difference?
Answer : Yes there are many advantages, many built in functions that help. For example with ranking etc, searching of stems and the transparent nature of how it all works in android makes the FTS approach more appealing.
2) On the Google docs I read it's a virtual in-memory table, now this would be massive, right... but it doesn't mention this on the SQLite website. So which is it?
3) Is there an easy way to generate all the different words from my columns?
4) Will the FTS handle arabic words properly?
FTS allows fast searching for words; normal indexes only allow searching for entire values or for the beginning of a value.
If your table has only one word in each field, using FTS does not make sense.
FTS is a virtual table, but not an in-memory table.
You can get individual terms from the full-text index with the fts4aux table.
The default tokenizer works only with ASCII text.
You have to test whether the ICU or UNICODE61 tokenizers work with your data.
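The fts4aux point above can be sketched as follows, using Python's sqlite3 module and assuming an SQLite build with FTS4 enabled. The table names docs and docs_terms and the helper distinct_terms are made up for the example.

```python
import sqlite3


def distinct_terms(conn: sqlite3.Connection):
    """List every distinct term stored in the full-text index of docs."""
    # Rows with col = '*' aggregate a term over all columns of the FTS table.
    cur = conn.execute(
        "SELECT term FROM docs_terms WHERE col = '*' ORDER BY term"
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
conn.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [("hello world",), ("hello sqlite",)],
)
# fts4aux exposes the index of an existing FTS table as a read-only table.
conn.execute("CREATE VIRTUAL TABLE docs_terms USING fts4aux(docs)")
terms = distinct_terms(conn)
```

The terms come back as the tokenizer stored them, so with the default tokenizer they are lower-cased.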
1) If to populate FTS I have to do all the inserts myself, then why
not make my own indexed word table, what is the difference?
With your own indexed word table, you would have to parse the sentences into words yourself. You would then need one table for sentences and another for words, and you would have to do all of this efficiently.
2) On the Google docs I read it's a virtual in-memory table, now this
would be massive, right... but it doesn't mention this on the SQLite
website. So which is it?
I don't fully understand the question. Data is accessed through the virtual table extension, but it is stored in the database itself (FTS4 creates 5 backing tables for each virtual table). Check this:
sqlite> CREATE VIRTUAL TABLE docs USING fts4();
sqlite> .schema
CREATE VIRTUAL TABLE docs USING fts4();
CREATE TABLE 'docs_content'(docid INTEGER PRIMARY KEY, 'content');
CREATE TABLE 'docs_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE 'docs_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE TABLE 'docs_docsize'(docid INTEGER PRIMARY KEY, size BLOB);
CREATE TABLE 'docs_stat'(id INTEGER PRIMARY KEY, value BLOB);
sqlite>
3) Is there an easy way to generate all the different words from my
columns?
There is a way, but it is not easy to do yourself. That is exactly what FTS does for you.
4) Will the FTS handle arabic words properly?
I'm not sure. Does Arabic follow the ICU word-boundary rules? From the tokenizer documentation:
The ICU tokenizer implementation is very simple. It splits the input
text according to the ICU rules for finding word boundaries and
discards any tokens that consist entirely of white-space. This may be
suitable for some applications in some locales, but not all. If more
complex processing is required, for example to implement stemming or
discard punctuation, this can be done by creating a tokenizer
implementation that uses the ICU tokenizer as part of its
implementation.
