Full text search example in Android

I'm having a hard time understanding how to use full text search (FTS) with Android. I've read the SQLite documentation on the FTS3 and FTS4 extensions. And I know it's possible to do on Android. However, I'm having a hard time finding any examples that I can comprehend.
The basic database model
A SQLite database table (named example_table) has 4 columns. However, there is only one column (named text_column) that needs to be indexed for a full text search. Every row of text_column contains text varying in length from 0 to 1000 words. The total number of rows is greater than 10,000.
How would you set up the table and/or the FTS virtual table?
How would you perform an FTS query on text_column?
Additional notes:
Because only one column needs to be indexed, only using an FTS table (and dropping example_table) would be inefficient for non-FTS queries.
For such a large table, storing duplicate entries of text_column in the FTS table would be undesirable. This post suggests using an external content table.
External content tables use FTS4, but FTS4 is not supported before Android API 11. An answer can assume an API >= 11, but commenting on options for supporting lower versions would be helpful.
Changing data in the original table does not automatically update the FTS table (and vice versa). Including triggers in your answer is not necessary for this basic example, but would be helpful nonetheless.

Most Basic Answer
I'm using plain SQL below so that everything is as clear and readable as possible. In your project you can use the Android convenience methods. The db object used below is an instance of SQLiteDatabase.
Create FTS Table
db.execSQL("CREATE VIRTUAL TABLE fts_table USING fts3 ( col_1, col_2, text_column )");
This could go in the onCreate() method of your extended SQLiteOpenHelper class.
Populate FTS Table
db.execSQL("INSERT INTO fts_table VALUES ('3', 'apple', 'Hello. How are you?')");
db.execSQL("INSERT INTO fts_table VALUES ('24', 'car', 'Fine. Thank you.')");
db.execSQL("INSERT INTO fts_table VALUES ('13', 'book', 'This is an example.')");
It would be better to use SQLiteDatabase#insert or prepared statements than execSQL.
Query FTS Table
String[] selectionArgs = { searchString };
Cursor cursor = db.rawQuery("SELECT * FROM fts_table WHERE fts_table MATCH ?", selectionArgs);
You could also use the SQLiteDatabase#query method. Note the MATCH keyword.
Fuller Answer
The virtual FTS table above has a problem with it. Every column is indexed, but this is a waste of space and resources if some columns don't need to be indexed. The only column that needs an FTS index is probably the text_column.
To solve this problem we will use a combination of a regular table and a virtual FTS table. The FTS table will contain the index but none of the actual data from the regular table. Instead it will have a link to the content of the regular table. This is called an external content table.
Create the Tables
db.execSQL("CREATE TABLE example_table (_id INTEGER PRIMARY KEY, col_1 INTEGER, col_2 TEXT, text_column TEXT)");
db.execSQL("CREATE VIRTUAL TABLE fts_example_table USING fts4 (content='example_table', text_column)");
Notice that we have to use FTS4 to do this rather than FTS3. FTS4 is not supported in Android before API version 11. You could either (1) only provide search functionality for API >= 11, or (2) use an FTS3 table (but this means the database will be larger because the full text column is stored in both tables).
Populate the Tables
db.execSQL("INSERT INTO example_table (col_1, col_2, text_column) VALUES ('3', 'apple', 'Hello. How are you?')");
db.execSQL("INSERT INTO example_table (col_1, col_2, text_column) VALUES ('24', 'car', 'Fine. Thank you.')");
db.execSQL("INSERT INTO example_table (col_1, col_2, text_column) VALUES ('13', 'book', 'This is an example.')");
(Again, there are better ways to do inserts than with execSQL. I am just using it for its readability.)
If you tried to do an FTS query now on fts_example_table you would get no results. The reason is that changing one table does not automatically change the other table. You have to manually update the FTS table:
db.execSQL("INSERT INTO fts_example_table (docid, text_column) SELECT _id, text_column FROM example_table");
(The docid is like the rowid for a regular table.) You have to make sure to update the FTS table (so that it can update the index) every time you make a change (INSERT, DELETE, UPDATE) to the external content table. This can get cumbersome. If you are only making a prepopulated database, you can do
db.execSQL("INSERT INTO fts_example_table(fts_example_table) VALUES('rebuild')");
which will rebuild the whole FTS index from the content table. This can be slow, though, so it is not something you want to do after every little change. You would do it after finishing all the inserts on the external content table. If you do need to keep the two tables in sync automatically, you can use triggers. The SQLite FTS3 and FTS4 documentation describes them in its section on external content tables.
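As a rough sketch of the trigger approach, the helper below returns four CREATE TRIGGER statements modeled on the pattern in the SQLite documentation for external content tables, adapted to the example_table and fts_example_table names used above. The class name FtsSyncTriggers and the trigger names are made up for illustration. The "before" triggers drop the old index entry while the old text is still readable, and the "after" triggers index the new text.

```java
/**
 * Hypothetical helper (not part of Android or SQLite): builds the SQL for
 * triggers that keep fts_example_table in step with example_table.
 */
final class FtsSyncTriggers {
    private FtsSyncTriggers() {}

    /** Returns four statements; each one is a single CREATE TRIGGER statement. */
    static String[] statements() {
        return new String[] {
            // Remove the stale index entry before the content row is updated.
            "CREATE TRIGGER example_table_bu BEFORE UPDATE ON example_table BEGIN "
                + "DELETE FROM fts_example_table WHERE docid = old._id; END",
            // Remove the index entry before the content row is deleted.
            "CREATE TRIGGER example_table_bd BEFORE DELETE ON example_table BEGIN "
                + "DELETE FROM fts_example_table WHERE docid = old._id; END",
            // Index the new text after the content row has been updated.
            "CREATE TRIGGER example_table_au AFTER UPDATE ON example_table BEGIN "
                + "INSERT INTO fts_example_table (docid, text_column) "
                + "VALUES (new._id, new.text_column); END",
            // Index the text of a freshly inserted content row.
            "CREATE TRIGGER example_table_ai AFTER INSERT ON example_table BEGIN "
                + "INSERT INTO fts_example_table (docid, text_column) "
                + "VALUES (new._id, new.text_column); END"
        };
    }
}
```

In onCreate(), after both tables have been created, each string could be passed to db.execSQL() in turn. With the triggers in place, the manual INSERT ... SELECT step should only be needed for rows that existed before the triggers were created.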
Query the Tables
String[] selectionArgs = { searchString };
Cursor cursor = db.rawQuery("SELECT * FROM fts_example_table WHERE fts_example_table MATCH ?", selectionArgs);
This is the same as before, except this time you only have access to text_column (and docid). What if you need to get data from other columns in the external content table? Since the docid of the FTS table matches the rowid (and in this case _id) of the external content table, you can look up the matching rows with a subquery (a join would also work). (Thanks to this answer for help with that.)
String sql = "SELECT * FROM example_table WHERE _id IN " +
"(SELECT docid FROM fts_example_table WHERE fts_example_table MATCH ?)";
String[] selectionArgs = { searchString };
Cursor cursor = db.rawQuery(sql, selectionArgs);
Further Reading
Go through these documents carefully to see other ways of using FTS virtual tables:
SQLite FTS3 and FTS4 Extensions (SQLite docs)
Storing and Searching for Data (Android docs)
Additional Notes
Set operators (AND, OR, NOT) in SQLite FTS queries come in two flavors: Standard Query Syntax and Enhanced Query Syntax. Unfortunately, Android apparently does not support the Enhanced Query Syntax. That means mixing AND and OR becomes difficult (it seems to require using UNION, or checking PRAGMA compile_options to see which syntax is available). Very unfortunate. Please add a comment if there is an update in this area.

When using an external content table (content=), don't forget to rebuild the FTS table.
I do this with triggers on UPDATE, INSERT, and DELETE.

Related

SQLite Check if Table is FTS4

I'm developing an Android app that uses a SQLite database with FTS4 tables.
In the app there's an option to import a database from external storage. This database needs to be checked to confirm that it has all the correct tables and columns. I already have the code to do that; however, I don't know how to check whether the tables are "normal" or FTS4. If they are not FTS4, queries that use MATCH will fail later on.
The only way I can think of to check if the tables are FTS4 is to do a random query with MATCH and if it gets an error it's because they are not.
Is there a better way to do this like with just a command?
Using MATCH on a plain table results in an error message only if the table has at least one row.
FTS tables have a hidden column with the same name as the table. So you could try a query like SELECT MyTable FROM MyTable.
You could check whether the shadow tables (MyTable_content, MyTable_segdir, etc.) exist.
You could check the CREATE TABLE statement in the system table: SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'MyTable';
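A minimal sketch of the last option, assuming the sql value has already been read from sqlite_master with the query above: the hypothetical helper FtsTableCheck looks for a CREATE VIRTUAL TABLE ... USING ftsN declaration and returns the FTS version number, or 0 for a plain table. The regular expression is a heuristic for typical declarations, not a full SQL parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: inspects the CREATE statement stored in sqlite_master. */
final class FtsTableCheck {
    // Matches e.g. "CREATE VIRTUAL TABLE t USING fts4(...)" and captures the digit.
    private static final Pattern FTS_DECL = Pattern.compile(
            "^\\s*CREATE\\s+VIRTUAL\\s+TABLE\\s+.+?\\s+USING\\s+fts(\\d)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private FtsTableCheck() {}

    /** Returns 3, 4, or 5 for an FTS declaration, and 0 for anything else. */
    static int ftsVersion(String createSql) {
        if (createSql == null) {
            return 0;
        }
        Matcher m = FTS_DECL.matcher(createSql);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }
}
```

An imported database could then be rejected when ftsVersion(sql) != 4 for a table that the app expects to query with MATCH.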

How to index one column when using full text search in SQLite?

FTS3 in SQLite uses a virtual table. Does that mean it uses memory (RAM) to store the data? I have a table and only want to index one column, but FTS3 requires indexing the whole table. Does that increase the amount of data stored? How can I index just one column?
In this case, "virtual" just means that such a table is not a 'normal' SQLite table but has a custom implementation.
The documentation says:
For each FTS virtual table in a database, three to five real (non-virtual) tables are created to store the underlying data. These real tables are called "shadow tables". The real tables are named "%_content", "%_segdir", "%_segments", "%_stat", and "%_docsize", where "%" is replaced by the name of the FTS virtual table.
An FTS table should be thought of as an index, not a table.
You should keep your original table, and put only the text column into an FTS table. (To avoid duplicate storage, you can use an external content table.)

How to query an external content FTS4 table but return additional columns from the original content table

I am creating an FTS4 external content table in SQLite like this:
CREATE TABLE t2(id INTEGER PRIMARY KEY, col_a, col_b, col_text);
CREATE VIRTUAL TABLE fts_table USING fts4(content="t2", col_text);
I'm using an external content table so that I don't need to store duplicate values of col_text in fts_table. I'm only indexing col_text because col_a and col_b don't need to be indexed.
However, when I do a query of fts_table like this
SELECT * FROM fts_table WHERE fts_table MATCH 'something';
I don't have access to col_a and col_b from the content table t2. How do I return all these columns (col_a, col_b, col_text) from a single FTS query?
Update
I tried using the notindexed=column_name option as in
CREATE VIRTUAL TABLE fts_table USING fts4(content="t2", col_a, col_b, col_text, notindexed=col_a, notindexed=col_b);
This should work for some people, but I am using it in Android and the notindexed option isn't supported until SQLite 3.8, which Android doesn't support until Android version 5.x. And I need to support Android 4.x. I am updating this question to include the Android tag.
FTS tables have an internal INTEGER PRIMARY KEY column called docid or rowid.
When inserting a row in the FTS table, set that column to the primary key of the row in the original table.
Then you can easily look up the corresponding row, either with a separate query, or with a join like this:
SELECT *
FROM t2
WHERE id IN (SELECT docid
FROM fts_table
WHERE col_text MATCH 'something')

Query FTS table MIN

I'm trying to get the lowest _id from my fts table with this query:
SELECT MIN(_id) FROM fts WHERE tbl_no=2 AND parent_id=6
The result I'm getting is 10. However the smallest _id is 9 and it fits the selection arguments.
If I instead use
SELECT _id FROM fts WHERE tbl_no=2 AND parent_id=6
and select the 1st row, I get the correct result: 9.
Does this have something to do with the table being virtual (FTS)? I recently moved from multiple tables to a single FTS table and am experiencing this.
Am I guaranteed to get the result I want with the 2nd query, considering that the table is never updated and is sorted by default?
Notes: I am running this on Android (tried rawQuery and query). I have the table in front of me and I know it's correct.
Is _id a numeric or a string?
With string comparison, '10' < '9'.
To check, try:
SELECT MIN(CAST(_id AS INTEGER)) FROM fts WHERE tbl_no=2 AND parent_id=6
I would not use this in production, however, as it won't be able to use an index.
In FTS tables, all columns store string values, and the string '10' is lexicographically smaller than '9'.
Furthermore, MIN(SomeColumn) is not a full-text search query, and thus is not very efficient.
For a unique integer ID in FTS tables, you should use the internal docid column.
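The ordering problem can be reproduced outside SQLite. The sketch below is only an analogy in Java (the class IdOrdering and its method names are invented for illustration): comparing the IDs as text picks "10" as the minimum, while comparing them as numbers picks 9.

```java
/** Hypothetical illustration of text ordering versus numeric ordering. */
final class IdOrdering {
    private IdOrdering() {}

    /** Minimum under character-by-character (lexicographic) comparison. */
    static String minAsText(String a, String b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Minimum after parsing both values as integers. */
    static int minAsNumber(String a, String b) {
        return Math.min(Integer.parseInt(a), Integer.parseInt(b));
    }
}
```

Here minAsText("10", "9") yields "10", because '1' sorts before '9', whereas minAsNumber("10", "9") yields 9. That mirrors why MIN(_id) on a text-valued FTS column returned 10.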

SQLite Fts select query

I am making a dictionary with over 20,000 words in it. To make searching faster, I am using an FTS3 table.
My select query:
Cursor c=db.rawQuery("Select * from data where Word MATCH '"+word+"*'", null);
With this query, it shows every word that contains 'word', but what I want is only the words that begin with the search term.
That is, I want it to work like this query:
Cursor c=db.rawQuery("Select * from data where Word like '"+word+"%'", null);
Ex: I have : apple, app, and, book, bad, cat, car.
When I type 'a', I want it to show only: apple, app, and
How can I solve this?
table(_id primary key not null autoincrement, word text)
An FTS table does not use the attributes above. It ignores data types. It does not auto-increment columns other than the hidden rowid column. "_id" will not act as a primary key here. Please verify that you are actually creating an FTS table.
https://www.sqlite.org/fts3.html
a datatype name may be optionally specified for each column. This is
pure syntactic sugar, the supplied typenames are not used by FTS or
the SQLite core for any purpose. The same applies to any constraints
specified along with an FTS column name - they are parsed but not used
or recorded by the system in any way.
As for your original question, match "abc*" already searches from the beginning of the word. For instance match "man*" will not match "woman".
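As a hedged sketch of how the prefix term might be built without concatenating user input into the SQL string, the hypothetical helper FtsPrefixQuery below strips everything except letters and digits and appends the * prefix operator. It assumes single-word lookups, as in the dictionary case; multi-word input would be collapsed into one token.

```java
/** Hypothetical helper: turns raw user input into an FTS prefix term. */
final class FtsPrefixQuery {
    private FtsPrefixQuery() {}

    /**
     * Returns the cleaned input followed by '*', for example "app*",
     * or null when no letters or digits remain.
     */
    static String forInput(String userInput) {
        if (userInput == null) {
            return null;
        }
        // Drop quotes, operators, and whitespace so they cannot alter the MATCH expression.
        String term = userInput.replaceAll("[^\\p{L}\\p{N}]", "");
        return term.isEmpty() ? null : term + "*";
    }
}
```

The result would then be bound as a parameter rather than spliced into the statement, for example db.rawQuery("SELECT * FROM data WHERE Word MATCH ?", new String[] { query }).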
FTS supports searching for the beginning of a string with ^:
SELECT * FROM FtsTable WHERE Word MATCH '^word*'
However, the full-text search index is designed to find words inside larger texts.
If your Word column contains only a single word, your query is more efficient if you use LIKE 'a%' and rely on a normal index.
To allow an index to be used with LIKE, the table column must have TEXT affinity, and the index must be declared as COLLATE NOCASE (because LIKE is not case sensitive):
CREATE TABLE data (
...
Word TEXT,
...
);
CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE);
If you were to use GLOB instead, the index would have to be case sensitive (the default).
You can use EXPLAIN QUERY PLAN to check whether the query uses the index:
sqlite> EXPLAIN QUERY PLAN SELECT * FROM data WHERE Word LIKE 'a%';
0|0|0|SEARCH TABLE data USING INDEX data_Word_index (Word>? AND Word<?)
