Android sqlite fts - using Offsets function with exact phrase search - android

I have a book-reader app that uses SQLite to store texts and to provide a search function that highlights the results. My problem is that when, for example, I have the following text:
"Assuming N is a positive value, if no fragments can be found that
contain a phrase match corresponding to each matchable phrase, the
snippet function attempts to find two fragments of approximately N/2
tokens that between them contain at least one phrase match for each
matchable phrase matched by the current row. "
and I search for the exact phrase "snippet function attempts", I expect to get 1 search result, but I get 3: the first is 'snippet', the second is 'function' and the third is 'attempts'.
My SQLite query is the following:
'SELECT col1,col2, col3, offsets(index_table) FROM index_table WHERE col3 MATCH "snippet function attempts" '
How can I tell the offsets() function to return the offset of the whole phrase I am searching for rather than of the individual parts of the phrase?

Try using:
SELECT col1, col2, col3, offsets(index_table) FROM index_table WHERE col3 MATCH '"snippet function attempts"'
i.e. single quotes around the SQL string literal, with the phrase itself in double quotes inside it. Double quotes on their own, as in the question, only delimit the SQL string, so FTS receives three separate terms.
Enclosing the phrase in double quotes inside the literal tells FTS that it is a phrase query, as per Phrase queries:
A phrase query is a query that
retrieves all documents that contain a nominated set of terms or term
prefixes in a specified order with no intervening tokens. Phrase
queries are specified by enclosing a space separated sequence of terms
or term prefixes in double quotes (").
SQLite FTS3 and FTS4 Extensions
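Note that even for a phrase query, offsets() reports four integers (column number, term number, byte offset, byte length) for every term of every phrase match, so a three-word phrase still produces three entries per hit. The following is a minimal sketch, in plain Java, of merging those entries into one span per phrase. It assumes that the entries of one hit arrive in term order 0, 1, 2 within the same column; the class name, method name and sample offsets value are hypothetical and only serve as illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseOffsets {
    /**
     * Merges the per-term entries of offsets() into one span per phrase hit.
     * Each result is {column, startByte, endByte} (end is exclusive).
     * Assumption: a hit is a run of termCount entries with terms 0..termCount-1
     * in the same column.
     */
    static List<int[]> phraseSpans(String offsets, int termCount) {
        List<int[]> spans = new ArrayList<>();
        String trimmed = offsets.trim();
        if (trimmed.isEmpty() || termCount <= 0) {
            return spans;
        }
        String[] parts = trimmed.split("\\s+");
        int quads = parts.length / 4;
        int i = 0;
        while (i + termCount <= quads) {
            int column = Integer.parseInt(parts[4 * i]);
            boolean complete = Integer.parseInt(parts[4 * i + 1]) == 0;
            for (int k = 1; complete && k < termCount; k++) {
                int c = Integer.parseInt(parts[4 * (i + k)]);
                int t = Integer.parseInt(parts[4 * (i + k) + 1]);
                complete = (c == column && t == k);
            }
            if (complete) {
                int last = i + termCount - 1;
                int start = Integer.parseInt(parts[4 * i + 2]);
                int end = Integer.parseInt(parts[4 * last + 2])
                        + Integer.parseInt(parts[4 * last + 3]);
                spans.add(new int[] {column, start, end});
                i += termCount;   // skip past the whole phrase
            } else {
                i++;              // not the start of a complete phrase
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up offsets() value: one hit of a three-term phrase in column 2.
        String offsets = "2 0 131 7 2 1 139 8 2 2 148 8";
        for (int[] s : phraseSpans(offsets, 3)) {
            System.out.println("column " + s[0] + ": bytes " + s[1] + ".." + s[2]);
        }
    }
}
```

The positions are byte offsets into the stored text, so they need converting before they are used as indices into a Java String that contains non-ASCII characters.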

Related

Error occurred while store image path to SQLite Database using Android?

The error keeps occurring. What is the problem?
All the variables are Strings.
String insertSQL = "INSERT INTO " + DBHelper.getTableName() + " VALUES (\''"+entry.getKey()+"\'', \''"+images+"\'')";
Error message
INSERT INTO LABELING RESULT VALUES (''Sky'',
''["/storage/emulated/0/DCIM/CandyCam/IMG_20171009_164101723.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180305_000218777.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180401_235850170.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180518_194252232.jpg"]''))
My table has three columns: ID (INTEGER), LABEL (TEXT), IMAGES (TEXT)
Issue 1: you have specified a table name with a space, i.e. LABELING RESULT. If this is the table name, you need to enclose it, e.g. [LABELING RESULT].
Issue 2: you have doubled single quotes where there should be only one.
Issue 3: you have an extra closing parenthesis.
Issue 4: the table has three columns but only two values are supplied, so the target columns have to be named.
I believe the following is what you want:
INSERT INTO [LABELING RESULT] (LABEL, IMAGES) VALUES ('Sky', '["/storage/emulated/0/DCIM/CandyCam/IMG_20171009_164101723.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180305_000218777.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180401_235850170.jpg","/storage/emulated/0/DCIM/Pictail/IMG_20180518_194252232.jpg"]')
Which would (not checked though) be:
String insertSQL = "INSERT INTO [" + DBHelper.getTableName() + "] (LABEL, IMAGES) VALUES ('"+entry.getKey()+"', '"+images+"')";
That assumes that Sky goes into the LABEL column and the comma-separated image paths go into the IMAGES column.
Additional re enclosing (the [ ] ):-
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
'keyword' A keyword in single quotes is a string literal.
"keyword" A keyword in double-quotes is an identifier.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
`keyword` A keyword enclosed in grave accents (ASCII code 96) is an identifier. This is not standard SQL. This quoting mechanism is used by MySQL and is included in SQLite for compatibility.
SQL As Understood By SQLite - SQLite Keywords
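Concatenating values into the statement is what made the quoting go wrong in the first place. As a hedged alternative, the sketch below builds the INSERT text with a quoted table name and ? placeholders, so that the values can be bound separately (on Android, for instance, through SQLiteDatabase.execSQL(String, Object[])). The class and method names are hypothetical, and the column names LABEL and IMAGES are taken from the question.

```java
final class InsertSql {
    /** Wraps an identifier in double quotes, doubling any embedded double quote. */
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    /** Builds an INSERT statement with one ? placeholder per named column. */
    static String buildInsert(String table, String... columns) {
        StringBuilder cols = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                cols.append(", ");
                marks.append(", ");
            }
            cols.append(quoteIdentifier(columns[i]));
            marks.append("?");
        }
        return "INSERT INTO " + quoteIdentifier(table)
                + " (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        // The values ('Sky' and the image list) would be bound as arguments,
        // never pasted into the SQL text.
        System.out.println(buildInsert("LABELING RESULT", "LABEL", "IMAGES"));
    }
}
```

With bound values, quotes or brackets inside the image list no longer need any escaping.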

SQLite Fts select query

I am making a dictionary of over 20,000 words. To make searching faster, I am using an FTS3 table.
My select query:
Cursor c=db.rawQuery("Select * from data where Word MATCH '"+word+"*'", null);
This query shows every entry that contains 'word', but I want only the entries that begin with the search term.
That is, I want it to work like this query:
Cursor c=db.rawQuery("Select * from data where Word like '"+word+"%'", null);
Example: I have apple, app, and, book, bad, cat, car.
When I type 'a', I want it to show only apple, app, and.
How can I solve this?
table(_id primary key not null autoincrement, word text)
An FTS table does not use the above attributes. It ignores data types, and it does not auto-increment any column other than the hidden rowid column, so "_id" will not act as a primary key here. Please verify that you are actually creating an FTS table:
https://www.sqlite.org/fts3.html
a datatype name may be optionally specified for each column. This is
pure syntactic sugar, the supplied typenames are not used by FTS or
the SQLite core for any purpose. The same applies to any constraints
specified along with an FTS column name - they are parsed but not used
or recorded by the system in any way.
As for your original question, match "abc*" already searches from the beginning of the word. For instance match "man*" will not match "woman".
FTS supports searching for the beginning of a string with ^:
SELECT * FROM FtsTable WHERE Word MATCH '^word*'
However, the full-text search index is designed to find words inside larger texts.
If your Word column contains only a single word, your query is more efficient if you use LIKE 'a%' and rely on a normal index.
To allow an index to be used with LIKE, the table column must have TEXT affinity, and the index must be declared as COLLATE NOCASE (because LIKE is not case sensitive):
CREATE TABLE data (
...
Word TEXT,
...
);
CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE);
If you were to use GLOB instead, the index would have to be case sensitive (the default).
You can use EXPLAIN QUERY PLAN to check whether the query uses the index:
sqlite> EXPLAIN QUERY PLAN SELECT * FROM data WHERE Word LIKE 'a%';
0|0|0|SEARCH TABLE data USING INDEX data_Word_index (Word>? AND Word<?)
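The plan above shows the prefix search being turned into a range scan (Word>? AND Word<?) over the sorted index. As a rough in-memory analogy only, not SQLite's actual implementation, the same idea can be sketched in Java with a case-insensitive sorted set standing in for the NOCASE index. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixRange {
    /**
     * Returns the words that start with the prefix by taking the range
     * [prefix, prefix + '\uffff') from a sorted set, the way an index range
     * scan would. Assumes no stored word contains '\uffff'.
     */
    static NavigableSet<String> withPrefix(NavigableSet<String> sortedWords, String prefix) {
        return sortedWords.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE_ORDER plays the role of COLLATE NOCASE here.
        NavigableSet<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        words.addAll(Arrays.asList("apple", "app", "and", "book", "bad", "cat", "car"));
        System.out.println(withPrefix(words, "a")); // [and, app, apple]
    }
}
```

Only the entries inside the range are visited, which is why the indexed LIKE 'a%' stays fast as the table grows.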

Search data from sqlite3 database in android

I have an SQLite3 database on Android whose data are sentences like "good afternoon" or "have a nice day". I want to add a search box to search through them, and I use something like this:
Cursor cursor = sqliteDB.rawQuery("SELECT id FROM category WHERE sentences LIKE '"+ s.toString().toLowerCase()+ "%' LIMIT 10", null);
But it only shows "good afternoon" as a result if the user starts the search with "g", "go", "goo" and so on. How can I retrieve "good afternoon" if the user searches for "a", "af" or "afternoon"?
I mean I want to show the "good afternoon" result when the user's search term matches the middle of the data in the SQLite3 database, not only the beginning.
Thanks!
Just put a percent sign in front of your query string as well: LIKE '%afternoon%'. However, your approach has two flaws:
1. It is susceptible to SQL injection attacks because you insert unfiltered user input directly into your SQL query string. Use the query parameter syntax instead by rewriting your query as follows:
SELECT id FROM category WHERE sentences LIKE ? LIMIT 10
Then pass the user input string, with the percent signs added, as a selection argument to your query method call.
2. It will become very slow as your database grows, because a LIKE pattern with a leading wildcard cannot use an index and has to scan every row.
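For the first flaw, a minimal sketch of preparing the selection argument in plain Java might look like this. It also escapes % and _ in the user input so that they match literally, which relies on adding an ESCAPE clause to the statement; the class and method names are hypothetical, and the sample input is made up.

```java
final class LikeArgs {
    /** Escapes %, _ and the escape character itself so they are matched literally. */
    static String escapeLike(String input, char escape) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == '%' || c == '_' || c == escape) {
                sb.append(escape);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds the argument for a "contains" search: %input%. */
    static String containsPattern(String input) {
        return "%" + escapeLike(input, '\\') + "%";
    }

    public static void main(String[] args) {
        // The SQL text stays constant; only the bound argument changes.
        String sql = "SELECT id FROM category WHERE sentences LIKE ? ESCAPE '\\' LIMIT 10";
        String[] selectionArgs = { containsPattern("afternoon") };
        System.out.println(sql);
        System.out.println(selectionArgs[0]); // %afternoon%
    }
}
```

On Android the two values would then go to rawQuery(sql, selectionArgs) in place of the null that the question passes.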
In order to solve number 2 you should use SQLite's FTS3 extension which greatly speeds up any text-related searches. Instead of LIKE you would be using the MATCH operator that uses a different query syntax:
SELECT id FROM category WHERE sentences MATCH 'afternoon' LIMIT 10
As you can see the MATCH operator does not need percent signs. It just tries to find any occurrence of a word in the whole text that is being searched (in your case the sentences column). Read through the documentation of FTS3 I've linked to. The MATCH query syntax provides some more pretty handy and powerful options for finding text in your database table which are pretty similar to early search engine query syntax such as:
MATCH 'afternoon OR evening'
The only (minor) downside to the FTS3 extension is that it blows up the database file size by creating additional search index tables and meta-data. But I think it's well worth it for this use case.
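MATCH 'afternoon' only finds the complete word, so to cover the partial inputs from the question ("a", "af") each term needs a trailing * to become a prefix query. The sketch below is one hedged way to turn raw user input into such a MATCH argument in plain Java. It keeps only letters and digits, so quotes and operators typed by the user never reach FTS; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FtsQuery {
    /**
     * Turns "Good af" into "good* af*". Terms are implicitly ANDed by FTS.
     * Lower-casing also keeps a typed "OR" from being read as an operator.
     * Returns an empty string when the input has no usable term; the caller
     * should skip the query in that case.
     */
    static String prefixQuery(String userInput) {
        List<String> terms = new ArrayList<>();
        String lowered = userInput.toLowerCase(Locale.ROOT);
        for (String token : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token + "*");
            }
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        // Bind the result as the argument of: ... WHERE sentences MATCH ? LIMIT 10
        System.out.println(prefixQuery("Good af")); // good* af*
    }
}
```

Like the MATCH examples above, this still matches from the start of each word only, not from the middle of a word.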

FTS3 and FTS4 matching of :, -, and _ characters

I'm seeing some weird behaviour on my FTS enabled SQLite database. I have a table named fingerprints that contains a column named scan. Entries of scan are long strings that look like this:
00:13:10:d5:69:88_-58;0c:85:25:68:b4:30_-75;0c:85:25:68:b4:34_-76;0c:85:25:68:b4:33_-76;0c:85:25:68:b4:31_-76;0c:85:25:68:b4:35_-76;00:23:eb:ad:f6:00_-87; etc
They represent MAC addresses and signal strengths. Now I want to do string matching on the table and try to match, for instance, a MAC address:
SELECT _id FROM fingerprints WHERE scan MATCH "00:13:10:d5:69:88";
For some reason this returns a lot of rows that do not contain the specified string. The second thing I tried to match is
SELECT _id FROM fingerprints WHERE scan MATCH "00:13:10:d5:69:88_-58";
This returns the same rows as before and is completely wrong.
Does SQLite treat the : _ - characters in any special way?
Thanks
What you're seeing is the effect of the FTS tokenizing your data.
The full-text search doesn't work on unprocessed long strings; it splits your data (and your search terms) into words and indexes them individually. The default tokenizer treats all alphanumeric characters and all characters with a code point >= 128 as parts of words, and uses the rest of the characters (for example, as you're seeing, : _ -) as word boundaries.
In other words, your search for 00:13:10:d5:69:88 will search for rows containing the words 00 and 13 and 10 and d5 and 69 and 88 in any order.
You can verify this behavior:
sqlite> CREATE VIRTUAL TABLE simple USING fts3(tokenize=simple);
sqlite> INSERT INTO simple VALUES('00:13:10:d5:69:88');
sqlite> SELECT * FROM simple WHERE simple MATCH '69:10';
-> 00:13:10:d5:69:88
EDIT: Apparently SQLite is smarter than I originally gave it credit for, you can use phrase queries (scroll down about a page from the link destination) to look for word sequences, which would solve your problem. Phrase queries are specified by enclosing a space (or other word separator) separated sequence of terms in double quotes (").
sqlite> SELECT * FROM simple WHERE simple MATCH '"69:10"';
-> No match
sqlite> SELECT * FROM simple WHERE simple MATCH '"69 88"';
-> 00:13:10:d5:69:88
sqlite> SELECT * FROM simple WHERE simple MATCH '"69:88"';
-> 00:13:10:d5:69:88
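The splitting rule described above can be imitated in a few lines of plain Java, which helps when predicting what a given scan string or search term turns into. This is only a sketch of the rule as stated in this answer (ASCII letters and digits plus everything from code point 128 up form words, ASCII upper case is folded), not SQLite's own code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SimpleTokenizerSketch {
    /** Splits text into lower-cased words at every non-alphanumeric ASCII character. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean upper = c >= 'A' && c <= 'Z';
            boolean asciiAlnum = upper || (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z');
            if (asciiAlnum || c >= 128) {
                current.append(upper ? (char) (c + 32) : c); // fold ASCII upper case
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** True if the phrase's words occur consecutively, in order, in the document. */
    static boolean containsPhrase(String document, String phrase) {
        List<String> wanted = tokenize(phrase);
        return !wanted.isEmpty()
                && Collections.indexOfSubList(tokenize(document), wanted) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("00:13:10:d5:69:88_-58;"));
        System.out.println(containsPhrase("00:13:10:d5:69:88", "69:88")); // true
        System.out.println(containsPhrase("00:13:10:d5:69:88", "69:10")); // false
    }
}
```

The two containsPhrase calls mirror the '"69:88"' and '"69:10"' phrase queries shown above.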

SQLite: Efficient substring search in large table

I'm developing an Android application that has to perform substring search in a large table (about 500'000 entries with street and location names, so just a few words per entry).
CREATE TABLE Elements (elementID INTEGER, type INTEGER, name TEXT, data BLOB)
Note that only 20% of all entries contain strings in the "name" column.
Performing the following query takes almost 2 minutes:
SELECT elementID, name FROM Elements WHERE name LIKE '%foo%'
I now tried to use FTS3 in order to speed up the query. That was quite successful, query time decreased to 1 minute (surprisingly the database file size increased by only 5%, which is also quite good for my purpose).
The problem is, FTS3 seemingly doesn't support substring search, i.e. if I want to find "bar" in "foo bar" and "foobar", I only get "foo bar", although I need both results.
So actually I have two questions:
Is it possible to further speed up the query? My goal is 30 seconds for the query, but I don't know if that's realistic...
How can I get real substring search using FTS3?
Solution 1:
If you can store every character in your database as an individual word, you can use phrase queries to search for a substring.
For example, assume "my_table" contains a single column "person":
person
------
John Doe
Jane Doe
you can change it to
person
------
J o h n D o e
J a n e D o e
To search the substring "ohn", use phrase query:
SELECT * FROM my_table WHERE person MATCH '"o h n"'
Beware that "JohnD" will match "John Doe", which may not be desired.
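To fix it, change the space character in the original string into something else.
For example, you can replace the space character with "$" (provided the tokenizer in use keeps that character as a token rather than treating it as a separator):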
person
------
J o h n $ D o e
J a n e $ D o e
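A minimal sketch of the transformation in Solution 1, in plain Java, could look like the following. The placeholder for the original spaces is passed in as a parameter, because it has to be a character that the tokenizer in use keeps as a token; the class and method names are hypothetical.

```java
final class CharSpacing {
    /** "John Doe" becomes "J o h n $ D o e" when the placeholder is "$". */
    static String spaceOut(String text, String placeholder) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (cp == ' ') {
                sb.append(placeholder);   // mark the original word boundary
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /**
     * Builds the phrase query for a substring, e.g. "ohn" gives "o h n" in
     * double quotes. Assumes the needle contains no double quote.
     */
    static String substringPhrase(String needle, String placeholder) {
        return "\"" + spaceOut(needle, placeholder) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("John Doe", "$"));   // value to store
        System.out.println(substringPhrase("ohn", "$")); // MATCH argument
    }
}
```

The stored text and the search term must go through the same transformation, otherwise the phrase will not line up with the indexed tokens.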
Solution 2:
Following the idea of solution 1, you can have a custom tokenizer treat every character as an individual word and then use phrase queries to search for substrings.
The advantage over solution 1 is that you don't have to add spaces in your data, which can unnecessarily increase the size of database.
The disadvantage is that you have to implement the custom tokenizer. Fortunately, I have one ready for you. The code is in C, so you have to figure out how to integrate it with your Java code.
You should add an index to the name column on your database, that should speed up the query considerably.
I believe SQLite3 supports sub-string matching like so:
SELECT * FROM Elements WHERE name MATCH '*foo*';
http://www.sqlite.org/fts3.html#section_3
I am facing something similar to your problem. Here is my suggestion: try creating a translation table that translates all the words to numbers, then search for the numbers instead of the words.
Please let me know if this is helping.
Not sure about speeding it up since you're using SQLite, but for substring searches I have done things like
SET @foo_bar = 'foo bar'
SELECT * FROM table WHERE name LIKE '%' + REPLACE(@foo_bar, ' ', '%') + '%'
of course this only returns records that have the word "foo" before the word "bar".
