FTS3 and FTS4 matching of :, -, and _ characters


I'm seeing some weird behaviour on my FTS enabled SQLite database. I have a table named fingerprints that contains a column named scan. Entries of scan are long strings that look like this:
00:13:10:d5:69:88_-58;0c:85:25:68:b4:30_-75;0c:85:25:68:b4:34_-76;0c:85:25:68:b4:33_-76;0c:85:25:68:b4:31_-76;0c:85:25:68:b4:35_-76;00:23:eb:ad:f6:00_-87; etc
It represents MAC addresses and signal strengths. Now I want to do string matching on the table, for instance to match a MAC address:
SELECT _id FROM fingerprints WHERE scan MATCH "00:13:10:d5:69:88";
For some reason this returns a lot of rows that do not contain the specified string. The second thing I tried to match is
SELECT _id FROM fingerprints WHERE scan MATCH "00:13:10:d5:69:88_-58";
This returns the same rows as before, which is completely wrong.
Does SQLite treat the : _ - characters in any special way?
Thanks

What you're seeing is the effect of the FTS tokenizing your data.
Full-text search doesn't operate on unprocessed long strings; it splits your data (and your search terms) into words and indexes them individually. The default tokenizer treats all alphanumeric characters and all characters with a code point of 128 or above as word characters, and treats every other character (such as the : _ - you're seeing) as a word boundary.
In other words, your search for 00:13:10:d5:69:88 will search for rows containing the words 00 and 13 and 10 and d5 and 69 and 88 in any order.
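To make the splitting concrete, here is a minimal Python sketch that imitates the rule described above. It is only an illustration written for this answer, not SQLite's actual tokenizer code, and the function name simple_tokenize is made up.

```python
def simple_tokenize(text):
    """Rough imitation of the FTS 'simple' tokenizer (illustration only)."""
    tokens, current = [], []
    for ch in text:
        # Word characters: ASCII letters/digits, or any code point >= 128.
        is_word_char = ord(ch) >= 128 or ch.isalnum()
        if is_word_char:
            # ASCII letters are folded to lower case; other characters are kept.
            current.append(ch.lower() if ord(ch) < 128 else ch)
        elif current:
            # Any other character ends the current word.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


mac_tokens = simple_tokenize("00:13:10:d5:69:88_-58")
print(mac_tokens)
```

Every :, _ and - disappears, so the index only ever sees the short hexadecimal words.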
You can verify this behavior:
sqlite> CREATE VIRTUAL TABLE simple USING fts3(tokenize=simple);
sqlite> INSERT INTO simple VALUES('00:13:10:d5:69:88');
sqlite> SELECT * FROM simple WHERE simple MATCH '69:10';
-> 00:13:10:d5:69:88
EDIT: Apparently SQLite is smarter than I originally gave it credit for: you can use phrase queries to look for word sequences, which would solve your problem. A phrase query is specified by enclosing a sequence of terms, separated by spaces (or other word separators), in double quotes (").
sqlite> SELECT * FROM simple WHERE simple MATCH '"69:10"';
-> No match
sqlite> SELECT * FROM simple WHERE simple MATCH '"69 88"';
-> 00:13:10:d5:69:88
sqlite> SELECT * FROM simple WHERE simple MATCH '"69:88"';
-> 00:13:10:d5:69:88
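Applied to the question's data, the difference between a plain query and a phrase query might look like the following sketch. It uses Python's built-in sqlite3 module and assumes the underlying SQLite library was compiled with FTS3 support; the two sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fingerprints USING fts3(scan, tokenize=simple)")
conn.executemany(
    "INSERT INTO fingerprints(scan) VALUES (?)",
    [
        ("00:13:10:d5:69:88_-58;0c:85:25:68:b4:30_-75",),  # rowid 1: the MAC we want
        ("88:69:d5:10:13:00_-58;0c:85:25:68:b4:30_-75",),  # rowid 2: same words, reversed
    ],
)


def match(query):
    """Return the rowids of all rows whose scan column matches the FTS query."""
    rows = conn.execute(
        "SELECT rowid FROM fingerprints WHERE scan MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


# Plain terms: every word must occur somewhere in the row, in any order.
loose = match("00:13:10:d5:69:88")
# Phrase query: the words must occur next to each other in this order.
exact = match('"00:13:10:d5:69:88"')
print(loose, exact)
```

The plain query finds both rows, while the phrase query finds only the row that really contains the address.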

Related

SQLite Fts select query

I am making a dictionary with over 20,000 words in it. To make searching faster, I am using an FTS3 table.
My select query:
Cursor c=db.rawQuery("Select * from data where Word MATCH '"+word+"*'", null);
This query shows every word that contains 'word', but I want only the words that begin with the search term.
That is, I want it to work like this query:
Cursor c=db.rawQuery("Select * from data where Word like '"+word+"%'", null);
Example: I have apple, app, and, book, bad, cat, car.
When I type 'a', I want it to show only apple, app, and.
How can I solve this?

table(_id primary key not null autoincrement, word text)
An FTS table does not use the above attributes. It ignores data types. It does not auto-increment any column other than the hidden rowid column, so "_id" will not act as a primary key here. Please verify that you are actually implementing an FTS table:
https://www.sqlite.org/fts3.html
a datatype name may be optionally specified for each column. This is
pure syntactic sugar, the supplied typenames are not used by FTS or
the SQLite core for any purpose. The same applies to any constraints
specified along with an FTS column name - they are parsed but not used
or recorded by the system in any way.
As for your original question, MATCH 'abc*' already matches from the beginning of a word. For instance, MATCH 'man*' will not match "woman".
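The prefix behavior can be checked with a small sketch like the one below. It uses Python's sqlite3 module rather than Android's API, assumes an SQLite build with FTS3 enabled, and binds the search term as a parameter instead of concatenating it into the SQL string; the helper name prefix_search is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3(Word)")
words = ["apple", "app", "and", "book", "bad", "cat", "car", "man", "woman"]
conn.executemany("INSERT INTO data(Word) VALUES (?)", [(w,) for w in words])


def prefix_search(prefix):
    """Return all words that have a token starting with the given prefix."""
    rows = conn.execute(
        "SELECT Word FROM data WHERE Word MATCH ? ORDER BY Word", (prefix + "*",)
    )
    return [row[0] for row in rows]


a_words = prefix_search("a")
man_words = prefix_search("man")
print(a_words, man_words)
```

Here 'a*' returns only the words beginning with a, and 'man*' does not pick up "woman".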

FTS supports searching for the beginning of a string with ^:
SELECT * FROM FtsTable WHERE Word MATCH '^word*'
However, the full-text search index is designed to find words inside larger texts.
If your Word column contains only a single word, your query is more efficient if you use LIKE 'a%' and rely on a normal index.
To allow an index to be used with LIKE, the table column must have TEXT affinity, and the index must be declared as COLLATE NOCASE (because LIKE is not case sensitive):
CREATE TABLE data (
...
Word TEXT,
...
);
CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE);
If you were to use GLOB instead, the index would have to be case sensitive (the default).
You can use EXPLAIN QUERY PLAN to check whether the query uses the index:
sqlite> EXPLAIN QUERY PLAN SELECT * FROM data WHERE Word LIKE 'a%';
0|0|0|SEARCH TABLE data USING INDEX data_Word_index (Word>? AND Word<?)
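The same setup can be tried from Python's sqlite3 module, as in this sketch. The sample words come from the question and the _id column is an assumption; the exact wording of the plan text differs between SQLite versions, so it is only printed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (_id INTEGER PRIMARY KEY, Word TEXT)")
# NOCASE index so that the case-insensitive LIKE can use it.
conn.execute("CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO data(Word) VALUES (?)",
    [(w,) for w in ["apple", "app", "and", "book", "bad", "cat", "car"]],
)

matches = [
    row[0]
    for row in conn.execute("SELECT Word FROM data WHERE Word LIKE 'a%' ORDER BY Word")
]
print(matches)

# The last column of each EXPLAIN QUERY PLAN row holds the human-readable detail.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM data WHERE Word LIKE 'a%'"):
    print(row[-1])
```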

unable to retrieve special characters from sqlite fts3

I am having some problems with special characters in my scenario.
I have a sqlite db created using fts3.
When I use SELECT col_1, col_2, offsets(table) FROM table WHERE table MATCH 'h*' LIMIT 50;
I get the words that start with h.
But when I use
SELECT col_1, col_2, offsets(table) FROM table WHERE table MATCH '#*' LIMIT 50;
I do not get the strings that start with #.
Where am I going wrong? Any pointer regarding approach would be great.

I think the behavior you describe occurs because SQLite FTS3 uses a tokenizer called "simple" by default. The character # is discarded because it is not alphanumeric and its UTF code point is not greater than 127. My interpretation is that FTS is meant for searching natural text, not special characters.
The fix I suggest is to not use FTS for this kind of query and to use the LIKE operator instead. Alternatively, you could look for another available tokenizer or write your own in C.
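A LIKE-based lookup could look like this sketch, written with Python's sqlite3 module. The table name notes, its sample rows, and the helper starts_with are invented for illustration; the helper also escapes the LIKE wildcards % and _ in the user's input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (col_1 TEXT)")
conn.executemany(
    "INSERT INTO notes(col_1) VALUES (?)",
    [("#hashtag",), ("hello",), ("C# code",), ("100% sure",)],
)


def starts_with(prefix):
    """Return the rows whose col_1 begins with the literal prefix."""
    # Escape the escape character first, then the two LIKE wildcards.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT col_1 FROM notes WHERE col_1 LIKE ? ESCAPE '\\' ORDER BY col_1",
        (escaped + "%",),
    )
    return [row[0] for row in rows]


hash_rows = starts_with("#")
print(hash_rows)
```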

select items with certain "/" in SQL

I did a bit of research about SQL escape characters and count statements and didn't find a solution to my question, even though I tried things like:
SELECT * FROM table WHERE path LIKE '%/_%' ESCAPE '/';
I have a table with a column that contains paths, and I want to select the items that have a certain number of slashes:
ID DIRECTORY
1 root/A
2 root/B
3 root/A/1/2
4 root/B/1/2
5 root/A/1
6 root/B/2
So, how do I select, for example, the elements that have exactly 2 slashes?
Edit 1: This is to be done in an Android SQLite database.

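You can use a regular expression (note that SQLite has no built-in regexp() function, so the REGEXP operator only works if the application registers one):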
SELECT * FROM table WHERE path REGEXP '^([^/]*)/([^/]+)/([^/]*)$';
The above expression looks specifically for an optional group of characters not containing /, followed by /, followed by another group without /, followed by /, and optionally another set of characters before the end of the string.
So:
/Bxx92/2 -- match
5 root/A/1 -- match
6 root/Bxx92/2 -- match
7 root/Bxx92/ -- match
6 root/2 -- NO match
If there MUST be something before the first and after the last /, change the expression to '^([^/]+)/([^/]+)/([^/]+)$'
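As a sketch of how the pattern behaves, the following Python snippet registers a regexp() function on the connection and runs the query against the sample rows from the question. The table name dirs is an assumption, and this only demonstrates the expression; it does not show how to register such a function on Android.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return 1 if value is not None and re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE dirs (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO dirs(id, path) VALUES (?, ?)",
    [(1, "root/A"), (2, "root/B"), (3, "root/A/1/2"),
     (4, "root/B/1/2"), (5, "root/A/1"), (6, "root/B/2")],
)

pattern = r"^([^/]*)/([^/]+)/([^/]*)$"
two_slashes = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM dirs WHERE path REGEXP ? ORDER BY id", (pattern,)
    )
]
print(two_slashes)
```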

You can use this trick to count occurrences of a character in a string:
SELECT LENGTH(path) - LENGTH(REPLACE(path, '/', '')) AS occurrences
So you can achieve the goal with
SELECT id, path FROM
(SELECT id, path, LENGTH(path) - LENGTH(REPLACE(path, '/', '')) AS occurrences
FROM table) temp
WHERE occurrences = 2
However, I expect performance to be poor on large tables, since the expression has to be evaluated for every row and cannot use an index. If you are going to query like that, consider adding a column with the path depth so that you can query directly with
SELECT id, path FROM table WHERE depth = 2
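The counting trick can be tried out with the question's sample rows, for example with Python's sqlite3 module as sketched here. The table name dirs is an assumption; note that path is referenced as a column, not as a quoted string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dirs (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO dirs(id, path) VALUES (?, ?)",
    [(1, "root/A"), (2, "root/B"), (3, "root/A/1/2"),
     (4, "root/B/1/2"), (5, "root/A/1"), (6, "root/B/2")],
)

# Removing every '/' shortens the string by exactly the number of slashes.
query = """
    SELECT id, path
    FROM dirs
    WHERE LENGTH(path) - LENGTH(REPLACE(path, '/', '')) = ?
    ORDER BY id
"""
two_slash_rows = conn.execute(query, (2,)).fetchall()
print(two_slash_rows)
```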

How to query similar records in SQLite database from Android?

Let's say an SQLite database column contains the following value:
U Walther-Schreiber-Platz
Using a query I'd like to find this record, or any similar records, if the user searches for the following strings:
walther schreiber
u walther
walther-schreiber platz
[Any other similar strings]
However I cannot figure out any query which would do that. Especially if the user does not enter the - character.
Example:
select * from myTable where value like '%walther schreiber%'; would not return the result because of the missing -.
Thanks,
Robert

So, as I said in my comment, I think you can have a query along the lines of:
select * from myTable where LOWER(value) like <SearchValue>
I'm assuming you're collecting <SearchValue> from the user programmatically. In that case <SearchValue> would be the user's search string, appropriately cleansed to avoid SQL injection attacks, converted to lower case, wrapped in '%', and with every space replaced by '%' so that each gap matches zero or more characters.
So you would have (for example):
select * from myTable where LOWER(value) like '%walther%schreiber%'
select * from myTable where LOWER(value) like '%walther-schreiber%platz%'
And so on. However, this does assume that the word order is the same, which it is in your examples.
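Building the pattern could be sketched as follows in Python. The helper names to_like_pattern and find are made up, and the pattern is passed as a bound parameter, which is one way to take care of the injection concern.

```python
import sqlite3


def to_like_pattern(search):
    """Lower-case the search string and put '%' around and between its words."""
    words = search.lower().split()
    return "%" + "%".join(words) + "%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (value TEXT)")
conn.executemany(
    "INSERT INTO myTable(value) VALUES (?)",
    [("U Walther-Schreiber-Platz",), ("S Hauptbahnhof",)],
)


def find(search):
    """Return all values that contain the search words in the given order."""
    rows = conn.execute(
        "SELECT value FROM myTable WHERE LOWER(value) LIKE ?",
        (to_like_pattern(search),),
    )
    return [row[0] for row in rows]


print(to_like_pattern("walther schreiber"))
print(find("walther-schreiber platz"))
```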

Query to sort a field such that strings comes first and then numbers

I have a column which contains numbers or strings. The type of the column is varchar.
Usually, when we sort by a string field, all the numbers come first and then the strings. But I want all the strings first and then the numbers.
TIA !

You'll have to write it in two separate queries. One for selecting numbers, the other for strings. Preferably I would create a second column (one for numbers, one for strings), making it easier and faster to have those two queries run.

This worked for me...
Select * from Table order by stringfield+0;
edit: http://www.sqlite.org/datatypes.html (Point 4.0)
UPDATE: Try this....
select * from Table where LENGTH(trim(stringfield,'0123456789 ') )=0 union select * from table order by stringfield;

How about the following (two queries as suggested above):
select * from Table where LENGTH(trim(stringfield,'0123456789 ')) > 0;
select * from table where LENGTH(trim(stringfield,'0123456789 ')) = 0;
The first select should return only values that are not numeric, whilst the second should return only values that are numeric.
For a table that contains a mixture of numeric and string data, this outputs the strings first, then the numbers.
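The same trim test can also be used as a sort key, so that one query is enough. Note that an ORDER BY at the end of a UNION sorts the combined result, so with a text column the digits would come first again. The sketch below is one possible variant, not something taken from the answers above: it assumes a table named items, treats only digit-only values as numbers, and additionally sorts those numbers by value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (stringfield VARCHAR)")
conn.executemany(
    "INSERT INTO items(stringfield) VALUES (?)",
    [("10",), ("apple",), ("2",), ("banana",), ("1",)],
)

query = """
    SELECT stringfield FROM items
    ORDER BY
        -- 0 for values with non-digit characters, 1 for digit-only values
        TRIM(stringfield, '0123456789') = '',
        -- digit-only values are compared as integers; others yield NULL here
        CASE WHEN TRIM(stringfield, '0123456789') = ''
             THEN CAST(stringfield AS INTEGER) END,
        stringfield
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)
```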

Have you considered creating a custom collation-function? I have never used this myself, but it sounds like exactly what you need.
