How to do a word search on a large text database - android

I have a large database in my app. One column is made of text strings that are about a sentence to a paragraph long. I would like to make this column searchable by word(s) that the user inputs.
How would I make a quick search? I've heard of making an index but I don't know how to do that for a text search.

SQLite has a mechanism for searching large amounts of text in a database; it's called FTS (short for full-text search).
The SQLite bundled with Android includes FTS3, so you can use it directly.
How to use it is explained in the SQLite FTS3 documentation (https://www.sqlite.org/fts3.html).
Example for creating a table:
CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT); /* FTS3 table */
CREATE TABLE enrondata2(content TEXT); /* Ordinary table */
Query:
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
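To try the same comparison on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the underlying SQLite library was built with FTS3 enabled; the table names docs_fts and docs_plain and the sample rows are made up for illustration, and on a data set this small there is no measurable speed difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One FTS3 virtual table and one ordinary table holding the same text.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts3(content)")
conn.execute("CREATE TABLE docs_plain(content TEXT)")

rows = [
    ("Installing linux on an old laptop",),
    ("The meeting is moved to Friday",),
    ("Linux kernel release notes",),
]
conn.executemany("INSERT INTO docs_fts(content) VALUES (?)", rows)
conn.executemany("INSERT INTO docs_plain(content) VALUES (?)", rows)


def count_match(term):
    """Count rows via the full-text index (matches whole tokens)."""
    sql = "SELECT count(*) FROM docs_fts WHERE content MATCH ?"
    return conn.execute(sql, (term,)).fetchone()[0]


def count_like(term):
    """Count rows via a substring scan over every row."""
    sql = "SELECT count(*) FROM docs_plain WHERE content LIKE ?"
    return conn.execute(sql, ("%" + term + "%",)).fetchone()[0]


fts_count = count_match("linux")
like_count = count_like("linux")
print(fts_count, like_count)
```

Note that MATCH compares whole tokens while LIKE '%...%' compares substrings, so the two counts can differ on other data (for example, LIKE would also find "linuxes").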

Related

Using FTS3/4 Table based on Select Statements from Normal Table

In the Android app I'm developing, I have an SQLite table called Transactions with these fields:
_Id | Date | Value | Notes
I already have a ListView showing results filtered by Date, for example:
SELECT * FROM Transactions WHERE Date BETWEEN '2016-04-25' AND '2016-05-14'
It works fine, but I want to add a SearchView that finds transactions within a custom date range whose Notes field contains the text typed into the SearchView.
I read about adding a SearchView, and that the best way to implement the search is an FTS3 or FTS4 table, so the user can, for example, type "SUPERMARKET" and find a transaction whose Notes contain this text.
The problem is that an FTS table is slow at ordinary WHERE conditions (like the date filter above).
How can I combine filtering by date, using WHERE Date BETWEEN ... AND ..., with filtering the Notes text at the speed of an FTS table?
If that is not possible, is it a good idea to use a query like this?
SELECT * FROM Transactions WHERE (Date BETWEEN '2016-04-25' AND '2016-05-14') AND Notes LIKE '%text%'
Do not think of an FTS table as a table, but as an index.
With the notes indexed like this:
CREATE VIRTUAL TABLE Transactions_FTS USING FTS4(Notes);
you would have to ensure that the IDs of both tables match, and could then combine the tables like this:
SELECT *
FROM Transactions
WHERE Date BETWEEN '2016-04-25' AND '2016-05-14'
AND _Id IN (SELECT docid
FROM Transactions_FTS
WHERE Notes MATCH 'supermarket');
or this:
SELECT *
FROM Transactions
JOIN Transactions_FTS ON Transactions._Id = Transactions_FTS.docid
WHERE Date BETWEEN '2016-04-25' AND '2016-05-14'
AND Transactions_FTS.Notes MATCH 'supermarket';
(If you care about the amount of storage used, consider an external content FTS table.)
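The first of the two queries above can be tried out with the following sketch in Python's sqlite3 module. It assumes an SQLite build with FTS4 available; the sample rows and the helper name search are invented for illustration. The FTS table is filled with docid set to _Id so that the IDs of both tables match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions(_Id INTEGER PRIMARY KEY, Date TEXT, Value REAL, Notes TEXT);
CREATE VIRTUAL TABLE Transactions_FTS USING fts4(Notes);
""")

# Hypothetical sample data.
sample = [
    (1, "2016-04-20", 12.5, "Supermarket groceries"),
    (2, "2016-04-30", 40.0, "Supermarket weekly shopping"),
    (3, "2016-05-02", 8.0, "Coffee with a friend"),
]
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", sample)

# Copy the notes into the index, reusing _Id as the FTS docid.
conn.execute(
    "INSERT INTO Transactions_FTS(docid, Notes) SELECT _Id, Notes FROM Transactions"
)


def search(date_from, date_to, text):
    """Return the _Id values in the date range whose Notes match the text."""
    sql = """
        SELECT _Id
        FROM Transactions
        WHERE Date BETWEEN ? AND ?
          AND _Id IN (SELECT docid FROM Transactions_FTS WHERE Notes MATCH ?)
        ORDER BY _Id
    """
    return [row[0] for row in conn.execute(sql, (date_from, date_to, text))]


result = search("2016-04-25", "2016-05-14", "supermarket")
print(result)
```

In a real app the FTS table has to be kept in sync whenever Transactions changes, for example with triggers or in the same code path that writes the notes.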

How to query in sqlite fts3 table using "-*"?

I have a fts3 table named tab and a lot of entries in it. When I run this query:
SELECT * FROM tab WHERE key MATCH 'an*';
I get the results like this:
an
anul
an-
But when I run this query:
SELECT * FROM tab WHERE key MATCH 'an-*';
it still returns the "an" entry. The result looks like this:
an
an-
How can I write my query so that the result doesn't include "an", but only those entries that actually contain the character "-"?
According to the default tokenizer rules, - separates words, and is otherwise ignored.
You have to search for the word an first, and check for the hyphen afterwards:
SELECT *
FROM tab
WHERE key MATCH 'an'
AND key LIKE 'an-%';
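A small sketch of this two-step filter, using Python's sqlite3 module and the three entries from the question. It assumes an SQLite build with FTS3 and the default (simple) tokenizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE tab USING fts3(key)")
conn.executemany(
    "INSERT INTO tab(key) VALUES (?)",
    [("an",), ("anul",), ("an-",)],
)

# Prefix query: the tokenizer drops the hyphen, so "an-" is indexed as "an".
prefix_hits = sorted(
    row[0] for row in conn.execute("SELECT key FROM tab WHERE key MATCH 'an*'")
)

# MATCH narrows the candidates through the index; LIKE then checks the
# original text for the literal hyphen.
hyphen_hits = [
    row[0]
    for row in conn.execute(
        "SELECT key FROM tab WHERE key MATCH 'an' AND key LIKE 'an-%'"
    )
]

print(prefix_hits)
print(hyphen_hits)
```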

SQLite Fts select query

I am making a dictionary with over 20,000 words in it. To make searching faster, I am using an fts3 table.
My select query:
Cursor c=db.rawQuery("Select * from data where Word MATCH '"+word+"*'", null);
With this query, it shows all the words that contain 'word', but I want only the words that begin with the search term.
That is, I want it to work like this query:
Cursor c=db.rawQuery("Select * from data where Word like '"+word+"%'", null);
Example: I have apple, app, and, book, bad, cat, car.
When I type 'a', I want it to show only apple, app, and.
How can I solve this?
table(_id primary key not null autoincrement, word text)
An FTS table does not use the above attributes. It ignores data types. It does not auto-increment any column other than the hidden rowid column, and "_id" will not act as a primary key here. Please verify that you are actually creating an FTS table.
https://www.sqlite.org/fts3.html
"a datatype name may be optionally specified for each column. This is pure syntactic sugar, the supplied typenames are not used by FTS or the SQLite core for any purpose. The same applies to any constraints specified along with an FTS column name - they are parsed but not used or recorded by the system in any way."
As for your original question, match "abc*" already searches from the beginning of the word. For instance match "man*" will not match "woman".
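The prefix behaviour described above can be checked with a short sketch in Python's sqlite3 module. It assumes an SQLite build with FTS3; the table layout is simplified to a single Word column, "woman" is added to the question's word list to cover the "man*" case, and the helper name prefix_search is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3(Word)")

words = ["apple", "app", "and", "book", "bad", "cat", "car", "woman"]
conn.executemany("INSERT INTO data(Word) VALUES (?)", [(w,) for w in words])


def prefix_search(prefix):
    """Return all words with a token that starts with the given prefix."""
    # The search text is bound as a parameter rather than concatenated
    # into the SQL string.
    rows = conn.execute(
        "SELECT Word FROM data WHERE Word MATCH ?", (prefix + "*",)
    )
    return sorted(row[0] for row in rows)


a_words = prefix_search("a")
man_words = prefix_search("man")
print(a_words)
print(man_words)
```

Binding the search text still leaves it subject to FTS query syntax, so input containing quotes or operators may need extra handling.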
FTS supports searching for the beginning of a string with ^:
SELECT * FROM FtsTable WHERE Word MATCH '^word*'
However, the full-text search index is designed to find words inside larger texts.
If your Word column contains only a single word, your query is more efficient if you use LIKE 'a%' and rely on a normal index.
To allow an index to be used with LIKE, the table column must have TEXT affinity, and the index must be declared as COLLATE NOCASE (because LIKE is not case sensitive):
CREATE TABLE data (
...
Word TEXT,
...
);
CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE);
If you were to use GLOB instead, the index would have to be case sensitive (the default).
You can use EXPLAIN QUERY PLAN to check whether the query uses the index:
sqlite> EXPLAIN QUERY PLAN SELECT * FROM data WHERE Word LIKE 'a%';
0|0|0|SEARCH TABLE data USING INDEX data_Word_index (Word>? AND Word<?)
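The same check can be scripted. The sketch below uses Python's sqlite3 module with a simplified version of the data table (only _id and Word) and the word list from the question. The exact wording of the plan differs between SQLite versions, so the code only looks for the index name in the plan's detail column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (_id INTEGER PRIMARY KEY, Word TEXT);
CREATE INDEX data_Word_index ON data(Word COLLATE NOCASE);
""")

words = ["apple", "app", "and", "book", "bad", "cat", "car"]
conn.executemany("INSERT INTO data(Word) VALUES (?)", [(w,) for w in words])

query = "SELECT Word FROM data WHERE Word LIKE 'a%'"

# The last column of each EXPLAIN QUERY PLAN row is the readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

matches = sorted(row[0] for row in conn.execute(query))

print(plan_text)
print(matches)
```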

Android: SQLite FTS3 slows down when fetching next/previous rows

I have an SQLite db that currently has a few tables, the biggest of which has over 10,000 rows. This table has four columns: id, term, definition, category. I used the FTS3 module to speed up searching, which helped a lot. However, fetching the 'next' or 'previous' row from the table now takes longer than it did before I started using FTS3.
This is how I create the virtual table:
CREATE VIRTUAL TABLE profanity USING fts3(_id integer primary key,name text,definition text,category text);
This is how I fetch next/previous rows:
SELECT * FROM dictionary WHERE _id < "+id + " ORDER BY _id DESC LIMIT 1
SELECT * FROM dictionary WHERE _id > "+id + " ORDER BY _id LIMIT 1
When I run these statements on the virtual table:
NEXT term is fetched within ~300ms,
PREVIOUS term is fetched within ~200ms
When I do it with a normal table (one created without FTS3):
NEXT term is fetched within ~3ms,
PREVIOUS term is fetched within ~2ms
Why is there such a big difference? Is there any way I can improve this speed?
EDITED:
I still can't get it to work!
The virtual table you've created is designed for full-text queries. It is not meant for fast processing of standard queries that use the primary key in the WHERE condition.
In this case there is no index on your _id column, so SQLite probably performs a full table scan.
The next problem is your query, which is inefficient. Try something like this (untested):
SELECT * FROM dictionary WHERE _id = (select max(_id) from dictionary where _id < ?)
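As a sketch of how that query and its mirror image for the next row might be wrapped, here is a version in Python's sqlite3 module. It assumes an ordinary (non-FTS) dictionary table with an INTEGER PRIMARY KEY; the sample rows and the helper names previous_row and next_row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary(_id INTEGER PRIMARY KEY, term TEXT)")
conn.executemany(
    "INSERT INTO dictionary VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (5, "gamma"), (9, "delta")],  # ids with gaps
)


def previous_row(current_id):
    """Row with the largest _id below current_id, or None at the start."""
    sql = """
        SELECT _id, term FROM dictionary
        WHERE _id = (SELECT max(_id) FROM dictionary WHERE _id < ?)
    """
    return conn.execute(sql, (current_id,)).fetchone()


def next_row(current_id):
    """Row with the smallest _id above current_id, or None at the end."""
    sql = """
        SELECT _id, term FROM dictionary
        WHERE _id = (SELECT min(_id) FROM dictionary WHERE _id > ?)
    """
    return conn.execute(sql, (current_id,)).fetchone()


print(previous_row(5), next_row(5))
```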
The next thing you can consider is redesigning your app. Instead of loading one row at a time, you could fetch, say, 40 rows, keep them in memory, and load more in the background when fewer than n rows remain before either end. A long SQL operation then becomes invisible to the user, even if it takes 3 s instead of 0.3 s.
If you're running LIMIT 1 to begin with, you can remove the order by clause completely. This may help. I'm not familiar with FTS3, however.
You could also simply increment or decrement your id variable and query with WHERE _id = "+id+" LIMIT 1, which makes a single lookup instead of a < or > comparison (this assumes the ids have no gaps).
Edit: now that I look back at what I typed, if you do it that way you can remove LIMIT 1 completely, since _id is your primary key and must be unique.

Best way to search sqlite database

In my application I work with a large database. One table holds nearly 75,000 records (there are 6 tables in total). I need to get data from three different tables at a time, and I have that working, but the search is slow. How can I optimise the search?
You might want to consider using the full-text search engine and issuing SELECT...MATCH queries instead. Note that you need to enable the FTS engine (it's disabled by default) and create virtual tables instead of regular tables. You can read more about it in the SQLite FTS documentation.
Without being able to see the table structure (or query) the first thing I'd suggest is adding some indexes to the tables.
Let's say you have a few tables like:
Author
    id
    last_name
    first_name
Subject
    id
    name
Book
    id
    title
    author_id
    subject_id
and you want to get all the information about each of the books that an author with last_name="Smith" and first_name="John" wrote. Your query might look something like this:
SELECT * FROM Book b
LEFT JOIN Subject s
ON s.id=b.subject_id
LEFT JOIN Author a
ON a.id=b.author_id
WHERE a.last_name='Smith'
AND a.first_name='John';
There you'd want the last_name column in the Author table to have an index (and maybe first_name too).
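A minimal sketch of that suggestion in Python's sqlite3 module follows. The column types, the sample rows, and the index name author_name_idx are assumptions made for illustration. The plan check is done on the author lookup alone, because how SQLite orders the tables in the full join depends on its version and statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author(id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE Subject(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Book(id INTEGER PRIMARY KEY, title TEXT,
                  author_id INTEGER, subject_id INTEGER);

-- One index covering both name columns used in the WHERE clause.
CREATE INDEX author_name_idx ON Author(last_name, first_name);

INSERT INTO Author VALUES (1, 'Smith', 'John'), (2, 'Doe', 'Jane');
INSERT INTO Subject VALUES (1, 'Databases');
INSERT INTO Book VALUES (1, 'SQLite Basics', 1, 1), (2, 'Other Book', 2, 1);
""")

# The join from the answer, reduced to the title column and with bound values.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT b.title FROM Book b
        LEFT JOIN Subject s ON s.id = b.subject_id
        LEFT JOIN Author a ON a.id = b.author_id
        WHERE a.last_name = ? AND a.first_name = ?
        """,
        ("Smith", "John"),
    )
]

# Check that the author filter by itself is answered from the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM Author WHERE last_name = 'Smith' AND first_name = 'John'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(titles)
print(plan_text)
```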
