I am developing a dictionary application. It requires incremental search, which means that SELECTing should be fast. There are 200,000+ rows. Let me first explain the table structure. I have this table:
CREATE TABLE meaning(
    key TEXT,
    value TEXT,
    entries BLOB);
Some time ago I had this index:
CREATE INDEX index_key ON meaning (key)
This query took around 500 ms, which was very slow:
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
Then I dropped that index and created a case-insensitive index, which improved performance 2-3 times:
CREATE INDEX index_key ON meaning (key COLLATE NOCASE);
Now this query runs in 75 ms (min) to 275 ms (max), which is still quite slow for incremental search:
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
I have tried to optimize the query according to this post:
SELECT value FROM meaning WHERE key >= 'boy' AND key<'boz' LIMIT 100
But this query takes 451 ms.
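A possible cause worth checking (my assumption, not something verified above): the range comparison uses the column's default BINARY collation, while the index is declared with COLLATE NOCASE, so SQLite cannot serve it from index_key and falls back to a scan. A sketch of the same query with the collation forced to match the index, plus a hypothetical covering index that would also avoid the per-row table lookup for value:

-- Force the NOCASE collation so the range can be served by index_key:
SELECT value FROM meaning
WHERE key >= 'boy' COLLATE NOCASE
  AND key <  'boz' COLLATE NOCASE
LIMIT 100;

-- Hypothetical covering index: answers the query from the index alone,
-- with no extra lookup into the meaning table for the value column.
CREATE INDEX index_key_value ON meaning (key COLLATE NOCASE, value);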
EXPLAIN
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
This returns the following values:
EXPLAIN QUERY PLAN
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
This returns the following value (detail column):
SEARCH TABLE meaning USING INDEX index_key (key>? AND key<?) (~31250 rows)
These values did not give me any hint about what to optimize.
Is it possible to get the SELECTion of words down to ~10 ms, whether by optimizing this query, creating another table, or changing some parameters of the SQLite database? Could you suggest the best way to do this?
PS. Please do not suggest using an FTS table. In a previous version of the application I used FTS. I agree that it is extremely fast. I dropped the FTS idea for 2 reasons:
It does not give proper results (it contains words which the user does not need)
It takes more disk space
I have an SQLite DB where I perform a query like
Select * from table where col_name NOT IN ('val1','val2')
Basically I'm getting a huge list of values from the server and I need to select the ones which are not present in the given list.
Currently it's working fine, no issues. But the number of values from the server is becoming huge as the server DB is updated frequently.
So I may get thousands of String values which I need to pass to the NOT IN clause.
My question is: will it cause any performance issue in the future? Does the NOT IN parameter have any size restriction (like a maximum of 10,000 values you can check)?
Will it cause a crash at some point?
This is the official reference about various limits in SQLite. I think Maximum Length Of An SQL Statement may relate to your case. The default value is 1,000,000 bytes, and it is adjustable.
Other than that, I don't think there is any limit on the number of values in a NOT IN clause.
With more than a few values to test for, you're better off putting them in a table that has an index on the column holding them. Then things like
SELECT *
FROM table
WHERE col_name NOT IN (SELECT value_col FROM value_table);
or
SELECT *
FROM table AS t
WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name);
will be reasonably efficient no matter how many records are in value_table because that index will be used to find entries.
Plus, of course, it makes it a lot easier to re-use prepared statements, because you don't have to create a new one and re-bind every value each time you add a value to the set you need to check - you just insert it into value_table instead. (You are using prepared statements with placeholders for these values, right, and not trying to put their contents inline into a string?)
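For illustration, a minimal sketch of that setup (value_table and value_col are the placeholder names carried over from the queries above; my_table stands in for your real table, since table is a reserved word):

-- One-time setup: PRIMARY KEY gives the lookup column a unique index.
CREATE TABLE IF NOT EXISTS value_table (value_col TEXT PRIMARY KEY);

-- Prepare once; re-bind the single ? for each new value from the server,
-- instead of rebuilding a giant NOT IN list every time.
INSERT OR IGNORE INTO value_table (value_col) VALUES (?);

-- The filtering query never changes, so it can stay prepared too.
SELECT * FROM my_table AS t
WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name);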
Yes, there is a limit of 999 host parameters (SQLITE_MAX_VARIABLE_NUMBER), as reported in the official documentation: https://www.sqlite.org/limits.html#max_variable_number
For an Android word game (with minSdkLevel=9, meaning SQLite version 3.6.22), I would like to deliver the dictionary as a prefilled SQLite table within the APK file (with the help of SQLiteAssetHelper).
In the SQLite database there will be just 1 table:
create table dict ( /* contains 700 000 unique words */
    word text not null
);
My question, please:
How should I declare the table for the best performance, and which kind of SQL query should I use?
(Checking whether a word entered by the player is present in the dict table will be the main use of the SQLite database in the app.)
Should I create an index (is it possible to have an index on a text column at all)?
Or should I declare the word column as the primary key?
Also, some SQLite-for-Android guides suggest having an _id column in each table (probably to enable fetching the last inserted record? - which I don't really need here). Should I maybe use
create table dict (
    _id integer primary key,
    word text unique not null
);
create index word_index on dict(word);
or will that be a waste of 4 × 700,000 bytes? (Or is it added as _rowid_ anyway?)
Quick answer: yes, you can create an index on a text column.
However, for best performance this may not be the best option.
The index SQLite creates is a B-tree, searched much like a binary search over sorted keys: with 700k words a lookup needs on the order of 20 comparisons (log2 of 700,000 ≈ 19.4). That may already be fast enough - you need to test it to actually know the performance.
An alternative method would be to create multiple tables (buckets), e.g. wordA, wordB, wordC, etc., and use the first character to determine which table a word goes into.
This drops each table to about 27k records (of course the buckets will not be of equal size).
Doing this reduces the number of comparisons needed for the binary search.
Better still, you could use a hash function to choose the bucket, which makes the bucket sizes more balanced and lets you freely control the number of buckets.
You would have to fine-tune to find the optimal bucket size.
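A rough sketch of the first-character variant (untested; the table and index names are made up), where the app picks the bucket before preparing the statement:

-- One bucket per starting letter, ~27k words each on average:
CREATE TABLE word_a (word TEXT NOT NULL);
CREATE INDEX word_a_index ON word_a(word);
-- ...and likewise word_b through word_z.

-- The app maps the entered word's first letter to a table name,
-- then runs a plain membership check against that one bucket:
SELECT EXISTS(SELECT 1 FROM word_a WHERE word = ?);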
What is the best way to maintain a "cumulative sum" of a particular data column in SQLite? I have found several examples online, but I am not 100% certain how I might integrate these approaches into my ContentProvider.
In previous applications, I have tried to maintain cumulative data myself, updating it each time I insert new data into the table. For example, in the sample data below, every time I add a new record with a value score, I then manually update the value of cumulative_score based on its value in the previous row.
_id    score    cumulative_score
1      100      100
2       50      150
3       25      175
4       25      200
5       10      210
However, this is far from ideal and becomes very messy when handling tables with many columns. Is there a way to somehow automate the process of updating cumulative data each time I insert/update records in my table? How might I integrate this into my ContentProvider implementation?
I know there must be a way to do this... I just don't know how. Thanks!
Probably the easiest way is with a SQLite trigger, which is the closest thing I know of to "automation". Just have an insert trigger that takes the previous cumulative sum, adds the current score, and stores the result in the new row's cumulative sum. Something like this (assuming _id is the column you are ordering on):
CREATE TRIGGER calc_cumulative_score AFTER INSERT ON tablename FOR EACH ROW
BEGIN
    UPDATE tablename SET cumulative_score =
        -- previous row's running total, or 0 if this is the first row
        COALESCE((SELECT cumulative_score
                  FROM tablename
                  WHERE _id = (SELECT MAX(_id) FROM tablename
                               WHERE _id < new._id)), 0)
        + new.score
    WHERE _id = new._id;
END;
Make sure that the trigger and the original insert are in the same transaction. For arbitrary updates of the score column, you would have to implement a recursive trigger that somehow finds the next highest id (maybe by selecting the min id in the set of rows with an id greater than the current one) and updates its cumulative sum.
If you are opposed to using triggers, you can do more or less the same thing in
the ContentProvider in the insert and update methods manually, though since
you're pretty much locked into SQLite on Android, I don't see much reason not to
use triggers.
I assume you want to do this as an optimization, as otherwise you could just calculate the sum on demand (O(n) vs. O(1), so you'd have to consider how big n might get, and how often you need the sums).
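For reference, the on-demand version is a single query (a sketch reusing the column names from the question); the correlated subquery recomputes the running total each time, which is the O(n) trade-off mentioned above:

-- Running total computed on the fly; no stored cumulative_score needed.
SELECT _id,
       score,
       (SELECT SUM(score) FROM tablename t2 WHERE t2._id <= t1._id)
           AS cumulative_score
FROM tablename AS t1
ORDER BY _id;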
I have an SQLite DB that at the moment has a few tables, the biggest of which has over 10,000 rows. This table has four columns: id, term, definition, category. I have used the FTS3 module to speed up searching, which helped a lot. However, fetching the 'next' or 'previous' row from the table now takes longer than it did before I started using FTS3.
This is how I create virtual table:
CREATE VIRTUAL TABLE profanity USING fts3(_id integer primary key,name text,definition text,category text);
This is how I fetch the next/previous rows:
SELECT * FROM dictionary WHERE _id < "+id + " ORDER BY _id DESC LIMIT 1
SELECT * FROM dictionary WHERE _id > "+id + " ORDER BY _id LIMIT 1
When I run these statements on the virtual table:
NEXT term is fetched within ~300 ms,
PREVIOUS term is fetched within ~200 ms.
When I do it with the normal table (the one created without FTS3):
NEXT term is fetched within ~3 ms,
PREVIOUS term is fetched within ~2 ms.
Why is there such a big difference? Is there any way I can improve this speed?
EDITED:
I still can't get it to work!
The virtual table you've created is designed for full-text queries. It is not aimed at fast processing of standard queries that use the PK in the WHERE condition.
In this case there is no index on your _id column, so SQLite probably performs a full table scan.
The next problem is your query - it's very inefficient. Try something like this (untested):
SELECT * FROM dictionary WHERE _id = (select max(_id) from dictionary where _id < ?)
The next thing you can consider is a redesign of your app. Instead of loading one row at a time, fetch, say, 40, load them into memory, and load more in the background when fewer than n rows remain toward either end. A long SQL operation becomes invisible to the user even if it takes 3 s instead of 0.3 s.
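A sketch of that batched fetch (the window size of 40 is just the example figure from above): bind the current _id, cache the rows, and only return to the database when the in-memory window runs low:

-- Next 40 rows after the current position, in one query:
SELECT * FROM dictionary
WHERE _id > ?
ORDER BY _id
LIMIT 40;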
If you're running LIMIT 1 to begin with, you can remove the order by clause completely. This may help. I'm not familiar with FTS3, however.
You could also just increment or decrement your id variable with ++ or -- and query WHERE _id = "+id+" LIMIT 1, which would be a single lookup instead of < or >.
Edit: and now that I look back at what I typed, if you do it that way you can remove LIMIT 1 completely, since _id is your PK and must be unique.
hey look, a raw where clause!
I want to add the results of two separate counting SQLite queries. Suppose I have 2 tables named entries and scores, and these 2 queries:
SELECT COUNT(1) FROM entries WHERE
key NOT IN (SELECT key FROM scores)
SELECT COUNT(1) FROM scores WHERE
value <= threshold
Maybe I could do something like this to get the sum of their results:
SELECT COUNT(1) from (
SELECT key FROM entries WHERE
key NOT IN (SELECT key FROM scores)
UNION ALL
SELECT key FROM scores WHERE
value <= threshold
)
But is this too inefficient? It is called pretty often and may interfere with the UI's smoothness.
Thank you.
[EDIT] What I'm actually trying to do:
I'm making an app to help with learning vocabulary. The entries table keeps 'static' data about word type, definition, etc. The scores table keeps information about how well you've learned the words (e.g. performance, scheduled next review time).
To check the number of remaining words to learn/review, I count how many words either do not exist in the scores table yet (i.e. never touched) or have a pretty low accumulated score (i.e. need reviewing).
The reason I don't merge those 2 tables into 1 (which would make my life much easier) is that sometimes I need to update the entries table by inserting new words, deleting a few words, or updating their content, and I haven't found a simple way to do that. If I simply do INSERT OR REPLACE, I will lose the score information.
I think you're looking for a UNION. A union combines the results from two queries. Try this (sorry it isn't tested, I don't have access to SQLite):
SELECT COUNT(1) FROM
(
SELECT 1
FROM entries
WHERE key NOT IN (SELECT key FROM scores)
UNION ALL
SELECT 1
FROM scores
WHERE scores.value <= threshold
)
After reading the edit in your question explaining what you need to do, I think a JOIN would be more appropriate. This is a way of combining two tables into one query. Something like this:
SELECT COUNT(1)
FROM entries
LEFT JOIN scores
ON scores.key = entries.key
WHERE scores.value <= threshold
OR scores.key IS NULL
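If this count runs as often as you say, its speed will mostly depend on scores.key being indexed, so the join can probe scores instead of scanning it. A sketch (the index name is made up):

-- Lets the LEFT JOIN look up scores by key instead of scanning the table.
CREATE INDEX IF NOT EXISTS scores_key_index ON scores(key);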