I have an SQLite DB where I perform a query like
Select * from table where col_name NOT IN ('val1','val2')
Basically, I'm getting a huge list of values from the server, and I need to select the rows whose value is not in that list.
Currently it's working fine, with no issues. But the number of values from the server keeps growing, as the server DB is updated frequently.
So I may get thousands of String values that I need to pass to the NOT IN clause.
My question is: will this cause any performance issues in the future? Does the NOT IN list have any size restriction (like a maximum of 10000 values)?
Will it cause a crash at some point?
The official SQLite documentation lists its various limits. I think the Maximum Length Of An SQL Statement may be related to your case. The default value is 1000000, and it is adjustable.
Apart from this, I don't think there is any limit on the number of values in a NOT IN clause.
With more than a few values to test for, you're better off putting them in a table that has an index on the column holding them. Then things like
SELECT *
FROM table
WHERE col_name NOT IN (SELECT value_col FROM value_table);
or
SELECT *
FROM table AS t
WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name);
will be reasonably efficient no matter how many records are in value_table because that index will be used to find entries.
Plus, of course, it makes it a lot easier to re-use prepared statements, because you don't have to create a new statement and re-bind every value each time you add a value to the ones you need to check; you just insert it into value_table instead. (You are using prepared statements with placeholders for these values, right, and not putting their contents inline into a string?)
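To make the value-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name items, the temp table layout and the sample values are assumptions for illustration only; the PRIMARY KEY on value_col supplies the index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col_name TEXT)")  # stand-in for the real table
conn.executemany("INSERT INTO items VALUES (?)",
                 [("a",), ("b",), ("c",), ("d",)])

# Hypothetical table holding the values received from the server.
# The PRIMARY KEY gives it the index the lookup relies on.
conn.execute("CREATE TEMP TABLE value_table (value_col TEXT PRIMARY KEY)")

server_values = ["b", "d", "zzz"]  # stand-in for the large server list
with conn:  # a single transaction for all the inserts
    conn.executemany("INSERT OR IGNORE INTO value_table VALUES (?)",
                     [(v,) for v in server_values])

# One fixed statement, no matter how many values the server sent.
missing = [row[0] for row in conn.execute(
    "SELECT col_name FROM items AS t "
    "WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name) "
    "ORDER BY col_name")]
print(missing)
```

The query text never changes, so the statement length and bound-parameter limits stop mattering; only the contents of value_table grow.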
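Yes, if you bind the values as parameters there is a limit: by default at most 999 variables per statement (raised to 32766 in SQLite 3.32.0), as reported in the official documentation: https://www.sqlite.org/limits.html#max_variable_number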
Related
I am developing a dictionary application. It requires incremental search, which means that SELECT queries must be fast. There are 200000+ rows. First of all, let me explain the table structure. I have this table:
CREATE TABLE meaning(
key TEXT,
value TEXT,
entries BLOB);
Some time ago I had this index:
CREATE INDEX index_key ON meaning (key)
This query took around 500 ms, which was very slow:
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
Then I dropped this index and created a case-insensitive index, which improved performance 2-3 times.
CREATE INDEX index_key ON meaning (key COLLATE NOCASE);
Now this query takes between 75 ms and 275 ms, which is still quite slow for incremental search.
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
I have tried to optimize query according to this post.
SELECT value FROM meaning WHERE key >= 'boy' AND key<'boz' LIMIT 100
But this query takes 451 ms.
EXPLAIN
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
This returns the following values:
EXPLAIN QUERY PLAN
SELECT value FROM meaning WHERE key LIKE 'boy%' LIMIT 100
This returns the following value (detail column):
SEARCH TABLE meaning USING INDEX index_key (key>? AND key<?) (~31250 rows)
These values did not really tell me what to optimize.
Is it possible to get the SELECT for words down to ~10 ms by optimizing this query, creating another table, or changing some SQLite database parameters? Could you suggest the best way to do this?
PS. Please do not suggest using an FTS table. I used FTS in a previous version of the application. I agree that it is extremely fast, but I dropped the FTS idea for 2 reasons:
It does not give proper results (it returns words that the user does not need)
It takes more disk space
I've got two SQLite databases, each with a table that I need to keep synchronized by merging rows that have the same key. The tables are laid out like this:
CREATE TABLE titles ( name TEXT PRIMARY KEY,
chapter TEXT ,
page INTEGER DEFAULT 1 ,
updated INTEGER DEFAULT 0 );
I want to be able to run the same commands on each of the two tables, with the result that for pairs of rows with the same name, whichever row has the greater value in updated will overwrite the other row completely, and rows which do not have a match are copied across, so both tables are identical when finished.
This is for an Android app, so I could feasibly do the comparisons in Java, but I'd prefer an SQLite solution if possible. I'm not very experienced with SQL, so the more explanation you can give, the more it'll help.
EDIT
To clarify: I need something I can execute at an arbitrary time, to be invoked by other code. One of the two databases is not always present, and may not be completely intact when operations on the other occur, so I don't think a trigger will work.
Assuming that you have attached the other database to your main database:
ATTACH '/some/where/.../the/other/db-file' AS other;
you can first delete all records that are to be overwritten because their updated field is smaller than the corresponding updated field in the other table:
DELETE FROM main.titles
WHERE updated < (SELECT updated
FROM other.titles
WHERE other.titles.name = main.titles.name);
and then copy all newer and missing records:
INSERT INTO main.titles
SELECT * FROM other.titles
WHERE name NOT IN (SELECT name
FROM main.titles);
To update in the other direction, exchange the main/other database names.
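For anyone who wants to try this outside of Android first, here is a rough sketch that runs the two statements in both directions with Python's sqlite3 module. The helper names make_db, pull and dump, the temp file names and the sample rows are all made up for the demo; only the DELETE and INSERT statements come from the answer above.

```python
import os
import sqlite3
import tempfile

SCHEMA = """CREATE TABLE titles (name TEXT PRIMARY KEY, chapter TEXT,
                                 page INTEGER DEFAULT 1, updated INTEGER DEFAULT 0)"""

def make_db(path, rows):
    """Create a database file with the titles table and some sample rows."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    db.executemany("INSERT INTO titles VALUES (?, ?, ?, ?)", rows)
    db.commit()
    db.close()

def pull(target_path, source_path):
    """Bring rows that are newer or missing in target over from source."""
    db = sqlite3.connect(target_path)
    db.execute("ATTACH ? AS other", (source_path,))
    with db:  # both statements commit together or not at all
        db.execute("""DELETE FROM main.titles
                      WHERE updated < (SELECT updated FROM other.titles
                                       WHERE other.titles.name = main.titles.name)""")
        db.execute("""INSERT INTO main.titles
                      SELECT * FROM other.titles
                      WHERE name NOT IN (SELECT name FROM main.titles)""")
    db.close()

def dump(path):
    db = sqlite3.connect(path)
    rows = db.execute("SELECT * FROM titles ORDER BY name").fetchall()
    db.close()
    return rows

tmp = tempfile.mkdtemp()
a, b = os.path.join(tmp, "a.db"), os.path.join(tmp, "b.db")
make_db(a, [("x", "c1", 1, 10), ("y", "c2", 2, 5)])
make_db(b, [("y", "c9", 9, 7), ("z", "c3", 3, 1)])
pull(a, b)  # a receives the newer "y" and the missing "z"
pull(b, a)  # b receives the missing "x"
rows_a, rows_b = dump(a), dump(b)
print(rows_a == rows_b, rows_a)
```

Running pull once per direction leaves both files with the same rows. Rows with equal updated values are left alone on both sides.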
For this, you can use a trigger.
i.e.
-- Pseudocode only: SQLite triggers do not support DECLARE, SET, IF or ATTACH.
CREATE TRIGGER sync_trigger
AFTER INSERT OR UPDATE OF updated ON titles
REFERENCING NEW AS n
FOR EACH ROW
DECLARE updated_match;
DECLARE prime_name;
DECLARE max_updated;
BEGIN
    SET prime_name = n.name;
    ATTACH database2name AS db2;
    SELECT updated
    INTO updated_match
    FROM db2.titles t
    WHERE t.name = prime_name;
    IF updated_match IS NOT NULL THEN
        IF n.updated > updated_match THEN
            SET max_updated = n.updated;
        ELSE
            SET max_updated = updated_match;
        END IF;
        UPDATE titles
        SET updated = max_updated
        WHERE name = prime_name;
        UPDATE db2.titles
        SET updated = max_updated
        WHERE name = prime_name;
    END IF;
END sync_trigger;
The syntax may be a little off. I don't use triggers all that often and this is a fairly complex one, but it should give you an idea of where to start at least. You will need to assign this to one database, exchanging "database2name" for the other database's name and then assign it again to the other database, swapping the "database2name" out for the other database.
Hope this helps.
What is the best way to maintain a "cumulative sum" of a particular data column in SQLite? I have found several examples online, but I am not 100% certain how I might integrate these approaches into my ContentProvider.
In previous applications, I have tried to maintain cumulative data myself, updating the data each time I insert new data into the table. For example, in the sample code below, every time I would add a new record with a value score, I would then manually update the value of cumulative_score based on its value in the previous row.
_id  score  cumulative_score
1    100    100
2    50     150
3    25     175
4    25     200
5    10     210
However, this is far from ideal and becomes very messy when handling tables with many columns. Is there a way to somehow automate the process of updating cumulative data each time I insert/update records in my table? How might I integrate this into my ContentProvider implementation?
I know there must be a way to do this... I just don't know how. Thanks!
Probably the easiest way is with a SQLite trigger. That is the closest I know
of to "automation". Just have an insert trigger that takes the previous
cumulative sum, adds the current score and stores it in the new row's cumulative
sum. Something like this (assuming _id is the column you are ordering on):
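CREATE TRIGGER calc_cumulative_score AFTER INSERT ON tablename FOR EACH ROW
BEGIN
    -- The new row already exists here, so look at the largest _id below it.
    -- COALESCE covers the very first row, which has no predecessor.
    UPDATE tablename SET cumulative_score =
        COALESCE((SELECT cumulative_score
                  FROM tablename
                  WHERE _id = (SELECT MAX(_id) FROM tablename
                               WHERE _id < new._id)), 0)
        + new.score
    WHERE _id = new._id;
END;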
Make sure that the trigger and the original insert are in the same
transaction. For arbitrary updates of the score column, you would have to
implement a recursive trigger that somehow finds the next highest id (maybe by
selecting the min id in the set of rows with an id greater than the current
one) and updates its cumulative sum.
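Here is a small sketch of the insert-trigger idea that can be run with Python's sqlite3 module. The table name scores is an assumption, and the trigger takes "previous row" to mean the largest _id below the new one, with COALESCE handling the first row; it only covers inserts in increasing _id order, not arbitrary updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (_id INTEGER PRIMARY KEY,
                     score INTEGER,
                     cumulative_score INTEGER);

CREATE TRIGGER calc_cumulative_score AFTER INSERT ON scores FOR EACH ROW
BEGIN
    -- Previous total (0 for the first row) plus the new score.
    UPDATE scores SET cumulative_score =
        COALESCE((SELECT cumulative_score FROM scores
                  WHERE _id = (SELECT MAX(_id) FROM scores
                               WHERE _id < new._id)), 0)
        + new.score
    WHERE _id = new._id;
END;
""")

with conn:  # all inserts in one transaction
    for s in (100, 50, 25, 25, 10):
        conn.execute("INSERT INTO scores (score) VALUES (?)", (s,))

totals = [r[0] for r in conn.execute(
    "SELECT cumulative_score FROM scores ORDER BY _id")]
print(totals)
```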
If you are opposed to using triggers, you can do more or less the same thing in
the ContentProvider in the insert and update methods manually, though since
you're pretty much locked into SQLite on Android, I don't see much reason not to
use triggers.
I assume you are wanting to do this as an optimization, as otherwise you could just calculate the sum on demand (O(n) vs O(1), so you'd have to consider how big n might get, and how often you need the sums).
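If calculating on demand turns out to be good enough, a correlated subquery is one way to do it. This is only a sketch with an assumed table name scores, run through Python's sqlite3 module; each row's subquery sums everything up to that row, so computing the totals for the whole table costs more than a single pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (_id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO scores (score) VALUES (?)",
                 [(100,), (50,), (25,), (25,), (10,)])

# For each row, sum the scores of all rows up to and including it.
rows = conn.execute("""
    SELECT s1._id, s1.score,
           (SELECT SUM(s2.score) FROM scores AS s2
            WHERE s2._id <= s1._id) AS cumulative_score
    FROM scores AS s1
    ORDER BY s1._id
""").fetchall()
for row in rows:
    print(row)
```

Nothing is stored, so updates and deletes in the middle of the table need no extra bookkeeping.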
I am fetching my data by id, which is an INTEGER PRIMARY KEY (or plain integer) column.
But after deleting a row,
if I run a select query to show all rows,
the app force closes because one id is missing.
I want the id to auto-increment and auto-decrement by itself.
When I delete the record at the end (e.g. id=7) and then add a row, its id must be 7, not 8. In the same way, when I delete a row in the middle (e.g. id=3), all the rows should be renumbered in ascending order.
Any ideas would help me.
Most systems with auto-incrementing columns keep track of the last value inserted (or the next one to be inserted) and do not ever reissue a number (give the same number twice), even if the last number issued has been removed from the table.
Judging from what you are asking, SQLite is another such system.
If there is any concurrency in the system, then this is risky, but for a single-user, single-app-at-a-time system, you might get away with:
SELECT MAX(id_column) + 1 FROM YourTable
to find the next available value. Depending on how SQLite behaves, you might be able to embed that in the VALUES list of an INSERT statement:
INSERT INTO YourTable(id_column, ...)
VALUES((SELECT MAX(id_column) + 1 FROM YourTable), ...);
That may not work; you may have to do this as two operations. Note that if there is any concurrency, the two-statement form is a Bad Idea™. The primary key unique constraint normally prevents disaster, but one of two concurrent statements fails because it tries to insert a value that the other just inserted - so it has to retry and hope for the best. Clearly, a cell phone has less concurrency than, say, a web server so the problem is correspondingly less severe. But be careful.
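As a quick check of the single-statement form, here is a sketch using Python's sqlite3 module. The name column and the insert_next helper are made up for the demo, and COALESCE is an addition to cope with an empty table, where MAX() returns NULL. It assumes a single writer, as discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id_column INTEGER PRIMARY KEY, name TEXT)")

def insert_next(name):
    """Insert a row whose id is one more than the current maximum."""
    cur = conn.execute(
        "INSERT INTO YourTable (id_column, name) "
        "VALUES ((SELECT COALESCE(MAX(id_column), 0) + 1 FROM YourTable), ?)",
        (name,))
    return cur.lastrowid

ids = [insert_next(n) for n in ("a", "b", "c")]
conn.execute("DELETE FROM YourTable WHERE id_column = 3")  # delete the last row
reused = insert_next("d")  # MAX is now 2, so the new row gets 3 again
print(ids, reused)
```

This only refills a gap at the end of the sequence; a row deleted from the middle still leaves a hole.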
On the whole, though, it is best to let gaps appear in the sequence without worrying about it. It is usually not necessary to worry about them. If you must worry about gaps, don't let people make them in the first place. Or move an existing row to fill in the gap when you do a delete that creates one. That still leaves deletes at the end creating gaps when new rows are added, which is why it is best to get over the "it must be a contiguous sequence of numbers" mentality. Auto-increment guarantees uniqueness; it does not guarantee contiguity.
We have about 7-8 tables in our Android application, each having about 8 columns on average. Both read and write operations are performed on the database, and I am experimenting to find ways to enhance the performance of the DataAccess layer. So far, I have tried the following:
Use positional arguments in WHERE clauses (Reason: so that SQLite can reuse the same execution plan)
Enclose inserts and updates within transactions (Reason: every db operation is enclosed within a transaction by default. Doing this will remove that overhead)
Indexing: I have not created any explicit index other than those created by default on the primary key and unique key columns. (Reason: indexing will improve seek time)
I have mentioned my assumptions in parentheses; please correct me if I am wrong.
Questions:
Can I add anything else to this list? I read somewhere that avoiding the use of db-journal can improve the performance of updates. Is this myth or fact? How can this be done, if recommended?
Are nested transactions allowed in SQLite3? How do they affect performance?
The thing is, I have a function which runs an update in a loop, so I have enclosed the loop within a transaction block. Sometimes this function is called from another loop inside some other function. The calling function also encloses its loop within a transaction block. How does such nesting of transactions affect performance?
The WHERE clauses on my queries use more than one column to build the predicate. These columns might not necessarily be primary key or unique columns. Should I create indices on these columns too? Is it a good idea to create multiple indices for such a table?
Pin down exactly which queries you need to optimize. Grab a copy of a typical database and use the REPL to time queries. Use this to benchmark any gains as you optimize.
Use ANALYZE to allow SQLite's query planner to work more efficiently.
For SELECTs and UPDATEs, indexes can speed things up, but only if the indexes you create can actually be used by the queries that need speeding up. Use EXPLAIN QUERY PLAN on your queries to see which index would be used, or whether the query requires a full table scan. For large tables, a full table scan is bad and you probably want an index. As a rule, only one index is used per table in any given query. If you have multiple predicates, the index used is the one expected to reduce the result set the most (based on ANALYZE). You can have indexes that contain multiple columns (to assist queries with multiple predicates). Multi-column indexes are usable only if the predicates fit the index from left to right with no gaps (unused columns at the end are fine). If you use an ordering predicate (<, <=, > etc.), it needs to be on the last used column of the index. WHERE predicates and ORDER BY each need an index and SQLite can only use one, so that can be a point where performance suffers. The more indexes you have, the slower your INSERTs will be, so you will have to work out the best trade-off for your situation.
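To see the left-to-right rule in action, here is a small sketch with Python's sqlite3 module. The table t, its columns and the index name idx_ab are invented for the demo, and the exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (a INTEGER, b INTEGER, c TEXT);
CREATE INDEX idx_ab ON t (a, b);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on the first index column, range on the second: fits the index.
plan_ab = plan("SELECT * FROM t WHERE a = 1 AND b > 5")
# Skips the leading column, so the index cannot be used to search.
plan_b = plan("SELECT * FROM t WHERE b > 5")

print(plan_ab)
print(plan_b)
```

The first query should report a search using idx_ab, while the second falls back to scanning the table.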
If you have more complex queries that can't make use of any indexes that you might create, you can de-normalize your schema, structuring your data in such a way that the queries are simpler and can be answered using indexes.
If you are doing a large number of INSERTs, try dropping indexes and recreating them at the end. You will need to benchmark this.
SQLite does support nested transactions using savepoints, but I'm not sure that you'll gain anything there performance-wise.
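For reference, a savepoint nested inside a transaction looks roughly like this. It is a sketch using Python's sqlite3 module with an invented log table and savepoint name; isolation_level=None is set so that the module issues no implicit BEGIN and the statements below control the transaction themselves.

```python
import sqlite3

# No implicit transaction handling by the module; we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")                        # outer transaction
conn.execute("INSERT INTO log VALUES ('outer')")
conn.execute("SAVEPOINT inner_work")         # nested unit of work
conn.execute("INSERT INTO log VALUES ('inner')")
conn.execute("ROLLBACK TO inner_work")       # undo only the inner insert
conn.execute("RELEASE inner_work")           # close the savepoint
conn.execute("COMMIT")                       # the outer insert is kept

msgs = [r[0] for r in conn.execute("SELECT msg FROM log")]
print(msgs)
```

Only the outermost COMMIT writes to disk, so nesting by itself is unlikely to make anything faster.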
You can gain lots of speed by compromising on data integrity. If you can recover from database corruption yourself, then this might work for you. You could perhaps only do this when you're doing intensive operations that you can recover from manually.
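The paragraph above does not name specific settings. One guess at what is meant is the synchronous and journal_mode pragmas, which trade durability for speed. The sketch below, using Python's sqlite3 module and a throwaway file name, only shows how they are set and read back; whether they are safe for your data is a separate question.

```python
import os
import sqlite3
import tempfile

# Throwaway database file for the demo.
path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = sqlite3.connect(path)

# Assumption: these are the kind of integrity trade-offs being referred to.
conn.execute("PRAGMA synchronous = OFF")                       # skip fsync calls
mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]  # journal kept in RAM
sync = conn.execute("PRAGMA synchronous").fetchone()[0]

print(mode, sync)
```

With these settings a crash or power loss in the middle of a write can corrupt the file, so they fit best around bulk operations you can redo from scratch.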
I'm not sure how much of this you can get to from an Android application. There is a more detailed guide for optimizing SQLite in general in the SQLite documentation.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app. I'm starting with an SQLiteOpenHelper dbHelper and an SQLiteQueryBuilder qb.
// Build the SQL exactly as the query builder would run it, and log it with its arguments.
String sql = qb.buildQuery(projection, selection, selectionArgs, groupBy, having, sortOrder, limit);
android.util.Log.d("EXPLAIN", sql + "; " + java.util.Arrays.toString(selectionArgs));
// Ask SQLite for the query plan instead of the rows.
Cursor c = dbHelper.getReadableDatabase().rawQuery("EXPLAIN QUERY PLAN " + sql, selectionArgs);
try {
    // Log one line per plan step as "column:value" pairs.
    while (c.moveToNext()) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < c.getColumnCount(); i++) {
            sb.append(c.getColumnName(i)).append(":").append(c.getString(i)).append(", ");
        }
        android.util.Log.d("EXPLAIN", sb.toString());
    }
} finally {
    c.close(); // always release the cursor
}
I dropped this into my ContentProvider.query() and now I can see exactly how all the queries are getting performed. (In my case it looks like the problem is too many queries rather than poor use of indexing; but maybe this will help someone else...)
I would add these:
Using rawQuery() instead of building queries with ContentValues will speed things up in certain cases. Of course, it is a little tedious to write raw queries.
If you have a lot of string / text data, consider creating virtual tables using full text search (FTS3), which can run queries faster. You can search online for the exact speed improvements.
A minor point to add to Robie's otherwise comprehensive answer: the VFS in SQLite (which is mostly concerned with locking) can be swapped out for alternatives. You may find one of the alternatives like unix-excl or unix-none to be faster but heed the warnings on the SQLite VFS page!
Normalization (of table structures) is also worth considering (if you haven't already) simply because it tends to provide the smallest representation of the data in the database; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases (the sort I'm most familiar with), but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.