We have about 7-8 tables in our Android application, each with about 8 columns on average. Both read and write operations are performed on the database, and I am experimenting to find ways to improve the performance of the data access layer. So far I have tried the following:
Use positional arguments in where clauses (Reason: so that sqlite makes use of the same execution plan)
Enclose inserts and updates in transactions (Reason: every db operation is enclosed within a transaction by default. Doing this will remove that overhead)
Indexing: I have not created any explicit index other than those created by default on the primary key and unique key columns. (Reason: indexing will improve seek time)
I have mentioned my assumptions in parentheses; please correct me if I am wrong.
Questions:
Can I add anything else to this list? I read somewhere that avoiding the use of the db-journal can improve the performance of updates. Is this a myth or a fact? How can this be done, if recommended?
Are nested transactions allowed in SQLite3? How do they affect performance?
The thing is, I have a function which runs an update in a loop, so I have enclosed the loop within a transaction block. Sometimes this function is called from another loop inside some other function. The calling function also encloses its loop within a transaction block. How does such nesting of transactions affect performance?
The where clauses on my queries use more than one column to build the predicate. These columns might not necessarily be primary key or unique columns. Should I create indices on these columns too? Is it a good idea to create multiple indices for such a table?
Pin down exactly which queries you need to optimize. Grab a copy of a typical database and use the REPL to time queries. Use this to benchmark any gains as you optimize.
Use ANALYZE to allow SQLite's query planner to work more efficiently.
For SELECTs and UPDATEs, indexes can speed things up, but only if the indexes you create can actually be used by the queries that you need speeding up. Use EXPLAIN QUERY PLAN on your queries to see which index would be used or if the query requires a full table scan. For large tables, a full table scan is bad and you probably want an index.
As a rule, only one index will be used per table in any given query. If you have multiple predicates, then the index that will be used is the one that is expected to reduce the result set the most (based on ANALYZE). You can have indexes that contain multiple columns (to assist queries with multiple predicates). If you have indexes with multiple columns, they are usable only if the predicates fit the index from left to right with no gaps (but unused columns at the end are fine). If you use an ordering predicate (<, <=, > etc.) then that needs to be on the last used column of the index.
WHERE predicates and ORDER BY each benefit from an index, and SQLite can generally use only one per table, so that can be a point where performance suffers. The more indexes you have, the slower your INSERTs will be, so you will have to work out the best trade-off for your situation.
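The left-to-right rule for multi-column indexes can be checked with a small experiment. The following is only a sketch using Python's built-in sqlite3 module on a desktop machine rather than the Android API; the table `words`, its columns, and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table and composite index, for illustration only.
conn.execute(
    "CREATE TABLE words (id INTEGER PRIMARY KEY, lang TEXT, length INTEGER, word TEXT)"
)
conn.execute("CREATE INDEX idx_words_lang_length ON words (lang, length)")

# Equality on the leftmost index column, range on the next one: index is usable.
plan_indexed = query_plan(
    conn, "SELECT word FROM words WHERE lang = ? AND length > ?", ("en", 5)
)

# The leftmost index column is missing from the predicate: expect a table scan.
plan_scan = query_plan(conn, "SELECT word FROM words WHERE length > ?", (5,))

print(plan_indexed)
print(plan_scan)
```

Running the same two queries through EXPLAIN QUERY PLAN on a copy of the real database shows whether a given predicate order actually matches an index.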
If you have more complex queries that can't make use of any indexes that you might create, you can de-normalize your schema, structuring your data in such a way that the queries are simpler and can be answered using indexes.
If you are doing a large number of INSERTs, try dropping indexes and recreating them at the end. You will need to benchmark this.
SQLite does support nested transactions using savepoints, but I'm not sure that you'll gain anything there performance-wise.
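As a rough illustration of nesting with savepoints, here is a sketch in Python's built-in sqlite3 module (not the Android API). The table `items` and the helper `update_batch` are hypothetical. The inner function can be called from inside an outer transaction, and a failure rolls back only the inner batch.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so we control it.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO items (id, qty) VALUES (?, ?)", [(1, 0), (2, 0), (3, 0)])

def update_batch(conn, ids, qty):
    """Inner unit of work; SAVEPOINT nests inside whatever transaction is open."""
    conn.execute("SAVEPOINT batch")
    try:
        for item_id in ids:
            cur = conn.execute("UPDATE items SET qty = ? WHERE id = ?", (qty, item_id))
            if cur.rowcount == 0:
                raise LookupError("no such item: %d" % item_id)
        conn.execute("RELEASE batch")
    except Exception:
        # Undo only this batch, then pop the savepoint off the stack.
        conn.execute("ROLLBACK TO batch")
        conn.execute("RELEASE batch")
        raise

conn.execute("BEGIN")                   # outer transaction
update_batch(conn, [1, 2], 5)           # succeeds
try:
    update_batch(conn, [3, 99], 7)      # id 99 does not exist: batch is undone
except LookupError:
    pass
conn.execute("COMMIT")                  # the only real commit

result = conn.execute("SELECT id, qty FROM items ORDER BY id").fetchall()
print(result)
```

Only the outer COMMIT writes to disk, so the nesting itself should not add commits; whether it helps or hurts speed would still need benchmarking.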
You can gain lots of speed by compromising on data integrity. If you can recover from database corruption yourself, then this might work for you. You could perhaps only do this when you're doing intensive operations that you can recover from manually.
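One common way to make this trade is through the `synchronous` and `journal_mode` pragmas, which also covers the db-journal part of the question. The sketch below uses Python's built-in sqlite3 module on a temporary file; it is an assumption that these are the settings meant here, and on Android the framework may manage some of them itself. The table `log` is hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)

# Risky settings: a crash or power loss mid-write may corrupt the database.
journal_mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]
conn.execute("PRAGMA synchronous = OFF")
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
fast_settings = (journal_mode, synchronous)

# Intensive work that could be redone from scratch if the file were lost.
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")
with conn:  # commits on success
    conn.executemany("INSERT INTO log (msg) VALUES (?)", [("row %d" % i,) for i in range(1000)])

# Return to the safe settings once the bulk work is done.
conn.execute("PRAGMA synchronous = FULL")
journal_mode = conn.execute("PRAGMA journal_mode = DELETE").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
safe_settings = (journal_mode, synchronous)
conn.close()
```

Limiting the risky window to the bulk operation keeps normal reads and writes on the safe settings.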
I'm not sure how much of this you can get to from an Android application. There is a more detailed guide for optimizing SQLite in general in the SQLite documentation.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app. I'm starting with an SQLiteOpenHelper dbHelper and an SQLiteQueryBuilder qb.
String sql = qb.buildQuery(projection,selection,selectionArgs,groupBy,having,sortOrder,limit);
android.util.Log.d("EXPLAIN",sql + "; " + java.util.Arrays.toString(selectionArgs));
Cursor c = dbHelper.getReadableDatabase().rawQuery("EXPLAIN QUERY PLAN " + sql,selectionArgs);
if(c.moveToFirst()) {
    do {
        StringBuilder sb = new StringBuilder();
        for(int i = 0; i < c.getColumnCount(); i++) {
            sb.append(c.getColumnName(i)).append(":").append(c.getString(i)).append(", ");
        }
        android.util.Log.d("EXPLAIN",sb.toString());
    } while(c.moveToNext());
}
c.close();
I dropped this into my ContentProvider.query() and now I can see exactly how all the queries are getting performed. (In my case it looks like the problem is too many queries rather than poor use of indexing; but maybe this will help someone else...)
I would add these :
Using rawQuery() instead of building the operation with ContentValues can speed things up in certain cases. Of course, it is a little tedious to write raw queries.
If you have a lot of string / text type data, consider creating virtual tables using full text search (FTS3), which can run text queries faster. You can search online for the exact speed improvements.
A minor point to add to Robie's otherwise comprehensive answer: the VFS in SQLite (which is mostly concerned with locking) can be swapped out for alternatives. You may find one of the alternatives like unix-excl or unix-none to be faster but heed the warnings on the SQLite VFS page!
Normalization (of table structures) is also worth considering (if you haven't already) simply because it tends to provide the smallest representation of the data in the database; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases (the sort I'm most familiar with), but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.
Related
I have an SQLite DB where I perform a query like
Select * from table where col_name NOT IN ('val1','val2')
Basically I'm getting a huge list of values from the server and I need to select the rows which are not present in the given list.
Currently it's working fine, with no issues. But the number of values from the server is becoming huge, as the server DB is updated frequently.
So, I may get thousands of String values which I need to pass to the NOT IN.
My question is: will it cause any performance issues in the future? Does the NOT IN parameter have any size restriction (like a maximum of 10000 values you can check)?
Will it cause any crash at some point?
This is an official reference about the various limits in SQLite. I think the Maximum Length Of An SQL Statement may be relevant to your case. The default value is 1000000, and it is adjustable.
Apart from this, I don't think any limit exists on the number of values in a NOT IN clause.
With more than a few values to test for, you're better off putting them in a table that has an index on the column holding them. Then things like
SELECT *
FROM table
WHERE col_name NOT IN (SELECT value_col FROM value_table);
or
SELECT *
FROM table AS t
WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name);
will be reasonably efficient no matter how many records are in value_table because that index will be used to find entries.
Plus, of course, it makes it a lot easier to re-use prepared statements, because you don't have to create a new one and re-bind every value every time you add a value to the ones you need to check. You just insert it into value_table instead. (You are using prepared statements with placeholders for these values, right, and not trying to put their contents inline into a string?)
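A minimal sketch of the value-table approach, using Python's built-in sqlite3 module rather than the Android API. The table `items` stands in for the question's `table`, and the sample values are made up. The `NOT NULL` on `value_col` matters, because a NULL in the list makes `NOT IN` match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col_name TEXT)")
conn.executemany("INSERT INTO items (col_name) VALUES (?)", [("a",), ("b",), ("c",), ("d",)])

# The PRIMARY KEY gives value_col the index that the lookups rely on.
conn.execute("CREATE TEMP TABLE value_table (value_col TEXT PRIMARY KEY NOT NULL)")

# Values received from the server are inserted, not spliced into the SQL text.
server_values = ["b", "d", "x"]
conn.executemany(
    "INSERT OR IGNORE INTO value_table (value_col) VALUES (?)",
    [(v,) for v in server_values],
)

not_in = conn.execute(
    "SELECT col_name FROM items "
    "WHERE col_name NOT IN (SELECT value_col FROM value_table) "
    "ORDER BY col_name"
).fetchall()

not_exists = conn.execute(
    "SELECT col_name FROM items AS t "
    "WHERE NOT EXISTS (SELECT 1 FROM value_table WHERE value_col = t.col_name) "
    "ORDER BY col_name"
).fetchall()

print(not_in, not_exists)
```

The SQL text of both queries stays constant however many values arrive, so the statement length and parameter limits no longer come into play.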
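Yes, there is a limit: by default a statement can have at most 999 bound parameters (the ? placeholders), as reported in the official documentation: https://www.sqlite.org/limits.html#max_variable_number. This limit applies to bound parameters, not to literal values written inline in the list, and SQLite 3.32.0 and later raise the default to 32766.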
My understanding of SQLite transactions on Android is based largely on this article. In essence, it suggests that
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed.
That observation is correct. I started using transactions to fix just that issue: speed. In my own Android app I use a number of rather complex SQLite tables to store JSON data, which I manipulate via the SQLite JSON1 extension. I use SQLCipher, which has JSON1 built in.
At any given time I have to manipulate - insert, update or delete - rows in several tables. Given the complexity of the JSON I do this with the help of temporary tables I create for each table manipulation. The start of the manipulation begins with SQL along the lines of
DROP TABLE IF EXISTS h1;
CREATE TEMP TABLE h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
Some tables require just one table - which I usually call h1 - others need two in which case I call them h1 and h2.
The entire sequence of operations in any single set of manipulations takes the form
begin transaction
  manipulate Table 1, which
    creates its own temp tables, h1[h2],
    then extracts relevant existing JSON from Table 1 into the temps
    manipulates h1[h2]
    performs inserts, updates, deletes in Table 1
  on to the next table, Table 2, where the same sequence is repeated
  continue with a variable list of such tables - never more than 5
end transaction
My questions
does this sound like an efficient way to do things or would it be better to wrap each individual table operation in its own transaction?
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE calls. If I end up with h1[h2] temp tables that are pre-populated with data from manipulating Table(n - 1) when working with Table(n) then the updates on Table(n) will go totally wrong. I am assuming that the DROP TABLE bit I have is taking care of this issue. Am I right in assuming this?
I have to admit to not being an expert with SQL, even less so with SQLite and quite a newbie when it comes to using transactions. The SQLite JSON extension is very powerful but introduces a whole new level of complexity when manipulating data.
The main reason to use transactions is to reduce the overheads of writing to the disk.
So if you don't wrap multiple changes (inserts, deletes and updates) in a transaction then each will result in the database being written to disk and the overheads involved.
If you wrap them in a transaction, the in-memory version will be written to disk only when the transaction is completed. (Note that if you use the SQLiteDatabase beginTransaction/endTransaction methods, you should call the setTransactionSuccessful method before calling endTransaction.)
That is, the SQLiteDatabase methods differ from pure SQL, where you would begin the transaction and then end/commit it: without setTransactionSuccessful, endTransaction rolls the transaction back.
That said, the statement :-
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed
basically reiterates :-
Any command that changes the database (basically, any SQL command
other than SELECT) will automatically start a transaction if one is
not already in effect. Automatically started transactions are
committed when the last query finishes.
SQL As Understood By SQLite - BEGIN TRANSACTION i.e. it's not Android specific.
does this sound like an efficient way to do things or would it be
better to wrap each individual table operation in its own transaction?
Doing all the operations in a single transaction will be more efficient as there is just the single write to disk operation.
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE
calls. If I end up with h1[h2] temp tables that are pre-populated with
data from manipulating Table(n - 1) when working with Table(n) then
the updates on Table(n) will go totally wrong. I am assuming that the
DROP TABLE bit I have is taking care of this issue. Am I right in
assuming this?
Dropping the tables will ensure data integrity (i.e. you should, by the sound of it, do this). You could also use :-
CREATE TEMP TABLE IF NOT EXISTS h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
DELETE FROM h1;
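The create-if-missing-then-empty pattern, inside one enclosing transaction, can be sketched like this with Python's built-in sqlite3 module (not the Android API). The helper `fresh_temp_table` and the sample rows are hypothetical. Each pass starts from an empty h1, so nothing leaks over from the previous table.

```python
import sqlite3

# isolation_level=None: transactions are started and ended explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)

def fresh_temp_table(conn):
    """Make sure h1 exists and is empty before each table is processed."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS h1(v1 TEXT, v2 TEXT, v3 TEXT, v4 TEXT, v5 TEXT)"
    )
    conn.execute("DELETE FROM h1")

# Stand-ins for the rows extracted from Table 1 and Table 2.
extracted = [[("a",), ("b",)], [("c",)]]

counts = []
conn.execute("BEGIN")                 # one transaction around all tables
for rows in extracted:
    fresh_temp_table(conn)
    conn.executemany("INSERT INTO h1(v1) VALUES (?)", rows)
    counts.append(conn.execute("SELECT COUNT(*) FROM h1").fetchone()[0])
conn.execute("COMMIT")

print(counts)
```

If leftover rows from the first pass had survived, the second count would be 3 rather than 1.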
I have an SQLite database and a content provider which wraps it. There is a table of dictionaries and a table of words. Each word belongs to one of the dictionaries. Also, each dictionary has a constant capacity. So my content provider should allow only a limited number of words to be inserted into each dictionary.
"scheme"
Dictionaries
|-id (read-only)
|-capacity
Words
|-id (read-only)
|-dictionaryId (write-once)
I have a few options:
1) For each new dictionary I can create a trigger that will raise() an error if the number of words is greater than the capacity. But it will run a query for each insertion, which is redundant in some situations.
2) I can check this condition in the provider's insert(). (same problem as above)
3) I can pass this check on to the users of the provider. For example, check the condition in the activity which adds new words to a dictionary. This is the most optimized method because I don't need to run a query each time I add a new word. I can query the number of words in the dictionary at the start of the activity, then increment it and know the current value without further queries. But here I have another problem: what if I forget to check this condition, or make a mistake so that the check doesn't work?
So what is the right way of checking conditions in content providers?
I've decided to move all the data-handling logic out of the database and use the db as a mere container. The reason is that it frees me from writing tests for the database, which is not easy as far as I can see. I think it is better to treat the database as something robust and, on that basis, write tests for the application components.
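For comparison, option 1 from the question can be sketched with a single trigger that covers every dictionary, rather than one trigger per dictionary. This uses Python's built-in sqlite3 module instead of a ContentProvider; the `word` column, the trigger name, and the `add_word` helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dictionaries (id INTEGER PRIMARY KEY, capacity INTEGER NOT NULL);
CREATE TABLE Words (
    id INTEGER PRIMARY KEY,
    dictionaryId INTEGER NOT NULL REFERENCES Dictionaries(id),
    word TEXT  -- hypothetical payload column
);
CREATE INDEX idx_words_dictionary ON Words (dictionaryId);

-- One trigger for all dictionaries: refuse the insert when the target is full.
CREATE TRIGGER enforce_capacity BEFORE INSERT ON Words
WHEN (SELECT COUNT(*) FROM Words WHERE dictionaryId = NEW.dictionaryId)
     >= (SELECT capacity FROM Dictionaries WHERE id = NEW.dictionaryId)
BEGIN
    SELECT RAISE(ABORT, 'dictionary is full');
END;
""")
conn.execute("INSERT INTO Dictionaries (id, capacity) VALUES (1, 2)")

def add_word(conn, dictionary_id, word):
    """Return True if the row was inserted, False if the trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO Words (dictionaryId, word) VALUES (?, ?)", (dictionary_id, word)
        )
        return True
    except sqlite3.DatabaseError:
        return False

accepted = [add_word(conn, 1, w) for w in ("apple", "pear", "plum")]
stored = conn.execute("SELECT COUNT(*) FROM Words WHERE dictionaryId = 1").fetchone()[0]
print(accepted, stored)
```

The count query in the trigger is served by the index on `dictionaryId`, which keeps the per-insert cost small, though it is still one extra lookup per insert.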
This is more of a question of theory than anything else. I am writing an Android app that uses a pre-packaged database. The purpose of the app is solely to search through this database and return values. I'll provide some abstract examples to illustrate my implementation and quandary. The user can search by "Thing Name", and what I want returned to the user is values a, b, and c. I initially designed the database to have it all contained on a single sheet, with column 1 being key_index, column 2 being name, column 3 being a, and so on. When the user searches, the cursor will return the key_index, and then use that to pull values a, b and c.
However, in my database "Thing alpha" can have a value a = 4 or a = 6. I do not want to repeat data in the database, i.e. have multiple rows with the same Thing alpha and only separate "a" values. So what is the best way to organize the data given this situation? Do I keep all the "Thing Names" in a single sheet, and all the data separately? This is really a question of proper database design, which is definitely something foreign to me. Thanks for your help!
There's a thing called database normalization: http://en.wikipedia.org/wiki/Database_normalization. You usually want to avoid redundancy and dependency in the DB entities by using a design with surrogate keys, foreign keys and so on. Your "Thing alpha" looks like a case for a many-to-many table, e.g. one or many songs belonging to the same or different genres. You may want to create dictionary tables to hold your id, name pairs and have foreign keys referencing these tables. In your case it will mostly be a read-only DB, so you might want to consider creating indexes with a high FILLFACTOR percentage, though I don't think SQLite allows that. There are many ways to design a database; everything depends on its purpose. You can start with the design of your hardware, like RAIDs/file systems/DB block sizes matched to the file system's block sizes in order to keep the I/O optimal, and where to put your tablespaces/filegroups/indexes to balance the I/O load. DB design theory is really a deep subject which is not to be underestimated, nor is it a matter of a few sentences in a Stack Overflow answer. :)
Without understanding your data better, here is my guess at what you are looking for.
table: product
- _id
- name
table: attribute
- product_id
- a
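A sketch of how that two-table layout could be created and queried, using Python's built-in sqlite3 module. The column types, the index, and the sample rows are assumptions; only the table and column names come from the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    _id  INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE       -- each thing name is stored once
);
CREATE TABLE attribute (
    product_id INTEGER NOT NULL REFERENCES product(_id),
    a          INTEGER NOT NULL     -- one row per possible value of a
);
CREATE INDEX idx_attribute_product ON attribute (product_id);
""")

conn.execute("INSERT INTO product (_id, name) VALUES (1, 'Thing alpha')")
conn.executemany("INSERT INTO attribute (product_id, a) VALUES (?, ?)", [(1, 4), (1, 6)])

# Search by name, then pull every 'a' value that belongs to that product.
rows = conn.execute(
    "SELECT attr.a FROM product AS p "
    "JOIN attribute AS attr ON attr.product_id = p._id "
    "WHERE p.name = ? ORDER BY attr.a",
    ("Thing alpha",),
).fetchall()
values = [row[0] for row in rows]
print(values)
```

The name is stored once in `product`, and each extra value of a costs one small row in `attribute`.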
I maintain an application that collects a lot of information and stores it in an ArrayList.
In detail, this ArrayList is defined as ArrayList<FileInformation>, where FileInformation has members like:
private File mFile;
private Long mSize;
private int mCount;
private Long mFilteredSize;
private int mFilteredCount;
private int mNumberOfFilters;
etc.
This approach works but is not very flexible when I want to introduce new functionality. It also has some limitations in terms of memory usage and scalability. Because of this I ran some tests to see if a database is the better approach. In terms of flexibility there is no question, but somehow I'm not able to make it run fast enough to become a real alternative.
Right now the database has just one table like this:
CREATE TABLE ExtContent (
"path" TEXT not null,
"folderpath" TEXT not null,
"filename" TEXT,
"extention" TEXT,
"size" NUMERIC,
"filedate" NUMERIC,
"isfolder" INTEGER not null,
"firstfound" NUMERIC not null,
"lastfound" NUMERIC not null,
"filtered" INTEGER not null
);
The performance issue is immense. Collecting and writing ~14000 items takes ~3mins! when writing into the database and just 4-5secs if written into the ArrayList.
Creating the database in-memory does not make a big difference.
As my experience with SQLite is rather limited, I started by creating the entries via the android.database.sqlite.SQLiteDatabase.insert method.
As there was no meaningful difference between a file based and a in-memory database, I guess using BEGIN TRANSACTION and COMMIT TRANSACTION will not make any difference.
Is there some way to optimize this behavior?
Just for clarification, putting BEGIN TRANSACTION and END TRANSACTION will increase the performance greatly. Quoted from http://www.sqlite.org/faq.html#q19 :
SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. By default, each INSERT statement is its own transaction...
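The effect described in the FAQ can be measured with a small timing harness. This is a sketch using Python's built-in sqlite3 module on a temporary file, not the Android insert() call. The two-column table is a cut-down, hypothetical version of ExtContent, and the row count is kept small so that the one-commit-per-row case finishes quickly.

```python
import os
import sqlite3
import tempfile
import time

def timed_insert(path, rows, one_transaction):
    """Insert rows either one commit per row or inside a single transaction."""
    conn = sqlite3.connect(path, isolation_level=None)  # no implicit BEGIN
    conn.execute("CREATE TABLE IF NOT EXISTS ExtContent (path TEXT NOT NULL, size NUMERIC)")
    start = time.perf_counter()
    if one_transaction:
        conn.execute("BEGIN")
    for row in rows:
        # Without an open transaction, each INSERT is committed on its own.
        conn.execute("INSERT INTO ExtContent (path, size) VALUES (?, ?)", row)
    if one_transaction:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM ExtContent").fetchone()[0]
    conn.close()
    return elapsed, count

rows = [("/sdcard/file%d" % i, i) for i in range(200)]
workdir = tempfile.mkdtemp()
slow_time, slow_count = timed_insert(os.path.join(workdir, "auto.db"), rows, False)
fast_time, fast_count = timed_insert(os.path.join(workdir, "txn.db"), rows, True)

print("per-row commits: %.3fs, single transaction: %.3fs" % (slow_time, fast_time))
```

The absolute numbers depend heavily on the storage device, so the comparison is best repeated on the target hardware.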
I had a similar issue on an app I was coding on the weekend.
Is the data in the database to be included in the app when it's released? If so, bulk inserts aren't the way to go; instead you want to look at creating the database, including it in the assets directory, and copying it over to the device. Here's a great link.
Otherwise I'm not sure you can do much to improve performance; this link explains methods for bulk inserting into an SQLite database.
Edit: You may also want to post your insert code too.
This is pretty obvious. Assume you have already allocated the object to insert (this is the same workload for both solutions). Let's compare the alternatives:
Inserting into an ArrayList:
- (optional) allocate a new chunk of cells for pointers if necessary
- insert the object pointer at the end of the array list
... really fast
Inserting into SQLite:
- prepare the insertion query (I hope you use a prepared query, and do not construct it from strings)
- perform the database table insertion, with modifications of indexes etc.
... a lot of work
The only advantages of a database are that:
- you can query it later
- it handles external storage transparently, allowing you to have many more entities
But it comes at a cost in performance.
Depending on what you are aiming for, there could be better alternatives.
For example, in my Android games I store highscore entries in a JSON file and use a GSON pull parser / databinding layer ( https://github.com/ko5tik/jsonserializer ) to create objects out of it. Typical load time for 2000 entries from external storage is about 2-3 seconds.