I have been spending quite some time looking at some performance issues on our device, and noticed that we have quite a few apps all doing db reads/writes..
I started by using the Contacts API to insert new contacts & data rows, and it was painfully slow. 1 minute 18 seconds to insert about 1500 rows (250 raw contacts & 1250 data rows)..
I had used the insert helper in another app for performance inserts, and decided to write a test app which would write to separate db's w/ separate insert methods.
Each db has one table, each w/ 4 columns :
_ID, Name, Time, and Blob (all of type 'string') - just like contact provider defines the data columns.
_ID is auto increment pk, Name just inserts the same thing '1234567890', time is just the current system time in milis, and BLob is a string w/ length 6400 full of the letter 'A'...
I first checked the bulk insert, but all it does is loops through all the inserts you have defined, and is just as slow as doing the inserts individually (or negligible performance impact)..
I tested 3 different methods to do the inserts :
ContentValues w/ db.insert method :
SQLiteStatement w/ statement.execute() (done inside a transaction).
SqliteInsertHelper w/ transaction.
I can provide some code, but I got the best performance out of the InsertHelper, and wondering why it was deprecated :
Time to insert 100 records
ContentValues : 7.778 seconds ( 82 bytes written / ms )
SQLiteStatement : 1.311 seconds ( 489 bytes written / ms )
SqliteInsertHElper : 0.292 seconds (2197 bytes written / ms)
Any ideas?
It's hard to come by any information on why InsertHelper was deprecated without going to the actual commit that does deprecate it. The engineer that deprecated InsertHelper gave the following reason:
This class does not offer any advantages over SQLiteStatement and just makes code more complex and error-prone.
After refactoring from InsertHelper to SQLiteStatement I agree. One exception are for null-safe binding functions. Whereas InsertHelper automatically calls bindNull() for you, SQLiteStatement crashes if you pass, for example, a null String and you have to do your own null check before calling bindString().
See:
https://android.googlesource.com/platform/frameworks/base/+/b33eb4e%5E!/
InsertHelper allows users to do multiple inserts into a table using the same statement.But it is not a good way to insert as such as it is not thread-safe.
You should use transactions.. If you don't explicitly create a transaction for a database operation the framework creates one for each. Group your object together and insert them all at once. This will greatly increase performance.
Related
My understanding of SQLite transactions on Android is based largely on this article. In its gist it suggests that
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed
.
That observation is correct - I started using transactions to fix just that issue:speed. In my own Android app I use a number of rather complex SQLite tables to store JSON data which I manipulate via the SQLite JSON1 extension - I use SQLCipher which has JSON1 built in.
At any given time I have to manipulate - insert, update or delete - rows in several tables. Given the complexity of the JSON I do this with the help of temporary tables I create for each table manipulation. The start of the manipulation begins with SQL along the lines of
DROP TABLE IF EXISTS h1;
CREATE TEMP TABLE h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
Some tables require just one table - which I usually call h1 - others need two in which case I call them h1 and h2.
The entire sequence of operations in any single set of manipulations takes the form
begin transaction
manipulate Table 1 which
which creates its own temp tables, h1[h2],
then extracts relevant existing JSON from Table 1 into the temps
manipulates h1[h2]
performs inserts, updates, deletes in Table 1
on to the next table, Table 2 where the same sequence is repeated
continue with a variable list of such tables - never more than 5
end transaction
My questions
does this sound like an efficient way to do things or would it be better to wrap each individual table operation in its own transaction?
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE calls. If I end up with h1[h2] temp tables that are pre-populated with data from manipulating Table(n - 1) when working with Table(n) then the updates on Table(n) will go totally wrong. I am assuming that the DROP TABLE bit I have is taking care of this issue. Am I right in assuming this?
I have to admit to not being an expert with SQL, even less so with SQLite and quite a newbie when it comes to using transactions. The SQLite JSON extension is very powerful but introduces a whole new level of complexity when manipulating data.
The main reason to use transactions is to reduce the overheads of writing to the disk.
So if you don't wrap multiple changes (inserts, deletes and updates) in a transaction then each will result in the database being written to disk and the overheads involved.
If you wrap them in a transaction and the in-memory version will be written only when the transaction is completed (note that if using the SQLiteDatabase beginTransaction/endTransaction methods, that you should, as part of ending the transaction use the setTransactionSuccessful method and then use the endTransaction method).
That is, the SQLiteDatabase method are is different to doing this via pure SQL when you'd begin the transaction and then end/commit it/them (i.e. the SQLiteDatabase methods would otherwise automatically rollback the transactions).
Saying that the statement :-
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed
basically reiterates :-
Any command that changes the database (basically, any SQL command
other than SELECT) will automatically start a transaction if one is
not already in effect. Automatically started transactions are
committed when the last query finishes.
SQL As Understood By SQLite - BEGIN TRANSACTION i.e. it's not Android specific.
does this sound like an efficient way to do things or would it be
better to wrap each individual table operation in its own transaction?
Doing all the operations in a single transaction will be more efficient as there is just the single write to disk operation.
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE
calls. If I end up with h1[h2] temp tables that are pre-populated with
data from manipulating Table(n - 1) when working with Table(n) then
the updates on Table(n) will go totally wrong. I am assuming that the
DROP TABLE bit I have is taking care of this issue. Am I right in
assuming this?
Dropping the tables will ensure data integrity (i.e. you should, by the sound of it, do this), you could also use :-
CREATE TEMP TABLE IF NOT EXISTS h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
DELETE FROM h1;
I successfully used the following BEFORE INSERT trigger to limit the number of rows stored in the SQLite database table locations. The database table acts as a cache in an Android application.
CREATE TRIGGER 'trigger_locations_insert'
BEFORE INSERT ON 'locations'
WHEN ( SELECT count(*) FROM 'locations' ) > '100'
BEGIN
DELETE FROM 'locations' WHERE '_id' NOT IN
(
SELECT '_id' FROM 'locations' ORDER BY 'modified_at' DESC LIMIT '100'
);
END
Meanwhile, I added a second trigger that allows me to INSERT OR UPDATE rows. - The discussion on that topic can be found in another thread. The second trigger requires a VIEW on which each INSERTis executed.
CREATE VIEW 'locations_view' AS SELECT * FROM 'locations';
Since an INSERT is no longer executed on the TABLE locations but on the VIEW locations_view, the above trigger does no longer work. If I apply the trigger on the VIEW the following error message is thrown.
Failure 1 (cannot create BEFORE trigger on view: main.locations_view)
Question:
How can I change the above trigger to observe each INSERT on the VIEW - or do you recommend another way to limit the number of rows? I would prefer to handle this kind of operation within the database, rather then running frontend code on my client.
Performance issues:
Although, the limiter (the above trigger) works in general - it performs less then optimal! Actually, the database actions take so long that an ANR is raised. As far as I can see, the reason is, that the limiter is called every time an INSERT happens. To optimize the setup, the bulk INSERT should be wrapped into a transaction and the limiter should perform right after. Is this possible? If you like to help, please place optimization comments concerning the bulk INSERT into the original question. Comments regarding the limiter are welcome here.
This type of trigger should work fine in conjunction with the other one. The problem appears to be that the SQL is unnecessarily quoting the _id field. It is selecting the literal string "_id" for every row and comparing that to the same literal string.
Removing the quotes around '_id' (both in the DELETE and in the sub-SELECT) should fix the problem.
What is the best way to maintain a "cumulative sum" of a particular data column in SQLite? I have found several examples online, but I am not 100% certain how I might integrate these approaches into my ContentProvider.
In previous applications, I have tried to maintain cumulative data myself, updating the data each time I insert new data into the table. For example, in the sample code below, every time I would add a new record with a value score, I would then manually update the value of cumulative_score based on its value in the previous row.
_id score cumulative_score
1 100 100
2 50 150
3 25 175
4 25 200
5 10 210
However, this is far from ideal and becomes very messy when handling tables with many columns. Is there a way to somehow automate the process of updating cumulative data each time I insert/update records in my table? How might I integrate this into my ContentProvider implementation?
I know there must be a way to do this... I just don't know how. Thanks!
Probably the easiest way is with a SQLite trigger. That is the closest I know
of to "automation". Just have an insert trigger that takes the previous
cumulative sum, adds the current score and stores it in the new row's cumulative
sum. Something like this (assuming _id is the column you are ordering on):
CREATE TRIGGER calc_cumulative_score AFTER INSERT ON tablename FOR EACH ROW
BEGIN
UPDATE tablename SET cumulative_score =
(SELECT cumulative_score
FROM tablename
WHERE _id = (SELECT MAX(_id) FROM tablename))
+ new.score
WHERE _id = new._id;
END
Making sure that the trigger and the original insert are in the same
transaction. For arbitrary updates of the score column, you would have to
have to implement a recursive trigger that somehow finds the next highest id (maybe by selecting by the min id
in the set of rows with an id greater than the current one) and updates its
cumulative sum.
If you are opposed to using triggers, you can do more or less the same thing in
the ContentProvider in the insert and update methods manually, though since
you're pretty much locked into SQLite on Android, I don't see much reason not to
use triggers.
I assume you are wanting to do this as an optimization, as otherwise you could just calculate the sum on demand (O(n) vs O(1), so you'd have to consider how big n might get, and how often you need the sums).
I maintain an application that is collecting a lot of information and is storing these information in an ArrayList.
In detail this ArrayList is defined as ArrayList<FileInformation> which has some member like:
private File mFile;
private Long mSize;
private int mCount;
private Long mFilteredSize;
private int mFilteredCount;
private int mNumberOfFilters;
etc.
This approach is working but is not very flexible when I would like to introduce some new functionality. It also has some limitations in terms of memory usage and scale-ability. Because of this I did some tests if a database is the better approach. From the flexibility there is no question, but somehow I'm not able to make it running fast enough to become a real alternative.
Right now the database has just one table like this:
CREATE TABLE ExtContent (
"path" TEXT not null,
"folderpath" TEXT not null,
"filename" TEXT,
"extention" TEXT,
"size" NUMERIC,
"filedate" NUMERIC,
"isfolder" INTEGER not null,
"firstfound" NUMERIC not null,
"lastfound" NUMERIC not null,
"filtered" INTEGER not null
);
The performance issue is immense. Collecting and writing ~14000 items takes ~3mins! when writing into the database and just 4-5secs if written into the ArrayList.
Creating the database in-memory does not make a big difference.
As my experience in terms of SQLITE is rather limited, I started by creating the entries via the android.database.sqlite.SQLiteDatabase.insert methode.
As there was no meaningful difference between a file based and a in-memory database, I guess using BEGIN TRANSACTION and COMMIT TRANSACTION will not make any difference.
Is there some way to optimize this behavior?
Just for clarification, putting BEGIN TRANSACTION and END TRANSACTION will increase the performance greatly. Quoted from http://www.sqlite.org/faq.html#q19 :
SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. By default, each INSERT statement is its own transaction...
I had a similar issue on an app I was coding on the weekend.
Is the data in the database to be included in the app when it's released? If so, bulk inserts aren't they way to go, instead you want to look at creating the database and including it in the assets directory and copying it over to the device. Here's a great link.
Otherwise I'm not sure you can do much to improve performance, this link explains methods on bulk inserting into an SqlLite Database.
Edit: You may also want to post your insert code too.
This is opretty obvious. Assuming you already allocated object to insert into. ( This is the same workload for bot solutions ) Let's compare alternatives:
Inserting in ArrayList does:
- (optional) allocate new chinks of cells for pointers if necessary
- insert object pointer into array list on the end
... really fast
INserting into sqlite:
-prepare insertion query ( I hope you use prepared query, and do not construct it from strings)
-perform database table insertion with modifications of indexes etc.
... a lot of work
Only advantage of database is that you can:
- query it later
- it handles external storage transparently allowing you to have much more entities
But it comes at cost of performance.
Depending on what you are for, there could be better alternatives.
For example, in my android games I store highscore entries in JSON file and utilise
GSON Pull parser / databinding layer ( https://github.com/ko5tik/jsonserializer ) to create objects out of it. Typical load time for 2000 entries from external storage is about 2-3 seconds
We have about 7-8 tables in our Android application each having about 8 columns on an average. Both read and write operations are performed on the database and I am experimenting and trying to find ways to enhance the performance of the DataAccess layer. So, far I have tried the following:
Use positional arguments in where clauses (Reason: so that sqlite makes use of the same execution plan)
Enclose inserts and update with transactions(Reason: every db operation is enclosed within a transaction by default. Doing this will remove that overhead)
Indexing: I have not created any explicit index other than those created by default on the primary key and unique keys columns.(Reason: indexing will improve seek time)
I have mentioned my assumptions in paranthesis; please correct me if I am wrong.
Questions:
Can I add anything else to this list? I read somewhere that avoiding the use of db-journal can improve performance of updates? Is this a myth or fact? How can this be done, if recomended?
Are nested transactions allowed in SQLite3? How do they affect performance?
The thing is I have a function which runs an update in a loop, so, i have enclosed the loop within a transaction block. Sometimes this function is called from another loop inside some other function. The calling function also encloses the loop within a transaction block. How does such a nesting of transactions affect performance?
The where clauses on my queries use more than one columns to build the predicate. These columns might not necessarily by a primary key or unique columns. Should I create indices on these columns too? Is it a good idea to create multiple indices for such a table?
Pin down exactly which queries you need to optimize. Grab a copy of a typical database and use the REPL to time queries. Use this to benchmark any gains as you optimize.
Use ANALYZE to allow SQLite's query planner to work more efficiently.
For SELECTs and UPDATEs, indexes can things up, but only if the indexes you create can actually be used by the queries that you need speeding up. Use EXPLAIN QUERY PLAN on your queries to see which index would be used or if the query requires a full table scan. For large tables, a full table scan is bad and you probably want an index. Only one index will be used on any given query. If you have multiple predicates, then the index that will be used is the one that is expected to reduce the result set the most (based on ANALYZE). You can have indexes that contain multiple columns (to assist queries with multiple predicates). If you have indexes with multiple columns, they are usable only if the predicates fit the index from left to right with no gaps (but unused columns at the end are fine). If you use an ordering predicate (<, <=, > etc) then that needs to be in the last used column of the index. Using both WHERE predicates and ORDER BY both require an index and SQLite can only use one, so that can be a point where performance suffers. The more indexes you have, the slower your INSERTs will be, so you will have to work out the best trade-off for your situation.
If you have more complex queries that can't make use of any indexes that you might create, you can de-normalize your schema, structuring your data in such a way that the queries are simpler and can be answered using indexes.
If you are doing a large number of INSERTs, try dropping indexes and recreating them at the end. You will need to benchmark this.
SQLite does support nested transactions using savepoints, but I'm not sure that you'll gain anything there performance-wise.
You can gain lots of speed by compromising on data integrity. If you can recover from database corruption yourself, then this might work for you. You could perhaps only do this when you're doing intensive operations that you can recover from manually.
I'm not sure how much of this you can get to from an Android application. There is a more detailed guide for optimizing SQLite in general in the SQLite documentation.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app. I'm starting with an SQLiteOpenHelper dbHelper and an SQLiteQueryBuilder qb.
String sql = qb.buildQuery(projection,selection,selectionArgs,groupBy,having,sortOrder,limit);
android.util.Log.d("EXPLAIN",sql + "; " + java.util.Arrays.toString(selectionArgs));
Cursor c = dbHelper.getReadableDatabase().rawQuery("EXPLAIN QUERY PLAN " + sql,selectionArgs);
if(c.moveToFirst()) {
do {
StringBuilder sb = new StringBuilder();
for(int i = 0; i < c.getColumnCount(); i++) {
sb.append(c.getColumnName(i)).append(":").append(c.getString(i)).append(", ");
}
android.util.Log.d("EXPLAIN",sb.toString());
} while(c.moveToNext());
}
c.close();
I dropped this into my ContentProvider.query() and now I can see exactly how all the queries are getting performed. (In my case it looks like the problem is too many queries rather than poor use of indexing; but maybe this will help someone else...)
I would add these :
Using of rawQuery() instead of building using ContentValues will fasten up in certain cases. off course it is a little tedious to write raw query.
If you have a lot of string / text type data, consider creating Virtual tables using full text search (FTS3), which can run faster query. you can search in google for the exact speed improvements.
A minor point to add to Robie's otherwise comprehensive answer: the VFS in SQLite (which is mostly concerned with locking) can be swapped out for alternatives. You may find one of the alternatives like unix-excl or unix-none to be faster but heed the warnings on the SQLite VFS page!
Normalization (of table structures) is also worth considering (if you haven't already) simply because it tends to provide the smallest representation of the data in the database; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases (the sort I'm most familiar with), but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.