My understanding of SQLite transactions on Android is based largely on this article. In its gist it suggests that
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed
.
That observation is correct - I started using transactions to fix just that issue:speed. In my own Android app I use a number of rather complex SQLite tables to store JSON data which I manipulate via the SQLite JSON1 extension - I use SQLCipher which has JSON1 built in.
At any given time I have to manipulate - insert, update or delete - rows in several tables. Given the complexity of the JSON I do this with the help of temporary tables I create for each table manipulation. The start of the manipulation begins with SQL along the lines of
DROP TABLE IF EXISTS h1;
CREATE TEMP TABLE h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
Some tables require just one table - which I usually call h1 - others need two in which case I call them h1 and h2.
The entire sequence of operations in any single set of manipulations takes the form
begin transaction
manipulate Table 1 which
which creates its own temp tables, h1[h2],
then extracts relevant existing JSON from Table 1 into the temps
manipulates h1[h2]
performs inserts, updates, deletes in Table 1
on to the next table, Table 2 where the same sequence is repeated
continue with a variable list of such tables - never more than 5
end transaction
My questions
does this sound like an efficient way to do things or would it be better to wrap each individual table operation in its own transaction?
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE calls. If I end up with h1[h2] temp tables that are pre-populated with data from manipulating Table(n - 1) when working with Table(n) then the updates on Table(n) will go totally wrong. I am assuming that the DROP TABLE bit I have is taking care of this issue. Am I right in assuming this?
I have to admit to not being an expert with SQL, even less so with SQLite and quite a newbie when it comes to using transactions. The SQLite JSON extension is very powerful but introduces a whole new level of complexity when manipulating data.
The main reason to use transactions is to reduce the overheads of writing to the disk.
So if you don't wrap multiple changes (inserts, deletes and updates) in a transaction then each will result in the database being written to disk and the overheads involved.
If you wrap them in a transaction and the in-memory version will be written only when the transaction is completed (note that if using the SQLiteDatabase beginTransaction/endTransaction methods, that you should, as part of ending the transaction use the setTransactionSuccessful method and then use the endTransaction method).
That is, the SQLiteDatabase method are is different to doing this via pure SQL when you'd begin the transaction and then end/commit it/them (i.e. the SQLiteDatabase methods would otherwise automatically rollback the transactions).
Saying that the statement :-
if you do not wrap calls to SQLite in an explicit transaction it will
create an implicit transaction for you. A consequence of these
implicit transactions is a loss of speed
basically reiterates :-
Any command that changes the database (basically, any SQL command
other than SELECT) will automatically start a transaction if one is
not already in effect. Automatically started transactions are
committed when the last query finishes.
SQL As Understood By SQLite - BEGIN TRANSACTION i.e. it's not Android specific.
does this sound like an efficient way to do things or would it be
better to wrap each individual table operation in its own transaction?
Doing all the operations in a single transaction will be more efficient as there is just the single write to disk operation.
it is not clear to me what happens to my DROP TABLE/CREATE TEMP TABLE
calls. If I end up with h1[h2] temp tables that are pre-populated with
data from manipulating Table(n - 1) when working with Table(n) then
the updates on Table(n) will go totally wrong. I am assuming that the
DROP TABLE bit I have is taking care of this issue. Am I right in
assuming this?
Dropping the tables will ensure data integrity (i.e. you should, by the sound of it, do this), you could also use :-
CREATE TEMP TABLE IF NOT EXISTS h1(v1 TEXT,v2 TEXT,v3 TEXT,v4 TEXT,v5 TEXT);
DELETE FROM h1;
Related
I am working on an android application using android:minSdkVersion="14". The application receives data as JSON from a server. The data received need to be added to an sqlite table. If a row exists, all fields except for two have to be updated. If a row does not already exist in the table, it has to be inserted. I am looking for the most efficient way as regards performance.
The function insertwithonCoflict() has been considered but it is not an option since in case of update, it updates all the fields including the two that should not be updated.
The function replace() is also not suitable.
I would opt for a SELECT to check if the row exists and then an INSERT or UPDATE but I was wondering if I could optimize the procedure somehow .
Two approaches:
Change the database structure so that the table has only the server data. Put local data (the two columns) in another table that references the server data table. When updating, just insert to the server data table with "replace" conflict resolution.
Do the select-insert/update logic.
For performance in any case, use database transactions to reduce I/O. That is, wrap the database update loop in a transaction and only commit it when you've done with everything. (In case the transaction becomes too large, split the loop into transaction chunks of maybe a few thousand rows.)
A nice solution I use is as follows:
long id = db.insertWithOnConflict(TABLE, null, contentValues, SQLiteDatabase.CONFLICT_IGNORE);
if(id!=-1) db.update(TABLE, contentValues, "_id=?", new String[]{String.valueOf(id)});
This ensures the row exists and has the latest values.
I successfully used the following BEFORE INSERT trigger to limit the number of rows stored in the SQLite database table locations. The database table acts as a cache in an Android application.
CREATE TRIGGER 'trigger_locations_insert'
BEFORE INSERT ON 'locations'
WHEN ( SELECT count(*) FROM 'locations' ) > '100'
BEGIN
DELETE FROM 'locations' WHERE '_id' NOT IN
(
SELECT '_id' FROM 'locations' ORDER BY 'modified_at' DESC LIMIT '100'
);
END
Meanwhile, I added a second trigger that allows me to INSERT OR UPDATE rows. - The discussion on that topic can be found in another thread. The second trigger requires a VIEW on which each INSERTis executed.
CREATE VIEW 'locations_view' AS SELECT * FROM 'locations';
Since an INSERT is no longer executed on the TABLE locations but on the VIEW locations_view, the above trigger does no longer work. If I apply the trigger on the VIEW the following error message is thrown.
Failure 1 (cannot create BEFORE trigger on view: main.locations_view)
Question:
How can I change the above trigger to observe each INSERT on the VIEW - or do you recommend another way to limit the number of rows? I would prefer to handle this kind of operation within the database, rather then running frontend code on my client.
Performance issues:
Although, the limiter (the above trigger) works in general - it performs less then optimal! Actually, the database actions take so long that an ANR is raised. As far as I can see, the reason is, that the limiter is called every time an INSERT happens. To optimize the setup, the bulk INSERT should be wrapped into a transaction and the limiter should perform right after. Is this possible? If you like to help, please place optimization comments concerning the bulk INSERT into the original question. Comments regarding the limiter are welcome here.
This type of trigger should work fine in conjunction with the other one. The problem appears to be that the SQL is unnecessarily quoting the _id field. It is selecting the literal string "_id" for every row and comparing that to the same literal string.
Removing the quotes around '_id' (both in the DELETE and in the sub-SELECT) should fix the problem.
I maintain an application that is collecting a lot of information and is storing these information in an ArrayList.
In detail this ArrayList is defined as ArrayList<FileInformation> which has some member like:
private File mFile;
private Long mSize;
private int mCount;
private Long mFilteredSize;
private int mFilteredCount;
private int mNumberOfFilters;
etc.
This approach is working but is not very flexible when I would like to introduce some new functionality. It also has some limitations in terms of memory usage and scale-ability. Because of this I did some tests if a database is the better approach. From the flexibility there is no question, but somehow I'm not able to make it running fast enough to become a real alternative.
Right now the database has just one table like this:
CREATE TABLE ExtContent (
"path" TEXT not null,
"folderpath" TEXT not null,
"filename" TEXT,
"extention" TEXT,
"size" NUMERIC,
"filedate" NUMERIC,
"isfolder" INTEGER not null,
"firstfound" NUMERIC not null,
"lastfound" NUMERIC not null,
"filtered" INTEGER not null
);
The performance issue is immense. Collecting and writing ~14000 items takes ~3mins! when writing into the database and just 4-5secs if written into the ArrayList.
Creating the database in-memory does not make a big difference.
As my experience in terms of SQLITE is rather limited, I started by creating the entries via the android.database.sqlite.SQLiteDatabase.insert methode.
As there was no meaningful difference between a file based and a in-memory database, I guess using BEGIN TRANSACTION and COMMIT TRANSACTION will not make any difference.
Is there some way to optimize this behavior?
Just for clarification, putting BEGIN TRANSACTION and END TRANSACTION will increase the performance greatly. Quoted from http://www.sqlite.org/faq.html#q19 :
SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. By default, each INSERT statement is its own transaction...
I had a similar issue on an app I was coding on the weekend.
Is the data in the database to be included in the app when it's released? If so, bulk inserts aren't they way to go, instead you want to look at creating the database and including it in the assets directory and copying it over to the device. Here's a great link.
Otherwise I'm not sure you can do much to improve performance, this link explains methods on bulk inserting into an SqlLite Database.
Edit: You may also want to post your insert code too.
This is opretty obvious. Assuming you already allocated object to insert into. ( This is the same workload for bot solutions ) Let's compare alternatives:
Inserting in ArrayList does:
- (optional) allocate new chinks of cells for pointers if necessary
- insert object pointer into array list on the end
... really fast
INserting into sqlite:
-prepare insertion query ( I hope you use prepared query, and do not construct it from strings)
-perform database table insertion with modifications of indexes etc.
... a lot of work
Only advantage of database is that you can:
- query it later
- it handles external storage transparently allowing you to have much more entities
But it comes at cost of performance.
Depending on what you are for, there could be better alternatives.
For example, in my android games I store highscore entries in JSON file and utilise
GSON Pull parser / databinding layer ( https://github.com/ko5tik/jsonserializer ) to create objects out of it. Typical load time for 2000 entries from external storage is about 2-3 seconds
I currently successfully use a SQLite database which is populated with data from the web. I create an array of values and add these as a row to the database.
Currently to update the database, on starting the activity I clear the database and repopulate it using the data from the web.
Is there an easy method to do one of the following?
A: Only update a row in the table if data has changed (I'm not sure how I could do this unless there was a consistent primary key - what would happen is a new row would be added with the changed data, however there would be no way to know which of the old rows to remove)
B: get all of the rows of data from the web, then empty and fill the database in one go rather than after getting each row
I hope this makes sense. I can provide my code but I don't think it's especially useful for this example.
Context:
On starting the activity, the database is scanned to retrieve values for a different task. However, this takes longer than it needs to because the database is emptied and refilled slowly. Therefore the task can only complete when the database is fully repopulated.
In an ideal world, the database would be scanned and values used for the task, and that database would only be replaced when the complete set of updated data is available.
Your main concern with approach (b) - clearing out all data and slowly repopulating - seems to be that any query between the empty and completion of the refill would need to be refused.
You could simply put the empty/repopulate process in a transaction. Thereby the database will always have data to offer for reading.
Alternatively, if that's not a viable solution, how about appending newer results to the existing ones, but inserted as with an 'active' key set to 0. Then, once the process of adding entries is complete, use a transaction to find and remove currently active entries, and (in the same transaction) update the inactive entries to active.
We have about 7-8 tables in our Android application each having about 8 columns on an average. Both read and write operations are performed on the database and I am experimenting and trying to find ways to enhance the performance of the DataAccess layer. So, far I have tried the following:
Use positional arguments in where clauses (Reason: so that sqlite makes use of the same execution plan)
Enclose inserts and update with transactions(Reason: every db operation is enclosed within a transaction by default. Doing this will remove that overhead)
Indexing: I have not created any explicit index other than those created by default on the primary key and unique keys columns.(Reason: indexing will improve seek time)
I have mentioned my assumptions in paranthesis; please correct me if I am wrong.
Questions:
Can I add anything else to this list? I read somewhere that avoiding the use of db-journal can improve performance of updates? Is this a myth or fact? How can this be done, if recomended?
Are nested transactions allowed in SQLite3? How do they affect performance?
The thing is I have a function which runs an update in a loop, so, i have enclosed the loop within a transaction block. Sometimes this function is called from another loop inside some other function. The calling function also encloses the loop within a transaction block. How does such a nesting of transactions affect performance?
The where clauses on my queries use more than one columns to build the predicate. These columns might not necessarily by a primary key or unique columns. Should I create indices on these columns too? Is it a good idea to create multiple indices for such a table?
Pin down exactly which queries you need to optimize. Grab a copy of a typical database and use the REPL to time queries. Use this to benchmark any gains as you optimize.
Use ANALYZE to allow SQLite's query planner to work more efficiently.
For SELECTs and UPDATEs, indexes can things up, but only if the indexes you create can actually be used by the queries that you need speeding up. Use EXPLAIN QUERY PLAN on your queries to see which index would be used or if the query requires a full table scan. For large tables, a full table scan is bad and you probably want an index. Only one index will be used on any given query. If you have multiple predicates, then the index that will be used is the one that is expected to reduce the result set the most (based on ANALYZE). You can have indexes that contain multiple columns (to assist queries with multiple predicates). If you have indexes with multiple columns, they are usable only if the predicates fit the index from left to right with no gaps (but unused columns at the end are fine). If you use an ordering predicate (<, <=, > etc) then that needs to be in the last used column of the index. Using both WHERE predicates and ORDER BY both require an index and SQLite can only use one, so that can be a point where performance suffers. The more indexes you have, the slower your INSERTs will be, so you will have to work out the best trade-off for your situation.
If you have more complex queries that can't make use of any indexes that you might create, you can de-normalize your schema, structuring your data in such a way that the queries are simpler and can be answered using indexes.
If you are doing a large number of INSERTs, try dropping indexes and recreating them at the end. You will need to benchmark this.
SQLite does support nested transactions using savepoints, but I'm not sure that you'll gain anything there performance-wise.
You can gain lots of speed by compromising on data integrity. If you can recover from database corruption yourself, then this might work for you. You could perhaps only do this when you're doing intensive operations that you can recover from manually.
I'm not sure how much of this you can get to from an Android application. There is a more detailed guide for optimizing SQLite in general in the SQLite documentation.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app. I'm starting with an SQLiteOpenHelper dbHelper and an SQLiteQueryBuilder qb.
String sql = qb.buildQuery(projection,selection,selectionArgs,groupBy,having,sortOrder,limit);
android.util.Log.d("EXPLAIN",sql + "; " + java.util.Arrays.toString(selectionArgs));
Cursor c = dbHelper.getReadableDatabase().rawQuery("EXPLAIN QUERY PLAN " + sql,selectionArgs);
if(c.moveToFirst()) {
do {
StringBuilder sb = new StringBuilder();
for(int i = 0; i < c.getColumnCount(); i++) {
sb.append(c.getColumnName(i)).append(":").append(c.getString(i)).append(", ");
}
android.util.Log.d("EXPLAIN",sb.toString());
} while(c.moveToNext());
}
c.close();
I dropped this into my ContentProvider.query() and now I can see exactly how all the queries are getting performed. (In my case it looks like the problem is too many queries rather than poor use of indexing; but maybe this will help someone else...)
I would add these :
Using of rawQuery() instead of building using ContentValues will fasten up in certain cases. off course it is a little tedious to write raw query.
If you have a lot of string / text type data, consider creating Virtual tables using full text search (FTS3), which can run faster query. you can search in google for the exact speed improvements.
A minor point to add to Robie's otherwise comprehensive answer: the VFS in SQLite (which is mostly concerned with locking) can be swapped out for alternatives. You may find one of the alternatives like unix-excl or unix-none to be faster but heed the warnings on the SQLite VFS page!
Normalization (of table structures) is also worth considering (if you haven't already) simply because it tends to provide the smallest representation of the data in the database; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases (the sort I'm most familiar with), but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.