I am working on a project in which I retrieve data from Facebook about the user's friends. Sometimes a friend's details have changed; at other times they are the same as what is already stored in the db.
I can use the REPLACE command to make sure that the db is consistent with whatever information I retrieve from Facebook.
My question is: how efficient will this technique be? In other words, I can use one of two techniques:
One is to use the replace command and replace the complete record blindly
Second is to first check whether there is any difference from the record saved in the db and update only the fields that have changed
Which of these approaches is going to be more efficient?
I've found that queuing up a number of sqlite commands in a row is much more efficient than is doing anything else in between, even just comparing a few values.
I'd strongly recommend that you just do an update command. SQLite is fast.
My observation is that SQLite is always way faster than I am. So let it do the heavy lifting and just dump the data at it, and let it sort out your updates.
For example, I was searching through about 7,000 records. I pulled the records out into an array, did a quick check on one field, and separated them into two arrays. This was taking me about 5 seconds. I replaced it with two separate SQLite queries that each had to go through the entire database. The revised dual query takes about a quarter of a second, as near as I can tell, because it's so crazy fast.
I've had similar speed luck with Updates in my big database.
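A minimal sketch of the "just dump the data at it" approach, using Python's built-in sqlite3 module so it can be run anywhere; the same SQL applies on Android. The table name `friends`, its columns, and the `fetched` batch are assumptions made up for illustration, not the asker's schema. Note that REPLACE (INSERT OR REPLACE) deletes the conflicting row and inserts a new one, so any column you do not supply falls back to its default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (fb_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Bob", "Rome")],
)
conn.commit()

# Hypothetical batch fetched from the remote API:
# one row that changed (id 2) and one row that is new (id 3).
fetched = [(2, "Bob", "Paris"), (3, "Cy", "Lima")]

# Queue all statements in a single transaction, with no
# read-compare-write step in between.
with conn:
    conn.executemany(
        "INSERT OR REPLACE INTO friends (fb_id, name, city) VALUES (?, ?, ?)",
        fetched,
    )

rows = conn.execute(
    "SELECT fb_id, name, city FROM friends ORDER BY fb_id"
).fetchall()
print(rows)
```

Wrapping the batch in one transaction is what keeps the statements "queued up in a row"; committing after every row would likely cost far more than the comparison it saves.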
I have an Android app with a SQLite database (about 800 MB). Sometimes I need to insert, modify, or delete database rows from an external server (over the internet) in order to update the database.
Is there a way to update the database from the server without having to download the entire database (800 MB)?
I was thinking of a homemade solution that consists of adding a new column to the server database that indicates if said row needs to be inserted, deleted or modified by the android app, but I don't know if something is already implemented.
First question- does the database also change locally on the Android device? If so, you're basically into cache coherency. There's an old joke that the two hardest problems in CS are cache coherency and naming things. It's not totally wrong.
If you do need to keep local changes, especially if you need to sync local changes up, this needs to be a small book, so I'm going to assume not for the rest of the answer.
Honestly, if your db needs to scale at all or you need to make changes frequently, downloading a new db is the way to go. Doing any sort of diff against the db is going to cost you a lot of DB processor time, which translates to bigger or faster db servers, which equals money. Or a big perf hit on any other use of the db.
If you do decide you need to do this, you need two extra columns. One- an isDeleted flag. That way you can easily check for deleted rows (the only other way to do so is to download all rows and see what's missing, which is a very bad idea). Please note you'll need to change every db query you make anywhere to add "and isDeleted=false" as a condition so you don't return deleted rows.
The second column isn't an "isModified" field, it's a "modifiedTime" timestamp. Why a timestamp? Because you can't be assured that a client downloading the db was only 1 version behind. He could be 2. Or 10. You need to be able to get all the changes in all the previous versions as well, so an isModified flag isn't good enough. With a modifiedTime field, you can find the max modifiedTime in the local db, then ask the server for all rows with a modifiedTime greater than yours. You'll then either need to change all your inserts and updates to also set modifiedTime, or use a trigger to do so.
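The two-column scheme above can be sketched as follows. This is only an illustration under stated assumptions: both "server" and "client" are in-memory SQLite databases in one Python process, the table `items` and the helpers `changes_since` and `sync` are hypothetical names, and timestamps are plain integers. A real app would fetch the changed rows over the network.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    is_deleted    INTEGER NOT NULL DEFAULT 0,
    modified_time INTEGER NOT NULL
)
"""

server = sqlite3.connect(":memory:")
client = sqlite3.connect(":memory:")
server.execute(SCHEMA)
client.execute(SCHEMA)

# The client last synced at time 100; the server has changed since then:
# row 2 was soft-deleted at 150 and row 3 was added at 200.
client.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                   [(1, "a", 0, 100), (2, "b", 0, 100)])
server.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                   [(1, "a", 0, 100), (2, "b", 1, 150), (3, "c", 0, 200)])
client.commit()
server.commit()


def changes_since(db, since):
    """Server side: every row touched after `since`, deletions included."""
    return db.execute(
        "SELECT id, name, is_deleted, modified_time FROM items "
        "WHERE modified_time > ? ORDER BY modified_time",
        (since,),
    ).fetchall()


def sync(client_db, server_db):
    """Client side: pull and apply everything newer than our newest row."""
    (last,) = client_db.execute(
        "SELECT COALESCE(MAX(modified_time), 0) FROM items"
    ).fetchone()
    changes = changes_since(server_db, last)
    with client_db:  # apply the whole batch in one transaction
        client_db.executemany(
            "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)", changes
        )
    return len(changes)


applied = sync(client, server)
# Every normal query has to filter out soft-deleted rows.
visible = client.execute(
    "SELECT id, name FROM items WHERE is_deleted = 0 ORDER BY id"
).fetchall()
print(applied, visible)
```

The client keeps the soft-deleted row rather than removing it, so its own MAX(modified_time) remains a valid "last synced" marker. Applying the batch in a single transaction addresses the "app got killed halfway" concern only for that one batch.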
There are a few other ways to do it- a migration file approach (a file with the SQL commands to alter the data) can work if your changes are small. Really though, just download the db. It's so much simpler and less likely to break things. And if you're doing large updates, it may even be less bandwidth. Most importantly, if you just download the file you know the data is correct- if you try and do some kind of diff like above, you have to worry about bugs or inconsistencies in the data for various reasons (did your app get killed while processing the changes? Do you have a bug? Did you do a query mid change and get broken data, with only half the changes you need? Downloading a new file and swapping the dbs when done fixes all those things).
The problem
I need to do a T9 contact search for an Android project I'm working on. Now, it would be simple if I just had to pull contacts from the native contacts storage and then do T9 on that, but the problem I have is that I have an additional local database where we store extra content for some contacts in the form of additional numbers that our application displays and handles. I need to do a search based on the contact’s name, number, and the extra numbers (if any) contained in the local database. The local database has IDs that match those of the contacts in the native Android database.
I have been looking for a solution to this problem, and I have gone through these ideas, but none of them seem to be the right solution.
Try #1
Write a ContentProvider for our local database, in order to be able to perform a simple join operation between the native Android contacts table and our table, however, it seems that joining tables via ContentProviders is only possible when you write your own ContentProvider, thus making this solution not viable for me, not to mention that Android documentation states that you should not write a ContentProvider if you don’t want to share your data with other applications, which we currently don’t.
Try #2
Copy all the needed data from Android’s contacts database into our database, and use ContentObservers to update it constantly. This solution had two major problems: 1) It seemed to have a big overhead, not just on the processor of the device, but also on the development, as we would have to introduce some really delicate update/read/write mechanisms and ensure that our data always stays relatively fresh, while also being performant; 2) A colleague has stated that during contact sync, the ContentObserver fires off events very often, thus making a need for special code to delay the updating, which he says has never really worked out great.
Try #3
Use a CursorJoiner to join the two cursors that I have received and then use a MatrixCursor to display all the data, but that solution is not viable since all the data would be kept in memory, and we are working with datasets that have more than 10k rows of data. Even if the memory could handle it, it would be slow to load, which for T9, isn’t really an option. This also pretty much excludes any solution that doesn't use a Cursor to look over data, which is why I am going in that direction.
Question
Am I missing something obvious? If I am, please point me in the right way. All of the things that I have tried don’t seem feasible to me, but I’m open to someone modifying them in order to make them worthwhile.
I am currently working on an Android app where users can order their list via drag and drop in any order they want. Therefore, I must store the sort order in a variable and a column. I was thinking of giving each row a number like 100000, 200000, 300000, etc., and if a user moves an item between 100000 and 200000, then its sort number becomes the average of its neighbours, i.e. 150000. So the farther apart the numbers, the fewer times I have to "reset" the sort numbers when they converge onto each other.
There are a few things I am worried about:
First is what is the most efficient? Do large numbers use more resources or take longer to sort? I only expect ~40 rows so if large numbers take longer to sort, I might be better off using smaller numbers and "resetting" more often.
Second is ensuring cross-platform compatibility in the future. For now, I only have to worry about this working with my Android app, which uses Java and SQLite, both of which have 64-bit integers (longs) with a maximum of 2^63 - 1. But in the future I may have to worry about things like syncing with an iOS app that uses Objective-C, and maybe a web client. I am not familiar with those technologies, so it would be helpful if anyone could point out any cross-compatibility problems I might have with them and how I can prevent them now so I won't have to modify my stuff later.
Thanks
Edit: Sorry, I forgot to mention this, but another reason I don't want to just update all the rows every time with an incrementing sort order is that I not only have to save to the local db, I also have to sync any changes to a BAAS. A lot of BAAS providers, like Parse and Kinvey (what I'm using), do not allow you to batch-save objects, so every time I "reset" all the sort orders I have to make a request to my BAAS for every row instead of just one.
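The gap-and-midpoint idea from the question can be sketched like this, to get a feel for how often a "reset" would actually be needed. The helper names `key_between`, `renumber`, and `moves_until_reset` and the gap of 100000 are illustrative assumptions taken from the numbers in the question.

```python
GAP = 100_000  # spacing between freshly numbered rows (from the question)


def key_between(before, after):
    """Return an integer strictly between two neighbours, or None if the
    neighbours are adjacent and the list has to be renumbered."""
    mid = (before + after) // 2
    return mid if before < mid < after else None


def renumber(count):
    """Fresh, evenly spaced keys for `count` rows: GAP, 2*GAP, 3*GAP, ..."""
    return [GAP * (i + 1) for i in range(count)]


def moves_until_reset(lo, hi):
    """Worst case: keep dropping items directly after `lo` until no
    integer fits between the two neighbours any more."""
    moves = 0
    while True:
        key = key_between(lo, hi)
        if key is None:
            return moves
        hi = key
        moves += 1


first_key = key_between(100_000, 200_000)
worst_case = moves_until_reset(100_000, 200_000)
fresh = renumber(3)
print(first_key, worst_case, fresh)
```

Each move into the same slot halves the remaining gap, so a starting gap of 100000 survives 16 such moves in the worst case. Keys of this size also stay far below 2^53, which matters if they ever pass through a JavaScript client, where numbers are doubles and integers are only exact up to 2^53 - 1.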
You can just stick with the SortOrder being separated by one. I think you are worried about having to perform an update on tons of rows, but if you Index on SortOrder this won't be a problem. They will all get updated extremely fast.
I think you'll be surprised how fast a query like
UPDATE TableName SET SortOrder = SortOrder + 1 WHERE SortOrder > {InsertLocation + 1};
will run with an Index on your SortOrder column. If you have a lot of data and don't have the Index, you'll notice the query will take significantly longer. Try it out for yourself!
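A small sketch of the shift-and-insert approach, run through Python's sqlite3 module. The table `items`, the index name, and the helper `insert_at` are assumptions for illustration. Here `position` is the slot the new row should occupy, so the comparison is written as `>=`; adapt it to however you define your insert location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT, sort_order INTEGER)"
)
# The index the answer recommends (name is hypothetical).
conn.execute("CREATE INDEX idx_items_sort_order ON items (sort_order)")
conn.executemany(
    "INSERT INTO items (label, sort_order) VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3), ("d", 4)],
)
conn.commit()


def insert_at(conn, label, position):
    """Open a gap at `position`, then place the new row there."""
    with conn:  # both statements succeed or fail together
        conn.execute(
            "UPDATE items SET sort_order = sort_order + 1 WHERE sort_order >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO items (label, sort_order) VALUES (?, ?)",
            (label, position),
        )


insert_at(conn, "new", 2)
ordered = [row[0] for row in
           conn.execute("SELECT label FROM items ORDER BY sort_order")]
print(ordered)
```

This keeps the local database simple, but it does touch every row after the insert point, which is exactly the cost the asker wants to avoid when each changed row means a separate BAAS request.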
Regarding cross platform SQLite support, both iOS and the Web support SQLite, so you shouldn't have a problem there.
My Android app works by using a SQLite database that is generated on the user's PC and transferred to the device. It all works, but I had not anticipated the number of users who would have really huge amounts of data. In these cases, the UI is very sluggish as it waits for the data to be fetched.
I've tried a number of tricks that I was "sure" would speed things up, but nothing seems to have any noticeable effect. My queries are almost all very simple, being usually a single "col=val" for the WHERE clause, and INTEGER data in the column. So I can't do much with the queries.
The latest, and I am not an SQL expert by any means, was to use "CREATE INDEX" commands on the PC, believing that these indexes are used to speed up database searches. The indexes increased the size of the database file significantly, so I was then surprised that it seemed to have no effect whatsoever on the speed of my app! A screen that was taking 8 seconds to fill without indexes still takes about 8 seconds even with them. I was hoping to get things down to at least half that.
What I am wondering at this point is if the SQLite implementation on Android uses database indexes at all, or if I'm just wasting space by generating them. Can anyone answer this?
Also, any other things to try to speed up access?
(For what it's worth, on an absolute basis the users have nothing to complain about. My worst-case user so far has data that generates 630,000 records (15 tables), so there's only so much that's possible!)
Doug Gordon
GHCS Systems
SQLite will use the index if it is appropriate for the query. Use EXPLAIN
EXPLAIN QUERY PLAN ... your select statement ...
to see which indexes SQLite is using. The query plan is based on some assumptions about your database content. You may be able to improve the plan by running ANALYZE.
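As a rough illustration, the following compares the plan for the same query before and after creating an index. The table `records`, the column `category`, and the index name are made up for the example; the exact wording of the plan text differs between SQLite versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (_id INTEGER PRIMARY KEY, category INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (category, payload) VALUES (?, ?)",
    [(i % 50, "row %d" % i) for i in range(5000)],
)
conn.commit()


def plan(conn, sql, params=()):
    """Return the human-readable part of EXPLAIN QUERY PLAN as one string.
    The description is the last column of each result row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


query = "SELECT payload FROM records WHERE category = ?"

before = plan(conn, query, (7,))  # expected to describe a full table scan

conn.execute("CREATE INDEX idx_records_category ON records (category)")
conn.execute("ANALYZE")  # gather statistics for the query planner

after = plan(conn, query, (7,))   # expected to mention the new index

print(before)
print(after)
```

If the second plan does not mention the index for one of your real queries, the index is not helping that query, and the time is probably going somewhere else (for example, into issuing many small queries).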
I was finally able to achieve tremendous performance gains simply by querying the database in a much more efficient way. For example, in building up an array of information, I was previously querying the database for each row that I required with a "WHERE _id = n" type selector. But in doing it this way, I was issuing a dozen or more queries, one at a time.
Instead, I now build up a list of IDs that are required, then get them all with a single query of the form "WHERE _id IN (n1, n2, n3, ...)" and iterate through the returned cursor. Doing this and some other structure optimizations, the largest database is now almost as quick to view as the more average case.
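The single-query form can be sketched as below, with one placeholder per ID so the values are still bound rather than pasted into the SQL. The table `records` and the helper `fetch_by_ids` are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(i, "title %d" % i) for i in range(1, 101)],
)
conn.commit()


def fetch_by_ids(conn, ids):
    """Fetch all requested rows with one query; returns {_id: title}."""
    ids = list(ids)
    if not ids:
        return {}  # "IN ()" is best avoided, so skip the query entirely
    placeholders = ", ".join("?" for _ in ids)
    sql = "SELECT _id, title FROM records WHERE _id IN (%s)" % placeholders
    return {row[0]: row[1] for row in conn.execute(sql, ids)}


wanted = [3, 42, 7]
found = fetch_by_ids(conn, wanted)
# The result order of an IN query is not guaranteed, so restore it here.
titles = [found[i] for i in wanted]
print(titles)
```

SQLite limits the number of bound parameters per statement (999 by default in older builds, higher in newer ones), so a very long ID list would need to be split into chunks.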
Every time you're going to perform some kind of action (be it a database lookup, a long-running calculation, a web request, etc.) that takes more than a couple of hundred milliseconds, you should consider wrapping it inside an AsyncTask.
Painless Threading is a good article on this topic, so I recommend you take a close look at it.
This article discusses the threading model used by Android applications and how applications can ensure best UI performance by spawning worker threads to handle long-running operations, rather than handling them in the main thread.
We're designing an Android app that has a lot of data ("customers", "products", "orders"...), and we don't want to query SQLite every time we need a record. We want to avoid querying the database as much as we can, so we decided to keep certain data always in memory.
Our initial idea is to create two simple classes:
"MemoryRecord": a class that will contain basically an array of objects (string, int, double, datetime, etc...), that are the data from a table record, and all methods to get those data in/out from this array.
"MemoryTable": a class that will contain basically a Map of [Key,MemoryRecord] and all methods to manipulate this Map and insert/update/delete record into/from database.
Those classes will be derived to every kind of table we have in the database. Of course there are other useful methods not listed above, but they are not important at this point.
So, when starting the app, we will load those tables from an SQLite database to memory using those classes, and every time we need to change some data, we will change in memory and post it into the database right after.
But, we want some help/advice from you. Can you suggest something more simple or efficient to implement such a thing? Or maybe some existing classes that already do it for us?
I understand what you guys are trying to show me, and I thank you for that.
But let's say we have a table with 2000 records, and I need to list those records. For each one, I have to query 30 other tables (some of them with 1000 records, others with 10 records) to add additional information to the list, and this happens while the list is "flying" (and as you know, we must be very fast at this moment).
Now you'll be going to say: "just build your main query with all those 'joins', and bring all you need in one step. SQLite can be very fast, if your database is well designed, etc...".
OK, but this query will become very complicated, and even though SQLite is very fast, it will surely be "too" slow (2 to 4 seconds, as I confirmed, and this isn't an acceptable time for us).
Another complicator is that, depending on user interaction, we need to "re-query" all records, because the tables involved are not the same, and we have to "re-join" with another set of tables.
So, an alternative is to bring only the main records (this will never change, no matter what the user does or wants) with no join (this is very fast!) and query the other tables every time we want some data. Note that for the table with only 10 records, we will fetch the same records many, many times. In this case, it is a waste of time, because no matter how fast SQLite is, it will always be more expensive to query, open a cursor, fetch, etc. than to just grab the record from a kind of "memory cache". I want to make clear that we don't plan to keep all data in memory always, just some tables we query very often.
And so we come back to the original question: What is the best way to "cache" those records? I would really like to focus the discussion on that and not on "why do you need to cache data?"
The vast majority of the apps on the platform (contacts, Email, Gmail, calendar, etc.) do not do this. Some of these have extremely complicated database schemas with potentially a large amount of data and do not need to do this. What you are proposing to do is going to cause huge pain for you, with no clear gain.
You should first focus on designing your database and schema to be able to do efficient queries. There are two main reasons I can think of for database access to be slow:
You have really complicated data schemas.
You have a very large amount of data.
If you are going to have a lot of data, you can't afford to keep it all in memory anyway, so this is a dead end. If you have complicated structures, you would benefit in either case with optimizing them to improve performance. In both cases, your database schema is going to be key to good performance.
Actually optimizing the schema can be a bit of a black art (and I am no expert on it), but some things to look out for are correctly creating indices on the columns you will query, designing joins so they will take efficient paths, etc. I am sure there are lots of people who can help you with this area.
You could also try looking at the source of some of the platform's databases to get some ideas of how to design for good performance. For example the Contacts database (especially starting with 2.0) is extremely complicated and has a lot of optimizations to provide good performance on relatively large data and extensible data sets with lots of different kinds of queries.
Update:
Here's a good illustration of how important database optimization is. In Android's media provider database, a newer version of the platform changed the schema significantly to add some new features. The upgrade code to modify an existing media database to the new schema could take 8 minutes or more to execute.
An engineer made an optimization that reduced the upgrade time of a real test database from 8 minutes to 8 seconds. A 60x performance improvement.
What was this optimization?
It was to create a temporary index, at the point of upgrade, on an important column used in the upgrade operations. (And then delete it when done.) So this 60x performance improvement comes even though it also includes the time needed to build an index on one of the columns used during upgrading.
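The pattern described above (build an index just for the migration, then drop it) might look roughly like this. The tables `albums` and `media`, their columns, and the index name are invented for the sketch and are not the real media provider schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums (album_id INTEGER PRIMARY KEY, album_key TEXT)")
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, album_key TEXT, album_id INTEGER)"
)
conn.executemany(
    "INSERT INTO albums VALUES (?, ?)",
    [(i, "key%d" % i) for i in range(1, 501)],
)
conn.executemany(
    "INSERT INTO media (album_key) VALUES (?)",
    [("key%d" % (i % 500 + 1),) for i in range(2000)],
)
conn.commit()


def upgrade(conn):
    """Hypothetical schema upgrade: resolve each media row's album_id."""
    # Index the column the upgrade looks up repeatedly...
    conn.execute("CREATE INDEX tmp_albums_key ON albums (album_key)")
    conn.execute(
        "UPDATE media SET album_id = ("
        "  SELECT album_id FROM albums"
        "  WHERE albums.album_key = media.album_key)"
    )
    # ...and drop it again once the upgrade no longer needs it.
    conn.execute("DROP INDEX tmp_albums_key")
    conn.commit()


upgrade(conn)

unresolved = conn.execute(
    "SELECT COUNT(*) FROM media WHERE album_id IS NULL"
).fetchone()[0]
leftover = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master "
    "WHERE type = 'index' AND name = 'tmp_albums_key'"
).fetchone()[0]
print(unresolved, leftover)
```

On data this small the difference will not be visible; the point is only the shape of the upgrade: the cost of building the index is paid once, while every lookup inside the UPDATE can benefit from it.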
SQLite is one of those things where if you know what you are doing it can be remarkably efficient. And if you don't take care in how you use it, you can end up with wretched performance. It is a safe bet, though, if you are having performance issues with it that you can fix them by improving how you are using SQLite.
The problem with a memory cache is, of course, that you need to keep it in sync with the database. I've found that querying the database is actually quite fast, and you may be optimizing prematurely here. I've done a lot of tests on queries with different data sets and they never take more than 10-20 ms.
It all depends on how you're using the data, of course. ListViews are quite well optimized to handle large numbers of rows (I've tested into the 5000 range with no real issues).
If you are going to stay with the memory cache, you may want to have the database notify the cache when its contents change, and then you can update the cache. That way anyone can update the database without knowing about the caching. Also, if you build a ContentProvider over your database, you can use the ContentResolver to notify you of changes if you register using registerContentObserver.
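A language-neutral sketch of that notify-the-cache idea, written in Python. The classes `Database` and `CachedTable` and the `on_change` callback mechanism are hypothetical stand-ins: on Android the notification role would be played by a ContentProvider with registerContentObserver, which this sketch does not use.

```python
import sqlite3


class CachedTable:
    """Read-through cache for one small lookup table (hypothetical helper)."""

    def __init__(self, conn, table, key_column):
        self._conn = conn
        self._table = table      # trusted identifier, never user input
        self._key = key_column
        self._rows = None        # None means "not loaded or invalidated"
        self.loads = 0           # how many times we actually hit the db

    def get(self, key):
        if self._rows is None:
            cur = self._conn.execute("SELECT * FROM %s" % self._table)
            names = [d[0] for d in cur.description]
            idx = names.index(self._key)
            self._rows = {row[idx]: dict(zip(names, row)) for row in cur}
            self.loads += 1
        return self._rows.get(key)

    def invalidate(self):
        self._rows = None


class Database:
    """Routes all writes through one place so listeners can be notified."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self._listeners = {}

    def on_change(self, table, callback):
        self._listeners.setdefault(table, []).append(callback)

    def write(self, table, sql, params=()):
        with self.conn:
            self.conn.execute(sql, params)
        for callback in self._listeners.get(table, []):
            callback()  # e.g. a cache's invalidate()


db = Database()
db.conn.execute("CREATE TABLE status (code INTEGER PRIMARY KEY, label TEXT)")
db.write("status", "INSERT INTO status VALUES (?, ?)", (1, "open"))

cache = CachedTable(db.conn, "status", "code")
db.on_change("status", cache.invalidate)

first = cache.get(1)["label"]    # loads the table once
cache.get(1)                     # served from memory
db.write("status", "UPDATE status SET label = ? WHERE code = ?", ("closed", 1))
second = cache.get(1)["label"]   # reloaded after the change notification
print(first, second, cache.loads)
```

Invalidating the whole table on any write is the simplest policy and is only reasonable for the small, frequently read tables the asker describes; writers never need to know the cache exists.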