Database pruning strategy

I am planning to write an Android application that uses the built-in SQLite database. What limit, if any, should I put on the number of rows I store?
If that limit is crossed, what's the best strategy to handle the situation, given that I need to keep the rows and cannot delete them?

Right now I can confirm that my app runs with a 1.3 MB database with no problems.
If you absolutely must keep all of the data and you are running into problems, you could use the SD card, but in most cases the point is moot.
Here is a discussion about maximum database sizes:
Link

Store only as much information as you actually need in the database. Save what you need and avoid unnecessary rows.
Keep in mind that you can overwrite data in your database, for example when a user edits information.
This lets you reuse the same rows.
Hope this answers your question.
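As a rough illustration of reusing rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Android's SQLite API. The `notes` table and the `save_edit` helper are hypothetical names chosen for the example. An edit is applied with UPDATE, so the row count stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("INSERT INTO notes (id, body) VALUES (1, 'first draft')")
conn.commit()


def save_edit(conn, note_id, new_body):
    """Overwrite the existing row instead of inserting a new version."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE notes SET body = ? WHERE id = ?", (new_body, note_id))


save_edit(conn, 1, "second draft")

row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
current_body = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0]
```

After the edit the table still holds a single row, now containing the new text.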

One big database versus many small databases [closed]

My app deals with several similar datasets. That is, they are stored in the same tables but contain different data. The user may create more datasets. In any case, these datasets are guaranteed to be disjoint: data in one dataset will never be linked to data in another.
I was wondering, would it be better to have a dedicated database for each dataset instead of having all the data in one big database?
I would expect lookup times to improve if the user works on a smaller database. Is there a rule of thumb for how many entries a database (or table) can hold before I should worry about lookup times?
One drawback I can think of is that opening a database creates some overhead. However, I don't expect the user to switch datasets frequently.
Consider this example:
The database contains tables for companies, clients, products and orders. Companies never share clients or products, so the companies are the disjoint datasets. However, all products, clients and orders are stored in one big table each.
Queries to the database might include:
All orders for a particular client.
All products a particular client has ordered.
All clients who have ordered a particular product.
etc.
What these queries have in common is that they will always be issued in the context of a single company. Yet since the database doesn't know about this logical partition, all clients, products and orders will be searched.
If I had several databases, one per company, my logical partition would be reflected and only the relevant data would be searched. I'm not sure about the overhead of having that many databases, though.
Since I'm new to database schema design, I want to throw this idea out there to see whether several databases really are a good idea or not.
Update:
In case this wasn't clear: the database will be on the Android Phone, not in the Cloud or something.

There's no rule of thumb. AFAIK the lookup time doesn't depend purely on the number of entries. It depends on several factors, including but not limited to:
how wide the table is (how many columns, and how large)
table indexes
how the data is stored, e.g. boolean true/false versus string YES/NO in a table with 3 million records
hardware capacity
primary key/foreign key relationships (somewhat connected to the first point above)
As a general approach, a single database is advisable. Servers nowadays are quite powerful, and there are multiple options for performance optimisation, such as:
cloud databases which give the flexibility to choose the size
BigData
In-memory databases
Analysis services such as SSAS
NoSQL databases which are horizontally scalable, e.g. Firestore
Now, the biggest benefit of using one database is that your development and testing will be quick. What does that mean? Let's say you need to add, delete or modify one field in one table. If you have 10 different databases, you will need to make the exact same change in 10 different places and then test each one as well. If the changes are frequent, you might end up writing a generic script, and there is always a chance that this script breaks, e.g. after a database change or a patch update. With one database, the effort drops to roughly a tenth. Another benefit is that database administration and monitoring, e.g. adding indexes, becomes easier.
I had a similar requirement a few months back for a similar application (mobile + web). The set-up is comparable: different companies access the data, and a user from a particular company is only allowed to view data belonging to that company. All I did was add one more column, ORGCODE, to almost every table. More than 12 clients are happily sharing the tables without any issues.
Disclaimer: All of the above is quite generic without knowing your use-case and performance requirement.
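A minimal sketch of the shared-table approach, using Python's built-in sqlite3 module in place of Android's API. The `orders` schema, the index name and the `orders_for_client` helper are assumptions made up for the illustration; only the ORGCODE column idea comes from the answer. With a composite index that starts with the company column, a per-company query does not have to scan other companies' rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    orgcode   TEXT    NOT NULL,   -- which company the row belongs to
    client_id INTEGER NOT NULL,
    product   TEXT    NOT NULL
);
-- Index leads with orgcode so every query is narrowed to one company first.
CREATE INDEX idx_orders_org_client ON orders (orgcode, client_id);
""")
conn.executemany(
    "INSERT INTO orders (orgcode, client_id, product) VALUES (?, ?, ?)",
    [("ACME", 1, "bolt"), ("ACME", 2, "nut"), ("GLOBEX", 1, "gear")],
)
conn.commit()


def orders_for_client(conn, orgcode, client_id):
    """All products ordered by one client of one company."""
    rows = conn.execute(
        "SELECT product FROM orders WHERE orgcode = ? AND client_id = ? ORDER BY id",
        (orgcode, client_id),
    )
    return [r[0] for r in rows]


acme_orders = orders_for_client(conn, "ACME", 1)

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT product FROM orders WHERE orgcode = ? AND client_id = ?",
    ("ACME", 1),
).fetchall()
uses_index = any("idx_orders_org_client" in row[-1] for row in plan)
```

Client 1 exists in both companies, but only the ACME row is returned, and the query plan shows the composite index being used rather than a full table scan.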

Your question reminds me of some articles discussing the difference between relational databases and storing data as JSON or in other NoSQL options. Without studying what you are trying to accomplish and the scale you might reach, it is hard to judge. However, from a maintenance perspective, your database schema and its flexibility to change would favor a single database instance. You might go with multiple tables as well.

Well, this is a question of pure performance. You should know how big your database will be and what share of it is data you could store in a separate database. If that share is around 20% of the whole database and will only decrease, use one database. If it may grow to 50% or more of the whole database, you may consider separate ones.
The overall size of the database also matters. Modern devices can work fairly comfortably with databases up to about 500 MB (roughly 500,000 heavy rows). They will handle more, but that requires changes to the UX, UI and schema to minimize calls (pagination, indexes etc.). And if you run such an application on a weak device, it will crash.
Also, given how SQLite works (it keeps a page cache in RAM), performance depends heavily on the amount of RAM available to the app. It is best to keep the database under about 100 MB.
As you can see, there is no single approach. You have to choose based on your app's use cases and the predicted size of the database.
Hope this answer helps you somehow.

I would go for one database: less maintenance and less stuff that can go wrong.
Make sure it's optimized and indexed.

How to Manage increasing size of sqlite in Mobile App?

My app sometimes needs to sync with web servers and pull data into the mobile SQLite database for offline use, so the database size keeps growing exponentially.
I want to know how professional apps like WhatsApp, Hike, Evernote etc. manage their offline SQLite databases.
Please suggest steps to solve this problem.
PS: I am asking about managing the offline database (i.e. one that grows in size after syncing). Do not confuse this with syncing the database with web servers.

I do not know how large your data is. However, I think storing a reasonably large amount of data in an application's internal storage should not be a problem. Internal storage is shared among all applications, so the database can grow until the storage fills up.
In my opinion, the main problem here is query time if your database tables are not properly indexed. Otherwise, keeping the databases in internal storage is completely fine, and I think you do not have to worry about how much data can be stored there, as newer Android devices provide more storage.
That said, if your database is so big that it does not fit into internal storage, you might consider keeping only the data that is used frequently and deleting the rest. This depends heavily on the use case of your application.
In one of the applications I developed, I stored some large databases in external storage and copied them into internal storage whenever necessary. Copying a database from external to internal storage took some time (a few seconds), but once it was copied I could run queries efficiently.
Let me know if you need any help or clarification on any point. I hope that helps.

For maximum-size databases: AFAIK you don't want to lose what's on the device and force a reload.
Make sure you don't drop the database with each new release of your app when a simple ALTER TABLE ... ADD COLUMN will do.
For whatever you do archive and remove from the device, give the user a way to load it back in the background.
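The "alter instead of drop" advice could be sketched like this, again with Python's sqlite3 module standing in for Android's upgrade hook. The `messages` table, the `archived` column and the version numbers are hypothetical. The schema version is tracked with SQLite's `PRAGMA user_version`, and each upgrade step only adds what is missing, so existing rows survive.

```python
import sqlite3


def upgrade(conn):
    """Bring the schema up to version 2 without dropping existing data."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version < 1:
        conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
    if version < 2:
        # Add the new column in place; existing rows get the default value.
        conn.execute(
            "ALTER TABLE messages ADD COLUMN archived INTEGER NOT NULL DEFAULT 0"
        )
    conn.execute("PRAGMA user_version = 2")
    conn.commit()


# Simulate a device that still has the version 1 schema with user data in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO messages (body) VALUES ('hello')")
conn.execute("PRAGMA user_version = 1")
conn.commit()

upgrade(conn)

rows = conn.execute("SELECT body, archived FROM messages").fetchall()
new_version = conn.execute("PRAGMA user_version").fetchone()[0]
```

The old row is still there after the upgrade, with the new column filled in from its default.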

There might be some apps or databases for which you can find documentation, but that is probably the exception.
So to know exactly what's going on, you need to take snapshots of the databases. You can start with one app only or do it with several at once, but without analysis you won't get a reliable answer.
The reasons may well differ from app to app, since databases and app features naturally differ too.
Size growing faster than the amount of incoming content might be related to cache tables or search indexes, but there may be other reasons too. Without verification and some basic information, it's impossible to give you a detailed reason.
The table names of a database may already give some hints, but if table names or fields are just meaningless strings, you have to analyze the data inside, including the changes between snapshots.

The following link will help in understanding what exactly Whatsapp is using,
https://www.quora.com/How-is-the-Whatsapp-database-structured
I'm not sure whether you have to keep all the data on the device all the time, but if you have a choice you can always use cloud services (like FCM, AWS) to store or back up most of the data. If you need to keep all the data on the device, then perhaps one way is to use caching mechanisms in your app.
For example, use LRU (Least Recently Used) to cache the data you need on the device, store the rest in the cloud, and delete what's unneeded from the device. If needed, you can always retrieve the data on demand (e.g. when the user pulls to refresh or performs some other action) and delete it whenever it's not being used.
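One way the LRU idea might look at the database level is sketched below with Python's sqlite3 module. The `cache` table, the `MAX_ROWS` cap and the `put`/`get` helpers are all hypothetical, and a simple counter stands in for a real timestamp. Each read refreshes a row's `last_used` value, and each write trims the table back to the most recently used rows.

```python
import itertools
import sqlite3

MAX_ROWS = 3                 # hypothetical cap on rows kept on the device
_clock = itertools.count(1)  # monotonic counter standing in for a timestamp

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (key TEXT PRIMARY KEY, payload TEXT, last_used INTEGER)"
)


def put(conn, key, payload):
    """Store a row, then evict everything beyond the MAX_ROWS most recent."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (key, payload, next(_clock)),
        )
        conn.execute(
            "DELETE FROM cache WHERE key NOT IN "
            "(SELECT key FROM cache ORDER BY last_used DESC LIMIT ?)",
            (MAX_ROWS,),
        )


def get(conn, key):
    """Return the payload and mark the row as recently used."""
    row = conn.execute("SELECT payload FROM cache WHERE key = ?", (key,)).fetchone()
    if row is None:
        return None  # a real app would fetch from the server here
    with conn:
        conn.execute(
            "UPDATE cache SET last_used = ? WHERE key = ?", (next(_clock), key)
        )
    return row[0]


for k in ("a", "b", "c"):
    put(conn, k, k.upper())
get(conn, "a")        # touch "a", so "b" is now the least recently used
put(conn, "d", "D")   # pushes the table over the cap and evicts "b"

remaining = sorted(r[0] for r in conn.execute("SELECT key FROM cache"))
```

After the fourth insert, "b" has been evicted and the rows for "a", "c" and "d" remain.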

Optimizing fast access to a readonly sqlite database?

I have a huge database and I want my application to work with it as fast as possible. I'm on Android, so resources are more restricted. I know that it's not a good idea to store huge amounts of data in an SQLite database, but I need this.
Each database contains only ONE table, and I use it READ only.
What advice can you give me to optimize the databases as much as possible? I've already read this post. Apart from the PRAGMA commands, what else can I use?
Maybe there are some special table types that are restricted to read-only queries but are fundamentally faster than ordinary table types?

As long as your database fits on the device, there is no problem with that; you'll just have less space for other apps.
There is no special table type. However, if you have queries that use only a subset of a table's columns, and if you have enough space left, consider adding one or more covering indexes.
Being read-only allows the database to be optimized on the desktop, before you deploy it:
set page size, etc.;
create useful indexes;
ANALYZE
VACUUM
In your app, you might experiment with increasing the page cache size, but if your working set is larger than free memory, that won't help anyway. In any case, random reads from flash are fast, so that would not be much of a problem.
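The desktop preparation steps listed above could look roughly like this, using Python's sqlite3 module. The `words` table, its contents and the index name are invented for the example, and an in-memory database is used only to keep the sketch self-contained (a real build would write to a file that ships with the app). The final query plan check confirms that a query touching only indexed columns is served from the covering index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA page_size = 4096")  # must be set before any table exists
conn.execute(
    "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, lang TEXT, definition TEXT)"
)
langs = ("en", "fr", "de", "es")
conn.executemany(
    "INSERT INTO words (word, lang, definition) VALUES (?, ?, ?)",
    [(f"word{i}", langs[i % 4], "some long definition text") for i in range(200)],
)
# The index contains both columns the query needs, so the table is never read.
conn.execute("CREATE INDEX idx_words_lang_word ON words (lang, word)")
conn.commit()

conn.execute("ANALYZE")  # gather statistics for the query planner
conn.execute("VACUUM")   # rebuild the file compactly; must run outside a transaction

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT word FROM words WHERE lang = ?", ("en",)
).fetchall()
is_covering = any("COVERING INDEX" in row[-1] for row in plan)
english_count = conn.execute(
    "SELECT COUNT(*) FROM words WHERE lang = ?", ("en",)
).fetchone()[0]
```

A covering index costs extra space, which is why the answer suggests it only when there is room to spare.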

Huge is relative. But ultimately a device is constrained on storage and memory. So assuming that huge is beyond the typical constraints of a device, you have a few options.
The first option is to store your huge dataset in the cloud and let the connected device offer views into that data, served from the cloud through something like RESTful APIs. If the device and app rely on always being connected, you don't need as much local storage unless you want to cache data.
Another approach is an occasionally connected device (sometimes offline) where you pull a slice of the most relevant data down to the device to work on. In that model, you can work offline and push/pull back to the cloud, and SQLite is the storage mechanism that holds that slice of relevant data.
EDIT based on comments:
Concerning optimizing what you have on the device, see the optimization FAQ here:
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
(in rough order of effectiveness)
Use an in-memory database
Use BEGIN TRANSACTION and END TRANSACTION
Use indexes
Use PRAGMA cache_size
Use PRAGMA synchronous=OFF
Compact the database
Replace the memory allocation library
Use PRAGMA count_changes=OFF

Maybe I'm stating the obvious, but you should probably just open it with the SQLITE_OPEN_READONLY flag passed to sqlite3_open_v2 (plain sqlite3_open takes no flags). I think that SQLite will take advantage of this fact and optimize the behaviour of the engine.
Note that all normal SQL(ite) optimization tips still apply (e.g. VACUUMing to finalize the database, setting the correct page size at database creation, proper indexes and so on).
In addition, if you have multiple threads accessing the database in your application, you may also want to try the SQLITE_OPEN_NOMUTEX and SQLITE_OPEN_SHAREDCACHE flags (like all open flags, they require sqlite3_open_v2).
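For a quick experiment outside of C, a read-only open can be sketched with Python's sqlite3 module, where the `mode=ro` URI parameter plays the role of SQLITE_OPEN_READONLY. The file name and the `items` table are hypothetical. The example builds a small database file, reopens it read-only, and checks that reads work while writes are rejected.

```python
import os
import pathlib
import sqlite3
import tempfile

# Build a small database file first (in practice this would ship with the app).
db_path = os.path.join(tempfile.mkdtemp(), "catalog.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
writer.execute("INSERT INTO items (name) VALUES ('widget')")
writer.commit()
writer.close()

# Reopen it read-only through a file: URI.
uri = pathlib.Path(db_path).as_uri() + "?mode=ro"
reader = sqlite3.connect(uri, uri=True)

names = [r[0] for r in reader.execute("SELECT name FROM items")]

try:
    reader.execute("INSERT INTO items (name) VALUES ('gadget')")
    write_rejected = False
except sqlite3.OperationalError:
    # SQLite refuses the write on a read-only connection.
    write_rejected = True

reader.close()
```

Whether the engine gains measurable speed from the read-only flag is something to benchmark rather than assume. The sketch only shows that the connection is protected against writes.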

You can also switch journaling off, because the data does not change: http://www.sqlite.org/pragma.html#pragma_journal_mode
PRAGMA journal_mode=OFF

Which is more memory efficient, SQLite database or XML string[]?

I'm new but learning. I just need to know which is more memory efficient: a string[] in XML or an SQLite db? I can do either, and I can ship the db pre-populated. I'm talking about at most 1000 strings, with more possible in updates.
Thanks for your answers.
PS I have learned so much from Stack Overflow. This is the first place I turn to when I hit a snag. Thank you.

It depends on what the strings are and what you need them for. If they vary each time the app runs, keeping them in memory as a string array is probably best. If they persist between app runs, the SQLite DB will probably be better in the long run, since you don't need to "reload" the database between app runs.
Likewise, do you really need all 1000 strings in memory at all times while the app runs? If so, again the array might be a good idea. If not, the database is a better bet.
Ultimately, you need to run it on a variety of android devices and see which implementation is sufficiently responsive for whatever the app is designed to do.

I would say string[] is much better. Here is a good answer from SO itself.
"Unless you want to store the data persistently I'd say you should probably just use an Array. Databases are more for persistent storage (i.e. stuff you'll need over multiple runs of your app). That said, if you arrays start getting reeeeeeeeeealy* big, then yea you're going to want to move them onto disk (in which case they won't take up any memory). And probably the simplest way to do that is with a database.
*On the order of magnitude of hundreds of thousands of entrys, maybe even more."
Source: #Kurtis Nusbaum
https://stackoverflow.com/a/7906472/847954

Best practice for keeping data in memory and database at same time on Android

We're designing an Android app that has a lot of data ("customers", "products", "orders"...), and we don't want to query SQLite every time we need a record. We want to avoid querying the database as much as we can, so we decided to keep certain data always in memory.
Our initial idea is to create two simple classes:
"MemoryRecord": a class that will contain basically an array of objects (string, int, double, datetime, etc...), that are the data from a table record, and all methods to get those data in/out from this array.
"MemoryTable": a class that will contain basically a Map of [Key,MemoryRecord] and all methods to manipulate this Map and insert/update/delete record into/from database.
Those classes will be derived to every kind of table we have in the database. Of course there are other useful methods not listed above, but they are not important at this point.
So, when starting the app, we will load those tables from an SQLite database to memory using those classes, and every time we need to change some data, we will change in memory and post it into the database right after.
But, we want some help/advice from you. Can you suggest something more simple or efficient to implement such a thing? Or maybe some existing classes that already do it for us?
I understand what you guys are trying to show me, and I thank you for that.
But let's say we have a table with 2000 records, and I need to list those records. For each one, I have to query 30 other tables (some with 1000 records, others with 10) to add additional information to the list, and all this while the list is "flying" (and as you know, we must be very fast at that moment).
Now you'll be going to say: "just build your main query with all those 'joins', and bring all you need in one step. SQLite can be very fast, if your database is well designed, etc...".
OK, but this query will become very complicated, and even though SQLite is very fast, it will be "too" slow (2 to 4 seconds, as I confirmed, which isn't an acceptable time for us).
Another complicator is that, depending on user interaction, we need to "re-query" all records, because the tables involved are not the same, and we have to "re-join" with another set of tables.
So, an alternative is to bring in only the main records (this set never changes, no matter what the user does or wants) with no join (this is very fast!) and query the other tables every time we want some data. Note that for the table with only 10 records, we will fetch the same records over and over. In this case it is a waste of time, because no matter how fast SQLite is, it will always be more expensive to query, open a cursor, fetch, etc. than to just grab the record from some kind of "memory cache". I want to make clear that we don't plan to keep all data in memory at all times, just some tables we query very often.
And we came to the original question: What is the best way to "cache" those records? I really like to focus the discussion on that and not "why do you need to cache data?"

The vast majority of the apps on the platform (contacts, Email, Gmail, calendar, etc.) do not do this. Some of these have extremely complicated database schemas with potentially a large amount of data and do not need to do this. What you are proposing to do is going to cause huge pain for you, with no clear gain.
You should first focus on designing your database and schema to be able to do efficient queries. There are two main reasons I can think of for database access to be slow:
You have really complicated data schemas.
You have a very large amount of data.
If you are going to have a lot of data, you can't afford to keep it all in memory anyway, so this is a dead end. If you have complicated structures, you would benefit in either case with optimizing them to improve performance. In both cases, your database schema is going to be key to good performance.
Actually optimizing the schema can be a bit of a black art (and I am no expert on it), but some things to look out for are correctly creating indices on the columns you will query, designing joins so they take efficient paths, etc. I am sure there are lots of people who can help you with this area.
You could also try looking at the source of some of the platform's databases to get some ideas of how to design for good performance. For example the Contacts database (especially starting with 2.0) is extremely complicated and has a lot of optimizations to provide good performance on relatively large data and extensible data sets with lots of different kinds of queries.
Update:
Here's a good illustration of how important database optimization is. In Android's media provider database, a newer version of the platform changed the schema significantly to add some new features. The upgrade code to modify an existing media database to the new schema could take 8 minutes or more to execute.
An engineer made an optimization that reduced the upgrade time of a real test database from 8 minutes to 8 seconds. A 60x performance improvement.
What was this optimization?
It was to create a temporary index, at the point of upgrade, on an important column used in the upgrade operations. (And then delete it when done.) So this 60x performance improvement comes even though it also includes the time needed to build an index on one of the columns used during upgrading.
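The pattern described here (build an index just for the upgrade, then drop it) might be sketched as follows with Python's sqlite3 module. The `media` and `albums` tables and the `upgrade_with_temp_index` helper are invented for illustration and are not the actual media provider schema or upgrade code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, album_key TEXT);
CREATE TABLE media  (id INTEGER PRIMARY KEY, album_key TEXT, album_id INTEGER);
""")
conn.executemany(
    "INSERT INTO albums (album_key) VALUES (?)",
    [(f"album{i}",) for i in range(50)],
)
conn.executemany(
    "INSERT INTO media (album_key) VALUES (?)",
    [(f"album{i % 50}",) for i in range(1000)],
)
conn.commit()


def upgrade_with_temp_index(conn):
    """Fill media.album_id, using an index that exists only during the upgrade."""
    with conn:
        # Without this index, the subquery below scans albums once per media row.
        conn.execute("CREATE INDEX tmp_albums_key ON albums (album_key)")
        conn.execute(
            "UPDATE media SET album_id = "
            "(SELECT id FROM albums WHERE albums.album_key = media.album_key)"
        )
        # The index was only needed for the upgrade, so remove it again.
        conn.execute("DROP INDEX tmp_albums_key")


upgrade_with_temp_index(conn)

unresolved = conn.execute(
    "SELECT COUNT(*) FROM media WHERE album_id IS NULL"
).fetchone()[0]
leftover_index = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'tmp_albums_key'"
).fetchone()[0]
```

On a table this small the difference is not noticeable. The point is only the shape of the upgrade: create the index, run the heavy statements, then drop it.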
SQLite is one of those things where if you know what you are doing it can be remarkably efficient. And if you don't take care in how you use it, you can end up with wretched performance. It is a safe bet, though, if you are having performance issues with it that you can fix them by improving how you are using SQLite.

The problem with a memory cache is of course that you need to keep it in sync with the database. I've found that querying the database is actually quite fast, and you may be pre-optimizing here. I've done a lot of tests on queries with different data sets and they never take more than 10-20 ms.
It all depends on how you're using the data, of course. ListViews are quite well optimized to handle large numbers of rows (I've tested into the 5000 range with no real issues).
If you are going to stay with the memory cache, you may want to have the database notify the cache when its contents change, so that you can update the cache. That way anyone can update the database without knowing about the caching. Also, if you build a ContentProvider over your database, you can use the ContentResolver to notify you of changes if you register using registerContentObserver.
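The notify-on-change idea can be sketched independently of Android's ContentProvider machinery. In the Python example below, the `Database` wrapper, the `CachedLookupTable` class and the `status` table are all hypothetical names. Writes go through one place that notifies registered observers, and the cache for a small lookup table simply reloads itself after being invalidated.

```python
import sqlite3


class Database:
    """Single write path that notifies observers after each committed change."""

    def __init__(self, conn):
        self._conn = conn
        self._observers = {}

    def register_observer(self, table, callback):
        self._observers.setdefault(table, []).append(callback)

    def write(self, table, sql, params=()):
        with self._conn:  # commit before telling anyone about the change
            self._conn.execute(sql, params)
        for callback in self._observers.get(table, []):
            callback()


class CachedLookupTable:
    """Keeps a small (id -> name) table in memory until it is invalidated."""

    def __init__(self, conn, table):
        self._conn = conn
        self._table = table  # trusted, hard-coded table name
        self._cache = None

    def get(self, key):
        if self._cache is None:  # load lazily on first use or after a change
            rows = self._conn.execute(f"SELECT id, name FROM {self._table}")
            self._cache = dict(rows)
        return self._cache.get(key)

    def invalidate(self):
        self._cache = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO status VALUES (1, 'open')")
conn.commit()

db = Database(conn)
statuses = CachedLookupTable(conn, "status")
db.register_observer("status", statuses.invalidate)

before = statuses.get(1)  # served from the database, then cached
db.write("status", "UPDATE status SET name = ? WHERE id = ?", ("closed", 1))
after = statuses.get(1)   # cache was invalidated, so the new value is loaded
```

Invalidating the whole table is the simplest policy and fits the tiny lookup tables mentioned in the question. Larger tables would likely want per-row updates instead.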
