One big database versus many small databases [closed] - android

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
My app deals with several similar datasets. That is, they are stored in the same tables but contain different data. The user may create more datasets. In any case, these datasets are guaranteed to be disjoint: data in one dataset will never be linked to data in another dataset.
I was wondering, would it be better to have a dedicated database for each dataset instead of having all the data in one big database?
I would expect lookup times to improve if the user works on a smaller database. Is there a rule of thumb for how many entries a database (or table) can hold before I should worry about lookup times?
One drawback I can think of is that opening a database creates some overhead. However, I don't expect the user to switch datasets frequently.
Consider this example:
The database contains tables for companies, clients, products and orders. Companies never share clients or products, so the companies are the disjoint datasets. However, all products, clients and orders are each stored in one big table.
Queries to the database might include:
All orders for a particular client.
All products a particular client has ordered.
All clients who have ordered a particular product.
etc.
What these queries have in common is that they will always be issued in the context of one single company. Yet since the database doesn't know about this logical partition, all clients, products and orders will be searched.
If I were to have several databases, one for each company, my logical partition would be reflected and only the relevant data would be searched. I'm not sure about the overhead of having that many databases, though.
Since I'm new to database schema design, I want to throw this idea out there to see whether several databases really are a good idea or not.
Update:
In case this wasn't clear: the database will be on the Android Phone, not in the Cloud or something.

There's no rule of thumb. AFAIK the look-up time doesn't depend purely on the number of entries. It depends on several factors, including but not limited to:
how wide the table is (number and size of columns)
table indexes
how the data is stored e.g. boolean true/false or string YES/NO in the table having 3 million records
hardware size
primary key/foreign key relationship (sort of connected to point 1 above)
As a general approach, the one-database design is advisable. Servers nowadays are quite powerful, and there are multiple options for handling performance optimisation, such as:
cloud databases which give the flexibility to choose the size
BigData
In-memory databases
Analysis services such as SSAS
NoSQL databases which are horizontally scalable e.g. FireStore
Now, the biggest benefit of using one database is that development and testing will be quick. What does that mean? Say you need to add, delete or modify one field in one table. With 10 different databases, you need to make the exact same change in 10 different places and then test each one. If changes are frequent, you may end up writing a generic script, and there is always a chance that this script breaks (for example, after a database change or a patch update). With one database, the effort is one tenth of that. Another benefit is that database administration and monitoring, e.g. adding indexes, will be easier.
I had a similar requirement a few months back for a similar application (mobile + web). The setup is similar: different companies access the data, and a user from a particular company is only allowed to view data pertaining to his/her company. All I did was add one more column, named ORGCODE, to almost every table. More than 12 clients are happily sharing the tables without any issues.
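To make the ORGCODE idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQLite on the device. The table, column and index names, and the sample rows, are made up for illustration. It pairs the company column with a composite index, so a query scoped to one company can seek directly to that company's rows instead of searching every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        orgcode   TEXT    NOT NULL,   -- which company owns the row
        client_id INTEGER NOT NULL,
        product   TEXT    NOT NULL
    );
    -- Composite index: company first, then the column queried within it.
    CREATE INDEX idx_orders_org_client ON orders (orgcode, client_id);
""")

# Hypothetical sample data for two companies.
conn.executemany(
    "INSERT INTO orders (orgcode, client_id, product) VALUES (?, ?, ?)",
    [
        ("ACME", 1, "widget"),
        ("ACME", 2, "gadget"),
        ("GLOBEX", 1, "sprocket"),
    ],
)


def orders_for_client(orgcode, client_id):
    """Return the products ordered by one client of one company."""
    cur = conn.execute(
        "SELECT product FROM orders "
        "WHERE orgcode = ? AND client_id = ? ORDER BY id",
        (orgcode, client_id),
    )
    return [row[0] for row in cur]


# Ask SQLite how it would run the scoped query; the last column of each
# plan row is a human-readable description.
plan_details = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT product FROM orders WHERE orgcode = ? AND client_id = ?",
        ("ACME", 1),
    )
]

print(orders_for_client("ACME", 1))
print(plan_details)
```

Client 1 exists in both companies here, yet each query only returns rows for the company it was scoped to, and the plan should name the composite index rather than a full table scan.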
Disclaimer: All of the above is quite generic without knowing your use-case and performance requirement.

Your question reminds me of some articles out there discussing the difference between relational databases and storing data as JSON or other NoSQL options. Without studying what you are trying to accomplish and the scale you might reach, it is hard to judge. However, from a maintenance perspective, your database schema and its flexibility to change would favor a single database instance. You might go with multiple tables as well.

Well, this is purely a question of performance. You should estimate how big your database will be, and how much of it is the data you are considering storing in a separate database. If that share is around 20% of the overall database and will only decrease, use one database; if it may grow to 50% or more of the overall database, you may consider separate ones.
The overall size of the database also matters. Modern devices can work relatively comfortably with databases up to 500 MB (~500,000 heavy rows). They will handle more, but that requires some changes to the UX, UI and schema in order to minimize calls (pagination, indexes, etc.). If you run such an application on a weak device, though, it will crash.
Also, given how SQLite works (virtual tables in RAM), performance depends heavily on the amount of RAM available to the app. It is best to keep the database under about 100 MB.
As you can see, there is no single approach: you have to choose based on your app's use cases and the predicted size of the database.
Hope this answer helps you somehow.

I would go for one database: less maintenance and less stuff that can go wrong.
Make sure it's optimized and indexed.

Related

The most efficient way to implement a database using custom data + google fitness api

I am currently learning Android programming and creating an app that will store some integers representing user choices (values inserted several times a day, which must be displayed in the results activity) and steps data collected via the Google Fit History APIs for Android, also displayed in the results activity. I am looking for the most efficient way to store this data. I know that it might be possible to insert custom data types into the Google Fit database. However, I am not sure that is a good idea if the app mostly works offline and needs to immediately show only a small set of results, for example the values inserted in the last 2 weeks, with step counts. On the other hand, I am not sure whether it is OK to have two databases storing the data.
My apologies if the question sounds a bit too amateur, I am doing my best to find an optimal solution in terms of performance.
Thank you for your answers.
So, to give you my opinion and answer (mainly opinion)
Android has 3 ways (mainly) for storing data:
Files
Online database/API
Local database
For the specific scenario you describe, where the data should be available offline, you should probably look at Room: https://developer.android.com/training/data-storage/room. It supports storing primitive types without writing any type converters, and you can store models and custom data as well. It uses very basic SQL (it is a wrapper around the older SQLite database methods) and is part of Android (not an external third-party library). Room also requires most operations to run off the main thread, which will help your performance as well, and it supports LiveData/RxJava for observing changes as they happen.
However, as I told this user here:
Should i store one arrayList per file or should i store all my arrayList in the same file?
When starting out, don't worry about the best way of doing something. Instead, try something out and learn from it; worrying about the best solution now is rather pointless. Either way, happy learning and coding :P

What's better? Several smaller databases or one large one

I am building an application for learning words in a foreign language, so I have these words stored in my database. The words are separated into, for example, 3 levels of difficulty. Every level is made up of several groups of words, and each group is a table in the SQLite database. I am using SQLiteOpenHelper for communication between the application and the databases.
Now my question. What is better?
Make 3 smaller databases, one for each level, each with its own SQLiteOpenHelper, so 3 databases with 3 open helpers in total.
Make 1 large database containing all 3 levels, which means many tables but only 1 SQLiteOpenHelper.
Thanks for any advice or opinion.
I suggest 1 large database (DB).
You should not be worried about making large DBs; DBs were invented to store large amounts of data (and even a great many tables). It is much easier to create and maintain one DB than several, and your code will be much clearer using one DB.
I don't know your program, but I would go even further: if you store the same information for every word, I would rather keep all words in the same table, and add one column for the level and another for the group each word belongs to.
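A minimal sketch of that single-table layout, using Python's built-in sqlite3 module rather than SQLiteOpenHelper so it can be run anywhere. The column names and the sample words are assumptions made for illustration, and the column is called `grp` only because GROUP is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE words (
        id          INTEGER PRIMARY KEY,
        word        TEXT    NOT NULL,
        translation TEXT    NOT NULL,
        level       INTEGER NOT NULL,  -- difficulty level, e.g. 1 to 3
        grp         TEXT    NOT NULL   -- group name within the level
    )
""")
# One index covers the "all words of this level and group" lookup.
conn.execute("CREATE INDEX idx_words_level_grp ON words (level, grp)")

# Hypothetical sample vocabulary.
conn.executemany(
    "INSERT INTO words (word, translation, level, grp) VALUES (?, ?, ?, ?)",
    [
        ("Hund", "dog", 1, "animals"),
        ("Katze", "cat", 1, "animals"),
        ("Brot", "bread", 1, "food"),
        ("Gerechtigkeit", "justice", 3, "abstract"),
    ],
)


def words_in(level, grp):
    """Return the words of one group in one level, sorted alphabetically."""
    cur = conn.execute(
        "SELECT word FROM words WHERE level = ? AND grp = ? ORDER BY word",
        (level, grp),
    )
    return [row[0] for row in cur]


print(words_in(1, "animals"))
```

With this layout, adding a new level or group is just a matter of inserting rows with new values; no new table and no schema change are needed.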
The main idea of SQL is that you don't really have to care how much space your DB will require or how long it will take to find the result of a query, because database management systems (in your case SQLite, accessed through SQLiteOpenHelper) are extremely efficient in both space and time. Instead, you should concentrate on designing a system that has a clear structure and can be expanded easily (for example, if you later want to add another column to store new information about words, or add new levels or groups). You might lose a few milliseconds by separating groups and levels via the SELECT command of SQL, but your DB will be much more flexible: you can add levels, groups and more information about words with ease. The key to designing a good DB: store different kinds of data in different tables and the same kind of data in the same table.
The error that you mention in your comment is almost certainly a bug in your application code. There is no reason that an application with multiple databases should encounter that sort of error.
That said, my answer to your original question is that it is objectively "better" to use a single database.
It is better because you will have less code to maintain, no possibility of attempting to access the wrong database in a given situation, and the code will be more idiomatic - i.e. there's no benefit to using multiple databases, so if you were to use multiple databases, anyone reading your code would spend a lot of time trying to figure out why you did it.

Database pruning strategy

I am planning to write an Android application that will use an SQLite database. I was wondering what limit I should place on the number of rows I store. Should I have a limit at all?
If that limit is crossed, what's the best strategy for handling the situation, given that I need to keep the rows and cannot delete them?
Right now I can verify that my app runs with a 1.3 MB db with no problems.
If you absolutely must maintain all of the data, and you are having problems, you could utilize the SD card, but for most cases, this argument is somewhat moot.
Here is a discussion about maximum database sizes:
Link
You should limit yourself to the information you actually need to store in the database. Save what you need. Avoid unnecessary rows.
Keep in mind that you can overwrite data in your database, for example when a user edits information.
This allows you to reuse the same rows.
Hope this answers your question

Android SQLite Performance with Indexes

My Android app works by using a SQLite database that is generated on the user's PC and transferred to the device. It all works, but I had not anticipated the number of users who would have really huge amounts of data. In these cases, the UI is very sluggish as it waits for the data to be fetched.
I've tried a number of tricks that I was "sure" would speed things up, but nothing seems to have any noticeable effect. My queries are almost all very simple, being usually a single "col=val" for the WHERE clause, and INTEGER data in the column. So I can't do much with the queries.
The latest, and I am not an SQL expert by any means, was to use "CREATE INDEX" commands on the PC, believing that these indexes are used to speed up database searches. The indexes increased the size of the database file significantly, so I was then surprised that it seemed to have no effect whatsoever on the speed of my app! A screen that was taking 8 seconds to fill without indexes still takes about 8 seconds even with them. I was hoping to get things down to at least half that.
What I am wondering at this point is if the SQLite implementation on Android uses database indexes at all, or if I'm just wasting space by generating them. Can anyone answer this?
Also, any other things to try to speed up access?
(For what it's worth, on an absolute basis the users have nothing to complain about. My worst-case user so far has data that generates 630,000 records (15 tables), so there's only so much that's possible!)
Doug Gordon
GHCS Systems
SQLite will use the index if it is appropriate for the query. Use EXPLAIN
EXPLAIN QUERY PLAN ... your select statement ...
to see what indexes SQLite is using. The query plan is based on some assumptions about your database content. You may be able to improve the plan by using ANALYZE
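As a rough illustration of that workflow, the sketch below uses Python's built-in sqlite3 module to print the query plan for the same "col=val" query before and after an index exists. The table, column and index names are invented for the example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (_id INTEGER PRIMARY KEY, category INTEGER, payload TEXT)"
)
# 5000 hypothetical rows spread evenly over 50 categories.
conn.executemany(
    "INSERT INTO records (category, payload) VALUES (?, ?)",
    [(i % 50, f"row {i}") for i in range(5000)],
)

QUERY = "SELECT payload FROM records WHERE category = ?"


def plan_details(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Without an index on category, SQLite has to scan the whole table.
before = plan_details(QUERY, (7,))

conn.execute("CREATE INDEX idx_records_category ON records (category)")
# ANALYZE gathers statistics that the query planner can use.
conn.execute("ANALYZE")

# With the index in place, the plan should switch to an index search.
after = plan_details(QUERY, (7,))

print("before:", before)
print("after: ", after)
```

If the "after" plan for one of your real queries still reports a scan, the index does not match that query's WHERE clause, and the time is probably going somewhere other than the lookup itself.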
I was finally able to achieve tremendous performance gains simply by querying the database in a much more efficient way. For example, in building up an array of information, I was previously querying the database for each row that I required with a "WHERE _id = n" type selector. But in doing it this way, I was issuing a dozen or more queries, one at a time.
Instead, I now build up a list of IDs that are required, then get them all with a single query of the form "WHERE _id IN (n1, n2, n3, ...)" and iterate through the returned cursor. Doing this and some other structure optimizations, the largest database is now almost as quick to view as the more average case.
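The difference between the two approaches can be sketched as follows, using Python's built-in sqlite3 module with a made-up `items` table. The function names are illustrative only. Note that SQLite caps the number of bound parameters per statement, so a very long ID list would need to be split into chunks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (_id, name) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 101)],
)


def fetch_one_by_one(ids):
    """Old approach: one query per required row."""
    result = {}
    for _id in ids:
        row = conn.execute(
            "SELECT name FROM items WHERE _id = ?", (_id,)
        ).fetchone()
        if row is not None:
            result[_id] = row[0]
    return result


def fetch_batched(ids):
    """New approach: a single query using WHERE _id IN (...)."""
    ids = list(ids)
    if not ids:
        return {}  # "IN ()" is best avoided, so handle the empty case here
    # One "?" placeholder per ID; the values themselves are still bound.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT _id, name FROM items WHERE _id IN ({placeholders})"
    return {row[0]: row[1] for row in conn.execute(sql, ids)}


print(fetch_batched([3, 7, 42]))
```

Both functions return the same mapping; the batched version simply pays the per-query overhead once instead of once per ID.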
Every time you're going to perform some kind of action (be it a database lookup, a long-running calculation, a web request, etc.) that takes more than a couple of hundred milliseconds, you should consider wrapping it inside an AsyncTask.
Painless Threading is a good article on this topic, so I recommend you take a close look at it.
"This article discusses the threading model used by Android applications and how applications can ensure best UI performance by spawning worker threads to handle long-running operations, rather than handling them in the main thread."

Best practice for keeping data in memory and database at same time on Android

We're designing an Android app that has a lot of data ("customers", "products", "orders"...), and we don't want to query SQLite every time we need some record. We want to avoid querying the database as much as we can, so we decided to keep certain data always in memory.
Our initial idea is to create two simple classes:
"MemoryRecord": a class that will contain basically an array of objects (string, int, double, datetime, etc...), that are the data from a table record, and all methods to get those data in/out from this array.
"MemoryTable": a class that will contain basically a Map of [Key,MemoryRecord] and all methods to manipulate this Map and insert/update/delete record into/from database.
Those classes will be derived to every kind of table we have in the database. Of course there are other useful methods not listed above, but they are not important at this point.
So, when starting the app, we will load those tables from an SQLite database to memory using those classes, and every time we need to change some data, we will change in memory and post it into the database right after.
But, we want some help/advice from you. Can you suggest something more simple or efficient to implement such a thing? Or maybe some existing classes that already do it for us?
I understand what you guys are trying to show me, and I thank you for that.
But let's say we have a table with 2000 records, and I need to list those records. For each one, I have to query 30 other tables (some of them with 1000 records, others with 10 records) to add additional information to the list, and all this while the list is "flying" (and as you know, we must be very fast at that moment).
Now you'll be going to say: "just build your main query with all those 'joins', and bring all you need in one step. SQLite can be very fast, if your database is well designed, etc...".
OK, but this query will become very complicated and, even though SQLite is very fast, it will be "too" slow (2 to 4 seconds, as I confirmed, which isn't an acceptable time for us).
Another complication is that, depending on user interaction, we need to "re-query" all records, because the tables involved are not the same, and we have to "re-join" with another set of tables.
So, an alternative is to fetch only the main records (these never change, no matter what the user does or wants) with no join (this is very fast!) and query the other tables every time we want some data. Note that for the table with only 10 records, we will fetch the same records over and over again. In this case it is a waste of time, because no matter how fast SQLite is, it will always be more expensive to query, open a cursor, fetch, etc. than to just grab the record from a kind of "memory cache". I want to make clear that we don't plan to keep all data in memory at all times, just some tables we query very often.
And so we come back to the original question: what is the best way to "cache" those records? I'd really like to focus the discussion on that and not on "why do you need to cache data?"
The vast majority of the apps on the platform (contacts, Email, Gmail, calendar, etc.) do not do this. Some of these have extremely complicated database schemas with potentially a large amount of data and do not need to do this. What you are proposing to do is going to cause huge pain for you, with no clear gain.
You should first focus on designing your database and schema to be able to do efficient queries. There are two main reasons I can think of for database access to be slow:
You have really complicated data schemas.
You have a very large amount of data.
If you are going to have a lot of data, you can't afford to keep it all in memory anyway, so this is a dead end. If you have complicated structures, you would benefit in either case with optimizing them to improve performance. In both cases, your database schema is going to be key to good performance.
Actually optimizing the schema can be a bit of a black art (and I am no expert on it), but some things to look out for are correctly creating indices on the columns you will query, designing joins so they will take efficient paths, etc. I am sure there are lots of people who can help you in this area.
You could also try looking at the source of some of the platform's databases to get some ideas of how to design for good performance. For example the Contacts database (especially starting with 2.0) is extremely complicated and has a lot of optimizations to provide good performance on relatively large data and extensible data sets with lots of different kinds of queries.
Update:
Here's a good illustration of how important database optimization is. In Android's media provider database, a newer version of the platform changed the schema significantly to add some new features. The upgrade code to modify an existing media database to the new schema could take 8 minutes or more to execute.
An engineer made an optimization that reduced the upgrade time of a real test database from 8 minutes to 8 seconds. A 60x performance improvement.
What was this optimization?
It was to create a temporary index, at the point of upgrade, on an important column used in the upgrade operations. (And then delete it when done.) So this 60x performance improvement comes even though it also includes the time needed to build an index on one of the columns used during upgrading.
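The technique can be sketched as follows with Python's built-in sqlite3 module. The schema here is invented for illustration and is not the real media provider schema; the point is only the shape of the upgrade step: build an index on the column the upgrade looks rows up by, run the upgrade, then drop the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (album_id INTEGER PRIMARY KEY, title TEXT);
    -- Old schema stored the album title on every track; the upgrade
    -- fills in the new album_id column from it.
    CREATE TABLE tracks (
        track_id    INTEGER PRIMARY KEY,
        album_title TEXT,
        album_id    INTEGER
    );
""")
conn.executemany(
    "INSERT INTO albums (title) VALUES (?)",
    [(f"album {i}",) for i in range(200)],
)
conn.executemany(
    "INSERT INTO tracks (album_title) VALUES (?)",
    [(f"album {i % 200}",) for i in range(2000)],
)


def upgrade():
    """Fill tracks.album_id, using an index that exists only for the upgrade."""
    with conn:  # commit on success, roll back on error
        # Without this index, the subquery below would have to scan the
        # albums table once for every track.
        conn.execute("CREATE INDEX tmp_albums_title ON albums (title)")
        conn.execute("""
            UPDATE tracks
            SET album_id = (
                SELECT album_id FROM albums
                WHERE albums.title = tracks.album_title
            )
        """)
        conn.execute("DROP INDEX tmp_albums_title")


upgrade()
unresolved = conn.execute(
    "SELECT COUNT(*) FROM tracks WHERE album_id IS NULL"
).fetchone()[0]
print(unresolved)
```

Dropping the index afterwards keeps the upgraded database from carrying an index that normal queries do not need.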
SQLite is one of those things where if you know what you are doing it can be remarkably efficient. And if you don't take care in how you use it, you can end up with wretched performance. It is a safe bet, though, if you are having performance issues with it that you can fix them by improving how you are using SQLite.
The problem with a memory cache is, of course, that you need to keep it in sync with the database. I've found that querying the database is actually quite fast, and you may be optimizing prematurely here. I've done a lot of tests on queries with different data sets and they never take more than 10-20 ms.
It all depends on how you're using the data, of course. ListViews are quite well optimized to handle large numbers of rows (I've tested into the 5000 range with no real issues).
If you are going to stay with the memory cache, you may want to have the database notify the cache when its contents change, so that you can then update the cache. That way anyone can update the database without knowing about the caching. Also, if you build a ContentProvider over your database, you can use the ContentResolver to notify you of changes if you register using registerContentObserver.
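Here is a minimal sketch of that notify-then-refresh idea in plain Python. The `Database` and `CachedLookupTable` classes and the `status` table are hypothetical; on Android, the observer registration would be handled by ContentResolver and registerContentObserver rather than by the hand-rolled callback list shown here.

```python
import sqlite3


class CachedLookupTable:
    """Keeps a small table in memory and reloads it after a change notice."""

    def __init__(self, conn, table):
        self._conn = conn
        self._table = table  # trusted, hard-coded table name, never user input
        self._rows = None    # None means "stale, reload on next read"

    def invalidate(self):
        """Called by the database layer whenever the table changes."""
        self._rows = None

    def get(self, key):
        if self._rows is None:
            cur = self._conn.execute(f"SELECT id, label FROM {self._table}")
            self._rows = dict(cur.fetchall())
        return self._rows.get(key)


class Database:
    """Owns the connection and notifies observers after every write."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE status (id INTEGER PRIMARY KEY, label TEXT)"
        )
        self._observers = []

    def register_observer(self, callback):
        self._observers.append(callback)

    def upsert_status(self, id_, label):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO status (id, label) VALUES (?, ?)",
                (id_, label),
            )
        for notify in self._observers:
            notify()


db = Database()
cache = CachedLookupTable(db.conn, "status")
db.register_observer(cache.invalidate)

db.upsert_status(1, "open")
first = cache.get(1)    # loads the table into memory

db.upsert_status(1, "closed")
second = cache.get(1)   # cache was invalidated, so it reloads

print(first, second)
```

Because the writer only fires a notification, code that updates the database does not need to know that a cache exists at all, which is the decoupling the paragraph above describes.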
