Improving Android SQLite query performance for relational tables


Scenario:
I am working with what I think is a fairly large SQLite database (around 20 MB) in my Android app, which consists of around 50 tables.
Most of these tables are linked by foreign keys, and a lot of the time, I need to retrieve information from two or more tables at a time. To illustrate an example:
Table1:
Id | Name | Attribute1 | Attribute2 | ForeignKey
1 | "Me" | SomeValue | AnotherVal | 49
2 | "A" | ... | ... | 50
3 | "B" | | | 49
Table2:
Id | Attribute3 | Attribute4 | Attribute5
49 | ThirdVal | FourthVal | FifthVal
50 | ... | ... | ...
Sometimes, there are more than two tables that link together in this way. Almost all of the time, there are more columns than those presented above, and there are usually around 1000 rows.
My aim is to display a few of the attributes from the database as items in a RecyclerView, but I will need to use both tables to retrieve these attributes.
My method:
Currently, I am using the android-sqlite-asset-helper library to copy this database (.db extension) from the assets folder into the app. When I recorded the time for this copying to happen, it completed in 732 ms, which is fine.
However, when I want to retrieve the data from two tables using the foreign key from the first table, it takes far too long. It took around 11.47 seconds when I tested this, and I want to speed this up.
The way in which I retrieve the data is that I read each row in the first table, and put it into an object:
public static ArrayList<FirstItem> retrieveFirstItemList(Context context) {
Cursor cursor = new DbHelper(context).getReadableDatabase()
.query(DbHelper.TABLE_NAME, null, null, null, null, null, null);
ArrayList<FirstItem> arrayList = new ArrayList<>();
cursor.moveToFirst();
while (!cursor.isAfterLast()) {
// I read all the values from each column and put them into variables
arrayList.add(new FirstItem(id, name, attribute1, attribute2, foreignKey));
cursor.moveToNext();
}
cursor.close();
return arrayList;
}
The FirstItem object would contain getter methods in addition to another used for getting the SecondItem object from the foreign key:
public SecondItem getSecondItem(Context context) {
Cursor cursor = new SecondDbHelper(context).getReadableDatabase().query(
SecondDbHelper.TABLE_NAME,
null,
SecondDbHelper.COL_ID + "=?",
new String[] {String.valueOf(mForeignKey)},
null, null, null);
cursor.moveToFirst();
// attribute3, attribute4 and attribute5 are read from the cursor here (omitted)
SecondItem secondItem = new SecondItem(mForeignKey, attribute3, attribute4, attribute5);
cursor.close();
return secondItem;
}
When I print values from both tables into the logcat (I have decided not to use any UI for now, to test database performance), I use something like this:
for (FirstItem firstItem : DBUtils.retrieveFirstItemList(this)) {
Log.d("First item id", firstItem.getId());
Log.d("Second item attr4", firstItem.getSecondItem(this).getAttribute4());
}
I suspect there is something wrong with this method as it needs to search through Table2 for each row in Table1 - I think it's inefficient.
An idea:
I am considering one other method, but I do not know whether it is better than my current solution or whether it is the 'proper' way to achieve what I want. In particular, I am unsure whether a slight modification of my current solution could significantly increase performance. Nevertheless, here is my idea for speeding up reads from the database.
When the app loads for the first time, data from various tables of the SQLite database would be read then put into one SQLite database in the app. This process would occur when the app is run for the first time and each time the tables from the database are updated. I am aware that this would result in duplication of data across different rows, but it is the only way I see that would avoid me having to search multiple tables to produce a list of items.
// read values from SQLite database and put them in arrays
ContentValues cv = new ContentValues();
// put values into variables
cv.put(COL_ID, id);
...
db.insert(TABLE_NAME, null, cv);
Since this process would also take a long time (as there are multiple rows), I was a little concerned that this would not be the best idea, however I read about transactions in some Stack Overflow answers, which would increase write speeds. In other words, I would use db.beginTransaction();, db.setTransactionSuccessful(); and db.endTransaction(); appropriately to increase the performance when rewriting the data to a new SQLite database.
So the new table would look like this:
Id | Name | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5
1 | "Me" | SomeValue | AnotherVal | ThirdVal | FourthVal | FifthVal
2 | "A" | ... | ... | ... | ... | ...
3 | "B" | SomeValue | AnotherVal | ThirdVal | FourthVal | FifthVal
This means that although there would be more columns in the table, I would avoid having to search through multiple tables for each row in the first table, and the data would be more easily accessible too (for filtering and things like that). Most of the 'loading' would be done at the start, and hopefully sped up with methods for transactions.
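The transaction calls mentioned above are usually arranged in a try/finally block. The following is only a sketch of that arrangement: TxDb is a hypothetical stand-in for the four SQLiteDatabase methods named in the question (its insert takes a plain String rather than a table name and ContentValues), so that the pattern can be run off-device.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SQLiteDatabase calls named in the question.
interface TxDb {
    void beginTransaction();
    void insert(String row);
    void setTransactionSuccessful();
    void endTransaction();
}

final class BulkInsertSketch {

    // Wraps all inserts in a single transaction.
    // endTransaction() runs even if one of the inserts throws.
    static void insertAll(TxDb db, List<String> rows) {
        db.beginTransaction();
        try {
            for (String row : rows) {
                db.insert(row);
            }
            db.setTransactionSuccessful(); // mark the transaction for commit
        } finally {
            db.endTransaction(); // commit if marked successful, otherwise roll back
        }
    }

    // Records the order of the calls, purely for demonstration.
    static List<String> trace(List<String> rows) {
        final List<String> calls = new ArrayList<>();
        insertAll(new TxDb() {
            public void beginTransaction() { calls.add("begin"); }
            public void insert(String row) { calls.add("insert " + row); }
            public void setTransactionSuccessful() { calls.add("successful"); }
            public void endTransaction() { calls.add("end"); }
        }, rows);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList("row1", "row2")));
    }
}
```

With a real SQLiteDatabase the loop body would be the db.insert(TABLE_NAME, null, cv) call from the snippet above.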
Overview:
To summarise, I want to speed up reading from an SQLite database with multiple tables, where I have to look through these tables for each row of the first table in order to produce the desired result. This takes a long time, and is inefficient, but I'm not sure if there is a way I can adjust my current method to greatly improve read speeds. I think I should 'load' the data when the app is first run, by reorganising the data from various tables into one table.
So I am asking, which of the two methods is better (mostly concerning performance)? Is there a way I can adjust my current method or is there something I am doing incorrectly? Finally, if there is a better way to do this than the two methods I have already mentioned, what is it and how would I go about implementing it?

A couple of things that you should try:
Optimise your loading. As far as I understood your current method, it runs into the N + 1 queries problem. You have to execute a query to get the first batch of data, and then another query for every row of the original result set, so you can fetch the related data. It's normal that you get a performance problem with that approach. I don't think it's scalable and I would recommend you move away from it. The easiest way is to use joins instead of multiple queries. This is referred to as eager loading.
Introduce appropriate indexes on your tables. If you are performing a lot of joins, you should really think about speeding them up. Indexes are the obvious choice here. Normally, primary key columns are indexed by default, but foreign keys are not. This means that you perform linear searches on your tables for each join, and this is slow. I would try to introduce indexes on your foreign key columns (and all columns that are used in joins). Measure the performance of a join before and after to see if you have made any progress there.
Consider using database views. They are quite useful when you have to perform the same joins often, because the join is written once and then queried like a table. Bear in mind that SQLite does not materialise views: the stored query still runs whenever the view is read, so time the plain join and the view against each other to see whether you gain anything. The downside is that it is a bit harder to map your result set to a hierarchy of Java objects, but, at least in my experience, the gain is worth it.
You can try and use some kind of lazy loading. Defer loading the related data unless it is being explicitly requested. This can be hard to implement, and I think that it should be your last resort, but it's an option nevertheless. You may get creative and leverage dynamic proxies or something like this to actually perform the loading logic.
To summarise, being smart with indexes / views should do the trick most of the time. Combine this with eager / lazy loading, and you should be able to get to the point where you are happy with your performance.
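As a rough illustration of the join suggested in the first point, the query for the example tables in the question might be built as follows. This is a sketch, not the asker's code: the class and method names are hypothetical, and only the table and column names come from the question.

```java
// Sketch only: one JOIN over the question's example tables, so that every
// list row arrives together with its Table2 attribute in a single result set.
final class JoinQuerySketch {

    // Table and column names are taken from the example tables in the question.
    static String joinedItemsSql() {
        return "SELECT t1.Id, t1.Name, t2.Attribute4 "
             + "FROM Table1 AS t1 "
             + "JOIN Table2 AS t2 ON t1.ForeignKey = t2.Id";
    }

    public static void main(String[] args) {
        // On Android this string would be passed to SQLiteDatabase.rawQuery(sql, null)
        // and the returned Cursor iterated once, with no second query per row.
        System.out.println(joinedItemsSql());
    }
}
```

This replaces the one-query-per-row pattern (getSecondItem called inside the loop) with a single query whose cursor already contains the columns from both tables.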
EDIT: Info on Indexes, Views and Android Implementation
Indexes and Views are not alternatives to the same problem. They have different characteristics and application.
When you apply an index to a column, you speed up searches on that column's values. You can think of it as a linear search vs. a tree search comparison. This speeds up joins, because the database can quickly find the rows that correspond to the foreign key value in question. Indexes have a beneficial effect on simple select statements as well, not only ones using joins, since they also speed up the evaluation of where clause criteria. They come with a catch, though. Indexes speed up queries, but they slow down insert, update and delete operations (since the indexes have to be maintained as well).
Views are just stored queries, whose result sets you can query just like a normal table. The gain here is that you don't need to write out the join each time. In SQLite a view is not materialised, so the underlying query still executes whenever the view is read.
You should not limit yourself to just one of the two things. They are not mutually exclusive and can give you optimal results when combined.
As far as Android implementation goes, there is not much to do. SQLite supports both indexes and views out of the box. The only thing you have to do is create them. The easiest way is to modify your database creation script to include CREATE INDEX and CREATE VIEW statements. You can create an index together with its table, or you can add it later manually if you need to update an already existing schema. Just check the SQLite manual for the appropriate syntax.
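For the example tables in the question, the CREATE INDEX and CREATE VIEW statements might look like the following sketch. The index name, view name and class name are hypothetical; the table and column names come from the question.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: schema statements for the question's example tables.
final class SchemaSketch {

    // Index on the foreign key column used in the join (index name is hypothetical).
    static final String CREATE_FK_INDEX =
        "CREATE INDEX IF NOT EXISTS idx_table1_foreignkey ON Table1(ForeignKey)";

    // View that stores the join under a name (view name is hypothetical).
    static final String CREATE_JOINED_VIEW =
        "CREATE VIEW IF NOT EXISTS Table1WithTable2 AS "
      + "SELECT t1.Id, t1.Name, t1.Attribute1, t1.Attribute2, "
      + "t2.Attribute3, t2.Attribute4, t2.Attribute5 "
      + "FROM Table1 AS t1 JOIN Table2 AS t2 ON t1.ForeignKey = t2.Id";

    static List<String> statements() {
        return Arrays.asList(CREATE_FK_INDEX, CREATE_JOINED_VIEW);
    }

    public static void main(String[] args) {
        // On Android each statement would be run once with SQLiteDatabase.execSQL(statement).
        for (String statement : statements()) {
            System.out.println(statement);
        }
    }
}
```

Since the database in the question is shipped as a prebuilt file, the same statements could instead be applied to that file before packaging it, so that the index and view are already present on first launch.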

Maybe try this: https://realm.io/products/java/ I have never used it and know nothing about its performance, but it might be of interest to you.

Related

Total sum of two columns trigger on SQLite android

As the title says, I'm trying to set a column value based on other columns in the same table. I have a single row in a table with the attributes capital, income, mat_expense, other_expense and net_profit, and I update this row every time I sell a product or register a payment. When I sell a product, I add the sale price to income. When I register a mat_expense (raw material expense), I add the new expense to that attribute, and I do the same with other_expense. I want to calculate the net_profit of my sales: if I sell $20 and spend $10 on raw material, I want net_profit to be 10, and I want this recalculated every time I update the table. I want to do the same with the capital attribute (income - (mat_expense + other_expense)). I have read that this should be done with a trigger in SQLite, and I have read some posts about triggers, but I don't see how to apply them to my case. Can you give me a hand with this? Example:
capital | income | mat_expense | other_expense | net_profit
5 | 20 | 10 | 5 | 10
By the way, a related question: is it possible to write a trigger that makes an attribute act as an accumulator? As I explained, I will be updating some attributes by adding new values. Every time I do that, I currently have to read the current value, save it in a variable and then add the new value, which I think is not very efficient.
Thank you for reading; I appreciate any help you can give me.

In SQL you can use expressions to define new columns, e.g.:
select
income,
mat_expense,
other_expense,
income - (mat_expense + other_expense) as capital
from your_table;
You will get a fourth column called 'capital'.
As for the second question: you should use such calculated virtual columns whenever possible. Expressions may be quite complex; they can include SQL functions and even subqueries. For example, you can add a column holding the minimum value from rows of another table that are correlated with the current row of the first table.
Generally, SQL is about transforming source sets of data into other representations, some tables into others.
When you can't calculate the result set in a single SQL statement, you may have to calculate intermediate temporary data or variables, perhaps via temp tables or cursors, but that should be the last resort. Sometimes it cannot be avoided.
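The arithmetic behind the calculated column can be checked against the example row in the question with a few lines of Java. This is only a worked example; the class and method names are hypothetical, and the net_profit formula is an assumption read off the question's example row (20 - 10 = 10).

```java
// Worked example of the two calculated values for the question's example row.
final class ProfitSketch {

    // capital = income - (mat_expense + other_expense), as in the SELECT above.
    static int capital(int income, int matExpense, int otherExpense) {
        return income - (matExpense + otherExpense);
    }

    // Assumption taken from the question's example row: net_profit = income - mat_expense.
    static int netProfit(int income, int matExpense) {
        return income - matExpense;
    }

    public static void main(String[] args) {
        System.out.println(capital(20, 10, 5)); // prints 5
        System.out.println(netProfit(20, 10));  // prints 10
    }
}
```

Because both values follow directly from the stored columns, they can be computed at query time and do not have to be stored or kept in sync.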

In sqlite query how to get a specific row and column that has the required value

I am trying to build a quiz app and the entries in the database look like this
id | term | synonym 1 | synonym 1 answered | synonym 2 | synonym 2 answered | ....
----------------------------------------------------------------------------------
1 | read | recite | Yes | study | No |
2 | work | effort | No | labour | Yes |
The idea is to present one synonym at a time. Once the next synonym is chosen, the previous synonym is marked as answered using the column next to the synonym with "Yes"
To select a word, I use Collections.shuffle() to get a random row and a random column, then query the database to see whether its answered column is "No". If it is "Yes", I repeat the shuffle until I get a "No".
To find out whether at least one entry in the entire table has a "No" in any of the 'answered' columns, I use an OR clause across the answered columns (to make sure that not all synonyms have already been answered).
So my app does a lot of iterations to get the desired word, which is definitely a very bad approach.
I am unable to figure out a way to have an SQLite query return a random row and column that contains "No". If I can get the name of the column that had "No", I can strip the word 'answered' from it to get the related synonym column in that row, and present it without much Java code.
Can anyone kindly enlighten me on this and suggest a solution? I need the column name and the rowid of a match for "No" anywhere in the table, and the match must be chosen at random.
Edit:
The scenario given here is for simplicity. The actual app deals with Sanskrit grammar and requires the kind of implementation I am planning. I require to get the 'name' of the column and 'id' of the row that got a 'No'. In the implementation I won't be using 'Yes' and 'No'. The number of columns would be fixed for all the terms. For simplicity I gave this example.

I would recommend the following:
First, normalize your table structure. There will be two tables: Terms and Synonyms.
The Terms table will have Id and TermName.
The Synonyms table will have SynonymId, Name, Answered and TermId, so you can track synonyms through TermId.
This way, you can easily query which synonyms are still unanswered for a specific term, like:
SELECT * FROM Synonyms WHERE TermId = 1 AND Answered = 'No'
Hope this helps

Well, the yes/no part is a little tedious, especially if you did it that way (to me, another relational table would make more sense, but that's another story).
SELECT *
FROM table
WHERE
synonym_1_answered = 'No'
OR synonym_2_answered = 'No'
OR ...
ORDER BY RANDOM() LIMIT 1;
Filter the ones with a "No" using where, and use the random function to "sort" (more like unsort) them.
Edit:
Once you're done with the data selection, and provided you have a Cursor pointing to this result, it's as easy as:
long id = cursor.getLong(cursor.getColumnIndex("id"));
boolean is_synonym_1_answered =
cursor.getString(
cursor.getColumnIndex("synonym_1_answered")
).equals("Yes");
boolean is_synonym_2_answered =
cursor.getString(
cursor.getColumnIndex("synonym_2_answered")
).equals("Yes");
...
After that, you only need to check which of the booleans is false; that synonym hasn't been answered yet. You can speed things up by storing the column positions of your query in constants (so ID is 0, TERM is 1, and so on; typically they are kept in an array). That way you don't waste time asking the cursor for the position of a column:
long id = cursor.getLong(ID);
boolean is_synonym_1_answered = cursor.getString(SYNONYM_1_ANS).equals("Yes");
boolean is_synonym_2_answered = cursor.getString(SYNONYM_2_ANS).equals("Yes");
...
Again, to speed up things, a second table containing just a synonym and a reference to the original term would make things way easier (as every row returned by a query like that would be an unchecked synonym).
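With one row per synonym, as both answers suggest, picking a random unanswered synonym becomes a single filter plus a random choice. The sketch below shows that logic in memory so it can be run without a database; in SQLite the equivalent would be a WHERE on the answered column combined with ORDER BY RANDOM() LIMIT 1. The Synonym class and its fields are hypothetical and loosely follow the Synonyms table proposed in the first answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical in-memory counterpart of one row in a Synonyms table.
final class Synonym {
    final int termId;
    final String name;
    final boolean answered;

    Synonym(int termId, String name, boolean answered) {
        this.termId = termId;
        this.name = name;
        this.answered = answered;
    }
}

final class RandomSynonymSketch {

    // Returns a random unanswered synonym, or an empty Optional if all are answered.
    static Optional<Synonym> pickUnanswered(List<Synonym> all, Random random) {
        List<Synonym> open = new ArrayList<>();
        for (Synonym s : all) {
            if (!s.answered) {
                open.add(s);
            }
        }
        if (open.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(open.get(random.nextInt(open.size())));
    }

    public static void main(String[] args) {
        // Same data as the example rows in the question.
        List<Synonym> rows = Arrays.asList(
            new Synonym(1, "recite", true), new Synonym(1, "study", false),
            new Synonym(2, "effort", false), new Synonym(2, "labour", true));
        Optional<Synonym> pick = pickUnanswered(rows, new Random());
        System.out.println(pick.isPresent() ? pick.get().name : "all answered");
    }
}
```

An empty result doubles as the "is anything left unanswered?" check, so no separate OR query over every answered column is needed.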

(Java) Data Structure for agenda-like program?

In a nutshell, I've parsed data like this from XML:
root ---- date1 ---- time 1 ----item 1 ---- prop 1 --- a
| | | | |-- b
| | | |
| | | |- prop 2
| | |
| | |--item 2
| |
| |- time 2
|
|
|- date2 ---- time 1
each item has several properties (like item1 has prop1 and prop2)
each time has several items
each date has several times
root has several dates as its children
And I want to use this data:
I want to be able to show the user the content within each date, e.g. if date1 is the current date, show the user each item sorted by time, much like the Agenda view in Google Calendar for Android.
So may I ask what data structure I should use to achieve this?
The possible candidates I've thought are:
Nested HashMap, like HashMap<Time, HashMap<item1, HashMap<prop1, a>>>;
SQL database, create table for each time, then put item in row, prop in column
It's totally OK for this app not to strictly keep the tree-like structure when storing the data, so I prefer an SQL database. Are there possibly better solutions?
This app is going to run on Android. Thanks for any help.
UPDATE
My thought of the database structure:
create database table for each date;
Then in each DB table:
time prop1 prop2 prop3
item1
item2
...
Then maybe later I can retrieve items by using select on time basis.
UPDATE2
I need to create a table for each date because some of the items may be the same. Imagine you put two identical meetings on different days in a week. I can't create a table for each item because it is harder to extract that information from the original XML than to just create date tables and put them in the database.

Create one table for everything and use a date column to retrieve the data as you wish. You don't need to complicate it by creating a table for every date.

My approach would be to create a DAO (Data Access Object) that would be responsible for taking the object that resulted from parsing the XML and storing all the data on SQLite correctly. The DAO would also be responsible for re-making the object from the SQLite data.
Keep in mind that this translation won't be as easy as you mentioned though. You said "create table for each time, then put item in row, prop in column". This is not a standard DB model, as usually you don't create a table for each instance of a given entity.
Start by analyzing what entities you have (e.g., date, time, item) and then analyze the relationships between them (e.g., a date contains one or more times, a time contains one or more items, etc.). Usually each entity will become a table in your DB, and relationships will either go inside an existing table (as a reference to other tables) or have their own table. You can read more about this approach here: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
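To make the "one table with a date column" idea concrete, here is a minimal in-memory sketch: every entry carries its own date and time, and the agenda for a day is a filter followed by a sort, which is what a WHERE on the date plus an ORDER BY on the time would do in SQL. The class and field names are hypothetical, properties are left out for brevity, and java.time on Android needs API level 26 or library desugaring.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical flat record: one row per item, with its date and time as columns.
final class AgendaEntry {
    final LocalDate date;
    final LocalTime time;
    final String item;

    AgendaEntry(LocalDate date, LocalTime time, String item) {
        this.date = date;
        this.time = time;
        this.item = item;
    }
}

final class AgendaSketch {

    // Returns the entries of one date, sorted by time of day.
    static List<AgendaEntry> forDate(List<AgendaEntry> all, LocalDate date) {
        List<AgendaEntry> result = new ArrayList<>();
        for (AgendaEntry e : all) {
            if (e.date.equals(date)) {
                result.add(e);
            }
        }
        result.sort(Comparator.comparing((AgendaEntry e) -> e.time));
        return result;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2024, 1, 15);
        List<AgendaEntry> all = Arrays.asList(
            new AgendaEntry(day, LocalTime.of(10, 0), "item 2"),
            new AgendaEntry(day, LocalTime.of(9, 0), "item 1"),
            new AgendaEntry(day.plusDays(1), LocalTime.of(8, 0), "item 3"));
        for (AgendaEntry e : forDate(all, day)) {
            System.out.println(e.time + " " + e.item);
        }
    }
}
```

The same meeting appearing on two days is simply two entries with different dates, so no per-date table is required.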

Custom ContentProvider for complex sql database with multiple tables

I am trying to create a library that includes functionality in an already written app. The old app uses a complex sqlite database. I am trying to reuse the old database as the backend for a content provider and can't figure out what would be "best practice" for what I am trying to do and how to adapt the various examples (this, this, and others) I have found to my situation.
Suppose you have this database structure
+------+-------------+-------------+-------------+
| Root | Folder | Item | Subitem |
+------+-------------+-------------+-------------+
| _id | _id | _id | _id |
| uuid | parent_uuid | parent_uuid | parent_uuid |
| | uuid | uuid | uuid |
| | name | name | name |
| | data | data | data |
+------+-------------+-------------+-------------+
Root->Folder->Item->Subitem
Before, I used a DbAdapter class where I provided function calls that took parameters such as parent_uuid, handled all the sql query stuff inside the function, then returned a cursor with the results
example function stubs:
get_items_by_parent_uuid(folder_uuid)
get_item_by_uuid(uuid)
same for Subitem also
Complex queries
get_items_for_root(root_uuid)
returns cursor with item uuid, item name, item data, folder name
get_items_with_subitem_count(folder_uuid)
returns cursor with item uuid, item name, item data, count of subitems where subitem.parent_uuid == item.uuid
I can't figure out the best way to provide the functionality above with 1 ContentProvider.
I don't need someone to write me tons of code (but if you do I'm ok with that too), I just want someone to help me understand how to modify the above linked examples to do these things, because I do mostly understand the examples, just not enough to translate them to my current needs.
TL;DR; - How do I write a single ContentProvider that handles multiple tables, doesn't depend on _id as the unique identifier, and handles joins along with selects that may have inner-selects/queries (such as select count(*))

How do I write a single ContentProvider that handles multiple tables
Step #1: Design a REST interface for your schema, limiting yourself to a simple JSON object as the data structure (i.e., no nested arrays or objects, just a map of keys to simple values)
Step #2: Convert that design to a ContentProvider, replacing the http://sooperapp.jp36.com/ with content://com.jp36.sooperapp and replacing the JSON with Cursors and ContentValues
So, for example, you might support content://com.jp36.sooperapp/folder and content://com.jp36.sooperapp/item and content://com.jp36.sooperapp/subitem as the basis of retrieving/modifying information about one or more of each of those types.
doesn't depend on _id as the unique identifier
If you plan on using CursorAdapter, and assuming that by uuid you really do mean a UUID (which is typically a string), then you have no choice but to also have _id. If, however, you do not plan on using CursorAdapter, you have no particular need for _id.
and handles joins along with selects that may have inner-selects/queries (such as select count(*))
That's all a matter of your REST interface/ContentProvider design. If you want to have content://com.jp36.sooperapp/folder/count be something you query upon that, behind the scenes, does SELECT COUNT(*) FROM Folder, knock yourself out.
(note: do not literally knock yourself out)
If you want content://com.jp36.sooperapp/omg/omg/omg/this/is/a/long/path to INSERT an Item and 17 Subitems based upon some insert() call to the provider, go right ahead. ContentProvider is merely a facade; it is up to you to define what the Uri means, what the ContentValues mean, what the query() parameters mean, etc.
Personally, I would recommend that you step back and ask yourself why you are bothering with a ContentProvider.
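The "Uri as facade" idea can be sketched as a small routing function that maps the path of a content Uri to the SQL that runs behind it. On Android this dispatch is normally done with UriMatcher inside the provider's query() method; the version below uses java.net.URI only so that it runs on a plain JVM. Apart from SELECT COUNT(*) FROM Folder, which appears in the answer, the SQL strings and the class name are assumptions for illustration.

```java
import java.net.URI;

// Sketch only: maps the path of a content Uri to the query behind it.
final class ProviderRoutingSketch {

    static String sqlFor(String contentUri) {
        String path = URI.create(contentUri).getPath();
        if (path == null) {
            throw new IllegalArgumentException("Unknown URI: " + contentUri);
        }
        switch (path) {
            case "/folder":
                return "SELECT * FROM Folder";
            case "/folder/count":
                return "SELECT COUNT(*) FROM Folder";
            case "/item":
                return "SELECT * FROM Item";
            case "/subitem":
                return "SELECT * FROM Subitem";
            default:
                throw new IllegalArgumentException("Unknown URI: " + contentUri);
        }
    }

    public static void main(String[] args) {
        System.out.println(sqlFor("content://com.jp36.sooperapp/folder/count"));
    }
}
```

Joins and sub-selects fit the same scheme: each one gets its own path, and the consumer never sees the SQL.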

Dealing w/ Sqlite Join results in a cursor

I have a one-to-many relationship in my local SQLite db. Pretty basic stuff. When I do my left outer join, the resulting cursor has multiple rows that look like this:
A1.id | A1.column1 | A1.column2 | B1.a_id_fk | B1.column1 | B1.column2
A1.id | A1.column1 | A1.column2 | B2.a_id_fk | B2.column1 | B2.column2
and so on...
Is there a standard practice or method of dealing with results like this ? Clearly there is only A1, but it has many B-n relationships. I am coming close to using multiple queries instead of the "relational db way". Hopefully I am just not aware of the better way to do things.
I intend to expose this query via a content provider and I would hate for all of the consumers to have to write the same aggregation logic.

Joins have worked this way in every SQL database I've ever used -- it's kinda the definition of the concept.
Bear in mind that content providers involve remote procedure calls, which are expensive. That's one of the reasons why Android denormalizes results along the same lines as your join for things like contacts, to minimize the round-trips between processes.
You could consider using a remote service instead of a content provider, and exposing a custom API. You could return the B's for a given A by a simple List<>. Or, you could serialize the whole thing in JSON format. Or any number of other possibilities, if the data duplication disturbs you.
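If the aggregation is done once on the providing side, the flattened join rows can be folded into one parent object per A with a list of its B values. The sketch below shows that folding on plain Java objects; the class and field names are hypothetical stand-ins for the cursor columns in the question, and a null foreign key represents an A without any B, as a left outer join would return it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One flattened row of the left outer join (hypothetical column subset).
final class JoinRow {
    final long aId;
    final String aColumn1;
    final Long bAIdFk;      // null when the A row has no matching B
    final String bColumn1;

    JoinRow(long aId, String aColumn1, Long bAIdFk, String bColumn1) {
        this.aId = aId;
        this.aColumn1 = aColumn1;
        this.bAIdFk = bAIdFk;
        this.bColumn1 = bColumn1;
    }
}

// One A together with the B values that belong to it.
final class Parent {
    final long id;
    final String column1;
    final List<String> children = new ArrayList<>();

    Parent(long id, String column1) {
        this.id = id;
        this.column1 = column1;
    }
}

final class JoinGroupingSketch {

    // Folds the join rows into one Parent per A id, keeping first-seen order.
    static List<Parent> group(List<JoinRow> rows) {
        Map<Long, Parent> byId = new LinkedHashMap<>();
        for (JoinRow row : rows) {
            Parent parent = byId.get(row.aId);
            if (parent == null) {
                parent = new Parent(row.aId, row.aColumn1);
                byId.put(row.aId, parent);
            }
            if (row.bAIdFk != null) {
                parent.children.add(row.bColumn1);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Parent> parents = group(Arrays.asList(
            new JoinRow(1L, "a1", 1L, "b1"),
            new JoinRow(1L, "a1", 1L, "b2"),
            new JoinRow(2L, "a2", null, null)));
        for (Parent p : parents) {
            System.out.println(p.column1 + " -> " + p.children);
        }
    }
}
```

With a real Cursor, the JoinRow objects would be replaced by reads from the current cursor row inside the same loop.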
