Dealing with SQLite join results in a Cursor - Android

I have a one-to-many relationship in my local SQLite db. Pretty basic stuff. When I do my left outer join, the resulting cursor has multiple rows that look like this:
A1.id | A1.column1 | A1.column2 | B1.a_id_fk | B1.column1 | B1.column2
A1.id | A1.column1 | A1.column2 | B2.a_id_fk | B2.column1 | B2.column2
and so on...
Is there a standard practice or method of dealing with results like this? Clearly there is only one A1, but it has many related B rows. I am coming close to using multiple queries instead of the "relational db way". Hopefully I am just not aware of the better way to do things.
I intend to expose this query via a content provider and I would hate for all of the consumers to have to write the same aggregation logic.

Joins have worked this way in every SQL database I've ever used -- it's kinda the definition of the concept.
Bear in mind that content providers involve remote procedure calls, which are expensive. That's one of the reasons why Android denormalizes results along the same lines as your join for things like contacts: to minimize the round-trips between processes.
You could consider using a remote service instead of a content provider and exposing a custom API. You could return the Bs for a given A as a simple List<>. Or you could serialize the whole thing as JSON. Or any number of other possibilities, if the data duplication disturbs you.
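If you do keep the joined cursor, the usual client-side pattern is to group the denormalized rows by A's id, so each A is materialized once with a list of its Bs. A minimal sketch in plain Java (the String[] rows stand in for cursor rows; all names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinGrouper {
    // Each row mirrors one cursor row: {aId, aColumn1, bColumn1}
    public static Map<String, List<String>> groupBsByA(List<String[]> rows) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            // first time we see this A id, create its (possibly empty) list of Bs
            result.computeIfAbsent(row[0], k -> new ArrayList<>());
            if (row[2] != null) {           // null B side = A with no Bs (outer join)
                result.get(row[0]).add(row[2]);
            }
        }
        return result;
    }
}
```

Walking a real Cursor works the same way: compare the current A id with the previous row's and only allocate a new parent object when it changes.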

Related

Improving Android SQLite query performance for relational tables

Scenario:
I am working with what I think is a fairly large SQLite database (around 20 MB) in my Android app, which consists of around 50 tables.
Most of these tables are linked by foreign keys, and a lot of the time, I need to retrieve information from two or more tables at a time. To illustrate an example:
Table1:
Id | Name | Attribute1 | Attribute2 | ForeignKey
1 | "Me" | SomeValue | AnotherVal | 49
2 | "A" | ... | ... | 50
3 | "B" | | | 49
Table2:
Id | Attribute3 | Attribute4 | Attribute5
49 | ThirdVal | FourthVal | FifthVal
50 | ... | ... | ...
Sometimes, there are more than two tables that link together in this way. Almost all of the time, there are more columns than those presented above, and there are usually around 1000 rows.
My aim is to display a few of the attributes from the database as items in a RecyclerView, but I will need to use both tables to retrieve these attributes.
My method:
Currently, I am using the android-sqlite-asset-helper library to copy this database (.db extension) from the assets folder into the app. When I recorded the time for this copying to happen, it completed in 732 ms, which is fine.
However, when I want to retrieve the data from two tables using the foreign key from the first table, it takes far too long. It took around 11.47 seconds when I tested this, and I want to speed this up.
The way in which I retrieve the data is that I read each row in the first table, and put it into an object:
public static ArrayList<FirstItem> retrieveFirstItemList(Context context) {
    Cursor cursor = new DbHelper(context).getReadableDatabase()
            .query(DbHelper.TABLE_NAME, null, null, null, null, null, null);
    ArrayList<FirstItem> arrayList = new ArrayList<>();
    cursor.moveToFirst();
    while (!cursor.isAfterLast()) {
        // I read all the values from each column and put them into variables
        arrayList.add(new FirstItem(id, name, attribute1, attribute2, foreignKey));
        cursor.moveToNext();
    }
    cursor.close();
    return arrayList;
}
The FirstItem object would contain getter methods in addition to another used for getting the SecondItem object from the foreign key:
public SecondItem getSecondItem(Context context) {
    Cursor cursor = new SecondDbHelper(context).getReadableDatabase().query(
            SecondDbHelper.TABLE_NAME,
            null,
            SecondDbHelper.COL_ID + "=?",
            new String[] {String.valueOf(mForeignKey)},
            null, null, null);
    cursor.moveToFirst();
    // (column reads elided: attribute3..attribute5 come from this cursor)
    SecondItem secondItem = new SecondItem(mForeignKey, attribute3, attribute4, attribute5);
    cursor.close();
    return secondItem;
}
When I print values from both tables into the logcat (I have decided not to use any UI for now, to test database performance), I use something like this:
for (FirstItem firstItem : DBUtils.retrieveFirstItemList(this)) {
    Log.d("First item id", String.valueOf(firstItem.getId()));
    Log.d("Second item attr4", firstItem.getSecondItem(this).getAttribute4());
}
I suspect there is something wrong with this method as it needs to search through Table2 for each row in Table1 - I think it's inefficient.
An idea:
I have one other method I am considering, however I do not know if it is better than my current solution, or if it is the 'proper' way to achieve what I want; I am also unsure whether my current solution could be slightly modified to significantly increase performance. Nevertheless, here is my idea for improving the speed of reading data from the database.
When the app loads for the first time, data from various tables of the SQLite database would be read then put into one SQLite database in the app. This process would occur when the app is run for the first time and each time the tables from the database are updated. I am aware that this would result in duplication of data across different rows, but it is the only way I see that would avoid me having to search multiple tables to produce a list of items.
// read values from the SQLite database and put them into variables
ContentValues cv = new ContentValues();
cv.put(COL_ID, id);
...
db.insert(TABLE_NAME, null, cv);
Since this process would also take a long time (as there are multiple rows), I was a little concerned that this would not be the best idea. However, I read about transactions in some Stack Overflow answers, which would increase write speeds: I would use db.beginTransaction(), db.setTransactionSuccessful() and db.endTransaction() appropriately to improve performance when rewriting the data to a new SQLite database.
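For reference, the transaction pattern described above is a try/finally around the insert loop (a sketch against the Android SQLiteDatabase API; `helper`, `rows`, and the table/column names are placeholders):

```java
SQLiteDatabase db = helper.getWritableDatabase();
db.beginTransaction();
try {
    for (Row row : rows) {             // "rows" = data read from the source tables
        ContentValues cv = new ContentValues();
        cv.put(COL_ID, row.id);
        // ... put the remaining columns ...
        db.insert(TABLE_NAME, null, cv);
    }
    db.setTransactionSuccessful();     // without this, endTransaction() rolls back
} finally {
    db.endTransaction();
}
```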
So the new table would look like this:
Id | Name | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5
1 | "Me" | SomeValue | AnotherVal | ThirdVal | FourthVal | FifthVal
2 | "A" | ... | ... | ... | ... | ...
3 | "B" | SomeValue | AnotherVal | ThirdVal | FourthVal | FifthVal
This means that although there would be more columns in the table, I would avoid having to search through multiple tables for each row in the first table, and the data would be more easily accessible too (for filtering and things like that). Most of the 'loading' would be done at the start, and hopefully sped up with methods for transactions.
Overview:
To summarise, I want to speed up reading from an SQLite database with multiple tables, where I have to look through these tables for each row of the first table in order to produce the desired result. This takes a long time, and is inefficient, but I'm not sure if there is a way I can adjust my current method to greatly improve read speeds. I think I should 'load' the data when the app is first run, by reorganising the data from various tables into one table.
So I am asking, which of the two methods is better (mostly concerning performance)? Is there a way I can adjust my current method or is there something I am doing incorrectly? Finally, if there is a better way to do this than the two methods I have already mentioned, what is it and how would I go about implementing it?
A couple of things that you should try:
Optimise your loading. As far as I understood your current method, it runs into the N + 1 queries problem. You have to execute a query to get the first batch of data, and then another query for every row of the original result set, so you can fetch the related data. It's normal that you get a performance problem with that approach. I don't think it's scalable and I would recommend you move away from it. The easiest way is to use joins instead of multiple queries. This is referred to as eager loading.
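Concretely, with the table and column names from the question, the 1 + N queries above collapse into a single join (a sketch; the exact projection depends on what you actually display):

```java
// one query instead of 1 + N: fetch each FirstItem together with its SecondItem
String sql = "SELECT t1.Id, t1.Name, t1.Attribute1, t1.Attribute2, "
        + "t2.Attribute3, t2.Attribute4, t2.Attribute5 "
        + "FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.ForeignKey = t2.Id";
Cursor cursor = db.rawQuery(sql, null);
// a single pass over this cursor can now build every FirstItem and its SecondItem
```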
Introduce appropriate indexes on your tables. If you are performing a lot of joins, you should really think about speeding them up. Indexes are the obvious choice here. Normally, primary key columns are indexed by default, but foreign keys are not. This means that you perform linear searches on your tables for each join, and this is slow. I would try to introduce indexes on your foreign key columns (and all columns that are used in joins). Try to measure the performance of a join before and after, to see whether you have made any progress.
Consider using database views. They are quite useful when you have to perform joins often. When creating a view, you get a precompiled query and save quite a bit of time compared to running the join each time. You can time the query both as a plain join and against a view to see how much you save. The downside is that it is a bit harder to map the result set to a hierarchy of Java objects, but, at least in my experience, the performance gain is worth it.
You can try and use some kind of lazy loading. Defer loading the related data unless it is being explicitly requested. This can be hard to implement, and I think that it should be your last resort, but it's an option nevertheless. You may get creative and leverage dynamic proxies or something like this to actually perform the loading logic.
To summarise, being smart with indexes / views should do the trick most of the time. Combine this with eager / lazy loading, and you should be able to get to the point where you are happy with your performance.
EDIT: Info on Indexes, Views and Android Implementation
Indexes and Views are not alternatives to the same problem. They have different characteristics and application.
When you apply an index to a column, you speed up the search on that column's values. You can think of it as a linear search vs. a tree search comparison. This speeds up joins, because the database already knows which rows correspond to the foreign key value in question. Indexes have a beneficial effect on simple select statements as well, not only ones using joins, since they also speed up the evaluation of where clause criteria. They come with a catch, though. Indexes speed up queries, but they slow down insert, update and delete operations (since the indexes have to be maintained as well).
Views are just precompiled and stored queries, whose result sets you can query just like a normal table. The gain here is that you don't need to compile and validate the query each time.
You should not limit yourself to just one of the two things. They are not mutually exclusive and can give you optimal results when combined.
As far as the Android implementation goes, there is not much to do. SQLite supports both indexes and views out of the box. The only thing you have to do is create them. The easiest way is to modify your database creation script to include CREATE INDEX and CREATE VIEW statements. You can combine the creation of a table with the creation of an index, or you can add it later manually, if you need to update an already existing schema. Just check the SQLite manual for the appropriate syntax.
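For example, in your SQLiteOpenHelper's onCreate() you might add something like this (index and view names are illustrative; the view simply precompiles the join from the question):

```java
// inside SQLiteOpenHelper.onCreate(SQLiteDatabase db), after the CREATE TABLE statements
db.execSQL("CREATE INDEX idx_table1_fk ON Table1(ForeignKey)");
db.execSQL("CREATE VIEW first_with_second AS "
        + "SELECT t1.Id AS _id, t1.Name, t1.Attribute1, t1.Attribute2, "
        + "t2.Attribute3, t2.Attribute4, t2.Attribute5 "
        + "FROM Table1 t1 JOIN Table2 t2 ON t1.ForeignKey = t2.Id");
```

The view is then queried like any table, e.g. db.query("first_with_second", ...).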
Maybe try this: https://realm.io/products/java/ . I have never used it and know nothing about its performance, but it may be a direction that interests you... or not ;)

(Java) Data Structure for agenda-like program?

In a nutshell, I've parsed data like this from XML:
root
+-- date1
|   +-- time1
|   |   +-- item1
|   |   |   +-- prop1
|   |   |   |   +-- a
|   |   |   |   +-- b
|   |   |   +-- prop2
|   |   +-- item2
|   +-- time2
+-- date2
    +-- time1
each item has several properties (like item1 has prop1 and prop2)
each time has several items
each date has several times
root has several dates as its children
And I want to use this data:
Be able to show the user the content within each date, e.g. if date1 is the current date, then show the user each item sorted by time, kind of like Android Google Calendar's Agenda view.
So may I ask what data structure should I use to achieve this?
The possible candidates I've thought are:
Nested HashMap, like HashMap<Time, HashMap<item1, HashMap<prop1, a>>>;
SQL database, create table for each time, then put item in row, prop in column
It's totally ok for this app not to strictly keep the tree-like structure when storing the data, so I prefer an SQL database. May I ask you guys for possible better solutions?
This app is going to run on Android, thanks for any help.
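For comparison, the nested-map candidate can be sketched with sorted maps from java.util, which keeps dates and times in order for an agenda-style listing (the Agenda class and string-keyed dates/times are made up for illustration; ISO-formatted strings sort chronologically):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class Agenda {
    // date -> time -> items; TreeMap keeps keys sorted
    private final TreeMap<String, TreeMap<String, List<String>>> data = new TreeMap<>();

    public void add(String date, String time, String item) {
        data.computeIfAbsent(date, d -> new TreeMap<>())
            .computeIfAbsent(time, t -> new ArrayList<>())
            .add(item);
    }

    // everything on one date, already sorted by time
    public TreeMap<String, List<String>> itemsOn(String date) {
        return data.getOrDefault(date, new TreeMap<>());
    }
}
```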
UPDATE
My thought of the database structure:
create database table for each date;
Then in each DB table:
time prop1 prop2 prop3
item1
item2
...
Then maybe later I can retrieve items by using select on a time basis.
UPDATE2
I need to create a table for each date because some of the items may be the same. Please imagine you put two identical meetings on different days in a week. I can't create tables for each item because it is more difficult to extract that info from the original XML than to just create date tables and put them in the database.
Create one table for everything and use a date column to retrieve data as you wish. You don't need to make it complicated by creating tables for all dates.
My approach would be to create a DAO (Data Access Object) that would be responsible for taking the object that resulted from parsing the XML and storing all the data on SQLite correctly. The DAO would also be responsible for re-making the object from the SQLite data.
Keep in mind that this translation won't be as easy as you mentioned though. You said "create table for each time, then put item in row, prop in column". This is not a standard DB model, as usually you don't create a table for each instance of a given entity.
Start by analyzing what entities you have (e.g., date, time, item) and then analyze the relationships between them (e.g., a date contains one or more times, a time contains one or more items, etc.). Usually each entity will become a table in your DB, and relationships will either go inside an existing table (as a reference to another table) or they will have their own table. You can read more about this approach here: http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
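Following that approach, the agenda data might map to tables like these (a sketch; all table and column names are illustrative, with each child referencing its parent via a foreign key):

```java
// one table per entity; each child references its parent via a foreign key
db.execSQL("CREATE TABLE event_date (_id INTEGER PRIMARY KEY, date TEXT)");
db.execSQL("CREATE TABLE event_time (_id INTEGER PRIMARY KEY, "
        + "date_id INTEGER REFERENCES event_date(_id), time TEXT)");
db.execSQL("CREATE TABLE item (_id INTEGER PRIMARY KEY, "
        + "time_id INTEGER REFERENCES event_time(_id), name TEXT)");
db.execSQL("CREATE TABLE prop (_id INTEGER PRIMARY KEY, "
        + "item_id INTEGER REFERENCES item(_id), value TEXT)");
```

A single join query with ORDER BY on the time column then rebuilds the sorted agenda for any date.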

Custom ContentProvider for complex sql database with multiple tables

I am trying to create a library that reuses functionality from an already written app. The old app uses a complex SQLite database. I am trying to reuse that database as the backend for a content provider, and I can't figure out what "best practice" would be for what I am trying to do, or how to adapt the various examples (this, this, and others) I have found to my situation.
Suppose you have this database structure:
+------+-------------+-------------+-------------+
| Root | Folder | Item | Subitem |
+------+-------------+-------------+-------------+
| _id | _id | _id | _id |
| uuid | parent_uuid | parent_uuid | parent_uuid |
| | uuid | uuid | uuid |
| | name | name | name |
| | data | data | data |
+------+-------------+-------------+-------------+
Root->Folder->Item->Subitem
Before, I used a DbAdapter class where I provided function calls that took parameters such as parent_uuid, handled all the SQL query stuff inside the function, then returned a cursor with the results.
example function stubs:
get_items_by_parent_uuid(folder_uuid)
get_item_by_uuid(uuid)
same for Subitem also
Complex queries
get_items_for_root(root_uuid)
returns cursor with item uuid, item name, item data, folder name
get_items_with_subitem_count(folder_uuid)
returns cursor with item uuid, item name, item data, count of subitems where subitem.parent_uuid == item.uuid
I can't figure out the best way to provide the functionality above with 1 ContentProvider.
I don't need someone to write me tons of code (but if you do I'm ok with that too), I just want someone to help me understand how to modify the above linked examples to do these things, because I do mostly understand the examples, just not enough to translate them to my current needs.
TL;DR; - How do I write a single ContentProvider that handles multiple tables, doesn't depend on _id as the unique identifier, and handles joins along with selects that may have inner-selects/queries (such as select count(*))
How do I write a single ContentProvider that handles multiple tables
Step #1: Design a REST interface for your schema, limiting yourself to a simple JSON object as the data structure (i.e., no nested arrays or objects, just a map of keys to simple values)
Step #2: Convert that design to a ContentProvider, replacing the http://sooperapp.jp36.com/ with content://com.jp36.sooperapp and replacing the JSON with Cursors and ContentValues
So, for example, you might support content://com.jp36.sooperapp/folder and content://com.jp36.sooperapp/item and content://com.jp36.sooperapp/subitem as the basis of retrieving/modifying information about one or more of each of those types.
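A typical way to route those Uris inside a single provider is a UriMatcher (a sketch; the match codes, table names, and db field are placeholders):

```java
static final UriMatcher MATCHER = new UriMatcher(UriMatcher.NO_MATCH);
static final int FOLDERS = 1, ITEMS = 2, SUBITEMS = 3;
static {
    MATCHER.addURI("com.jp36.sooperapp", "folder", FOLDERS);
    MATCHER.addURI("com.jp36.sooperapp", "item", ITEMS);
    MATCHER.addURI("com.jp36.sooperapp", "subitem", SUBITEMS);
}

@Override
public Cursor query(Uri uri, String[] projection, String selection,
                    String[] selectionArgs, String sortOrder) {
    switch (MATCHER.match(uri)) {
        case FOLDERS:
            return db.query("Folder", projection, selection, selectionArgs,
                    null, null, sortOrder);
        // ... ITEMS, SUBITEMS, and any join-backed Uris go here ...
        default:
            throw new IllegalArgumentException("Unknown uri: " + uri);
    }
}
```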
doesn't depend on _id as the unique identifier
If you plan on using CursorAdapter, and assuming that by uuid you really do mean a UUID (which is typically a string), then you have no choice but to also have _id. If, however, you do not plan on using CursorAdapter, you have no particular need for _id.
and handles joins along with selects that may have inner-selects/queries (such as select count(*))
That's all a matter of your REST interface/ContentProvider design. If you want to have content://com.jp36.sooperapp/folder/count be something you query upon that, behind the scenes, does SELECT COUNT(*) FROM Folder, knock yourself out.
(note: do not literally knock yourself out)
If you want content://com.jp36.sooperapp/omg/omg/omg/this/is/a/long/path to INSERT an Item and 17 Subitems based upon some insert() call to the provider, go right ahead. ContentProvider is merely a facade; it is up to you to define what the Uri means, what the ContentValues mean, what the query() parameters mean, etc.
Personally, I would recommend that you step back and ask yourself why you are bothering with a ContentProvider.

How to organize sqlite database

this is more of a question of theory than anything else. I am writing an Android app that uses a pre-packaged database. The purpose of the app is solely to search through this database and return values. I'll provide some abstract examples to illustrate my implementation and quandary. The user can search by "Thing Name," and what I want returned to the user is values a, b, and c. I initially designed the database to have it all contained in a single sheet: column 1 is key_index, column 2 is name, column 3 is a, etc. When the user searches, the cursor returns the key_index, which is then used to pull values a, b and c.
However, in my database "Thing alpha" can have a value a = 4 or a = 6. I do not want to repeat data in the database, i.e. have multiple rows with the same thing alpha differing only in their "a" values. So what is the best way to organize the data given this situation? Do I keep all the "Thing Names" in a single sheet, and all the data separately? This is really a question of proper database design, which is definitely something foreign to me. Thanks for your help!
There's a thing called database normalization: http://en.wikipedia.org/wiki/Database_normalization. You usually want to avoid redundancy and dependency among the DB entities, using a corresponding design with surrogate keys, foreign keys and so on. Your "thing alpha" looks like you want a many-to-many table, e.g. one or many songs belong to the same or different genres. You may want to create dictionary tables to hold your id/name pairs and have foreign keys referencing them. In your case it will be a mostly read-only DB, so you might consider creating indexes with a high FILLFACTOR percentage, though I don't think SQLite allows that. There are many ways to design a database; everything depends on its purpose. You can go as deep as designing around your hardware: raids, file systems, DB block sizes matched to the file system's block sizes to keep the I/O optimal, and where to put your tablespaces/filegroups/indexes to balance the I/O load. DB design theory is really a deep subject which is not to be underestimated, nor a matter of a few sentences in a Stack Overflow answer. :)
Without understanding your data better, here is my guess at what you are looking for:
table: product
- _id
- name
table: attribute
- product_id
- a
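As SQLite DDL, that guess would look something like this (same names as above; the comment shows how it resolves the asker's duplicate-value concern):

```java
db.execSQL("CREATE TABLE product (_id INTEGER PRIMARY KEY, name TEXT)");
db.execSQL("CREATE TABLE attribute ("
        + "product_id INTEGER REFERENCES product(_id), a INTEGER)");
// "Thing alpha" with a = 4 and a = 6 becomes one product row and two attribute rows
```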

What is best: 1 table per record or 1 table with all records linked with foreign keys?

I have an application that lets users create different forms (surveys) and then fill them in (so it's a substitute for paper).
Here's the current model I'm using in the app:
Table 1)
+-------------------------+
| SURVEYS TABLE |
+----+------+-------------+
| ID | name | description |
+----+------+-------------+
Table 2)
+-----------------------------------+
| $[name_of_the_survey] |
+----+-------+------+-------+-------+
| ID | field | type | value | items |
+----+-------+------+-------+-------+
Table 3)
+--------------------------------------+
| $[name_of_the_survey] _records |
+----+---------------------------------+
| ID | columns specific to each survey |
+----+---------------------------------+
so basically when a user creates a survey, the program inserts a record into the Surveys table and then creates 2 tables:
table (2) for the fields of the form
table (3) for the records that will be stored, in which the columns correspond to table (2)'s rows.
It works but has some limitations. For instance, when you wish to add a field to table (2), the program has to read table (3)'s contents, save them to a temporary table, drop the previous table (3) and create a new one. This can be a performance issue when table (3) has a lot of records.
So my question is... Is there a better database design?
Using a separate table for each survey nearly invalidates the use of a database. You might as well just store the results in files.
You do, however, need three tables: Survey Definition, Survey Questions, and Survey Answers. It may look something like this...
Surveys:
ID; name; description
Questions:
ID; text; surveyID
Answers:
ID; answer; questionID
You could add complexity from there to handle enumerated answers...
Surveys:
ID; name; description
Questions:
ID; text; surveyID
Choices:
ID; choice; questionID
Answers:
ID; choiceID
You use the relationships between each table to aggregate to the next highest level, allowing you to get results from any question, survey, or any other attributes for any model you choose to add without trying to abstract away the source for your select statements. This also allows you to aggregate answers per user or surveying organization later on after adding them to your schema. If each survey has its own table structure, aggregating data across surveys becomes hugely impractical as your application grows.
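As DDL, the enumerated-answer variant might look like this, together with an example of the kind of aggregation the design enables (a sketch; `questionId` and the query shape are illustrative):

```java
// schema for the enumerated-answer variant
db.execSQL("CREATE TABLE surveys (id INTEGER PRIMARY KEY, name TEXT, description TEXT)");
db.execSQL("CREATE TABLE questions (id INTEGER PRIMARY KEY, text TEXT, "
        + "surveyID INTEGER REFERENCES surveys(id))");
db.execSQL("CREATE TABLE choices (id INTEGER PRIMARY KEY, choice TEXT, "
        + "questionID INTEGER REFERENCES questions(id))");
db.execSQL("CREATE TABLE answers (id INTEGER PRIMARY KEY, "
        + "choiceID INTEGER REFERENCES choices(id))");

// example aggregation: how many times each choice of one question was picked
Cursor c = db.rawQuery("SELECT c.choice, COUNT(a.id) FROM choices c "
        + "LEFT JOIN answers a ON a.choiceID = c.id "
        + "WHERE c.questionID = ? GROUP BY c.id", new String[]{questionId});
```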
You might try taking a look at http://en.wikipedia.org/wiki/Database_normalization#Normal_forms
The above is quite a formal way of improving DBs in general, and some of the steps are relevant to your DB. I think it's a bit confusing with all the ID fields. Do you really need them for each one? Are survey names not unique?
You've implied that the survey data fields are quite unique. Personally, I would just put each survey into a file and give it a standard format. That isn't a bad idea if the tendency is to read an entire survey at once. I'd only use a DB if I needed to sort or pick and choose bits of data.
