Pre-populated Trie - android

Background:
My CSS360 group is attempting to create an Android application that will include an auto-complete search feature. The data that we'll be searching consists of around 7000 entries, and will be stored in a SQLite database on the phone itself. The most obvious approach would be to do a linear search of the database following every character that the user types, and then return a list of suggestions which are potential alphabetic extensions of the user's query. However, this seems like it would be pretty inefficient, and we've been looking for better alternatives. In another one of my classes today, my instructor briefly discussed the trie data structure, and mentioned that it's often used to store entire dictionaries. Entries in a trie can be retrieved in time proportional to the length of the key, regardless of how many entries are stored (as opposed to linear time for a regular old array), so this seems like a great tool for us to use! Unfortunately, we're in waaaay over our heads on this project already, and none of us really have a clue how to make this happen. All any of us have ever coded to date are basic console applications to teach us programming basics. We're all attempting to learn the Android platform in a week's time by watching YouTube videos, and deferring the database stuff to the one guy in our group who has any SQL experience whatsoever. We could seriously use some pointers!
Questions:
When creating a trie, is it possible to have the entire structure pre-populated, i.e., generate a line of code for every node used, so that the entire structure will already be in memory when the program starts? My thinking here is that this will save us the overhead of having to regenerate the entire trie from the database every time the program starts. If so, is there an easy way to get these thousands of lines of code into our program, e.g., some sort of script which converts the database files into a giant text file of Java commands which can be copied and pasted into Eclipse?
Will there be a considerable amount of overhead if we search the database directly instead of using some sort of internal list or data structure? Should we be copying the names out of the database and searching them inside the program for our auto-complete function?
If this proves too technically difficult for us, and we have to resort to a regular linear search, will the performance be noticeably affected?
Our current plans are to run the auto-complete function each time the user enters a character, and then wait for the function to return before allowing them to continue typing. The only programs any of us have written so far function synchronously like this. What would we need to know to make this function asynchronously? Considering our novice abilities, and the requirements that we're already having to meet, would this be too technically challenging for us?
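For readers who have not met the structure mentioned above, here is a minimal sketch of a trie with prefix lookup. It is written in Python for brevity rather than Android Java, and the names `Trie`, `insert`, `complete`, and `build_trie` are illustrative, not taken from any library. The walk down to the prefix node costs one step per typed character, however many entries are stored; collecting the suggestions then costs roughly as much as the results returned.

```python
class Trie:
    """Minimal prefix tree; every name here is illustrative."""

    def __init__(self):
        self.children = {}    # maps one character to the child Trie node
        self.is_word = False  # True if a stored entry ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` stored words starting with `prefix`, alphabetically."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing stored under this prefix
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


def build_trie(words):
    """Build a trie from any iterable of strings, e.g. rows read from a database."""
    trie = Trie()
    for word in words:
        trie.insert(word)
    return trie
```

With this shape, the trie would be rebuilt from the database rows at startup by calling `build_trie` rather than being generated as source code.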

SQLite should be able to serve this auto-complete functionality reasonably well. I'd recommend using its built-in indexes over reinventing the wheel. If you do end up building your own structure, SQLite is probably not going to help you much once you've done that work.
If you want substring searching, then full-text search is probably your best bet.
If you only want to complete the beginning of the word, then plain indexes should be more than enough. If performance is a problem, just wait until the user has typed three characters before running the query. Set a limit on your results for snappy responses.
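The prefix-completion advice above can be sketched as follows. This uses Python's standard `sqlite3` module purely for illustration; the same SQL should carry over to Android's SQLite APIs. The table name `entries`, the index name, and the helper names are assumptions. The query is written as a range scan over the indexed column, and the upper bound `prefix + "\uffff"` assumes the stored names contain no characters beyond the Basic Multilingual Plane.

```python
import sqlite3


def open_demo_db(names):
    """Create an in-memory table with a case-insensitive indexed name column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (name TEXT NOT NULL COLLATE NOCASE)")
    conn.execute("CREATE INDEX idx_entries_name ON entries (name)")
    conn.executemany("INSERT INTO entries (name) VALUES (?)", [(n,) for n in names])
    conn.commit()
    return conn


def suggest(conn, prefix, limit=10, min_chars=3):
    """Return up to `limit` names starting with `prefix`, or [] for short prefixes."""
    if len(prefix) < min_chars:
        return []  # wait until the user has typed enough characters
    # Every name starting with `prefix` sorts between `prefix` and this upper bound.
    upper = prefix + "\uffff"
    rows = conn.execute(
        "SELECT name FROM entries WHERE name >= ? AND name < ? "
        "ORDER BY name LIMIT ?",
        (prefix, upper, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the `WHERE` clause is a plain range on the indexed column, SQLite can seek into the index instead of scanning every row, and `LIMIT` caps the work per keystroke.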

Related

The most efficient way to implement a database using custom data + google fitness api

I am currently learning Android programming and creating an app that will store some integers representing user choices (values inserted several times a day, which must be displayed in the results activity) and step data collected through the Google Fit History API for Android, also displayed in the results activity. I am looking for the most efficient way to store this data. I know that it might be possible to insert the custom data types into the Google Fit database. However, I am not sure that is a good idea if the app mostly works offline and needs to immediately show only a small set of results, for example the values inserted in the last 2 weeks, with step counts. On the other hand, I am not sure if it is OK to have two databases storing the data.
My apologies if the question sounds a bit too amateur, I am doing my best to find an optimal solution in terms of performance.
Thank you for your answers.
So, to give you my opinion and answer (mainly opinion)
Android has 3 ways (mainly) for storing data:
Files
Online database/API
Local database
For the specific scenario you describe, where the data must be available offline, you should probably look at Room: https://developer.android.com/training/data-storage/room. It supports storing primitive types without having to write any type converters, and you can store models and custom data as well. It uses very basic SQL (because it's a wrapper around the older SQLite database methods) and is part of Android (not an external third-party library). Room also requires most operations to be done on background threads instead of the main thread, which will improve your performance as well, and it supports LiveData/RxJava for observing changes as they happen.
However, as I told this user here:
Should i store one arrayList per file or should i store all my arrayList in the same file?
When starting out, don't worry about the best way of doing something. Instead, try something out and learn from it; worrying about the best solution now is rather pointless. Either way, happy learning and coding :P

Displaying data from a ContentProvider and a local database in one list on Android

The problem
I need to do a T9 contact search for an Android project I'm working on. Now, it would be simple if I just had to pull contacts from the native contacts storage and then do T9 on that, but the problem I have is that I have an additional local database where we store extra content for some contacts in the form of additional numbers that our application displays and handles. I need to do a search based on the contact’s name, number, and the extra numbers (if any) contained in the local database. The local database has IDs that match those of the contacts in the native Android database.
I have been looking for a solution to this problem, and I have gone through these ideas, but none of them seem to be the right solution.
Try #1
Write a ContentProvider for our local database, in order to be able to perform a simple join operation between the native Android contacts table and our table. However, it seems that joining tables via ContentProviders is only possible when you write your own ContentProvider, which makes this solution not viable for me. On top of that, the Android documentation states that you should not write a ContentProvider if you don’t want to share your data with other applications, which we currently don’t.
Try #2
Copy all the needed data from Android’s contacts database into our database, and use ContentObservers to update it constantly. This solution had two major problems: 1) It seemed to have a big overhead, not just on the processor of the device, but also on the development, as we would have to introduce some really delicate update/read/write mechanisms and ensure that our data always stays relatively fresh, while also being performant; 2) A colleague has stated that during contact sync, the ContentObserver fires off events very often, thus making a need for special code to delay the updating, which he says has never really worked out great.
Try #3
Use a CursorJoiner to join the two cursors that I have received and then use a MatrixCursor to display all the data, but that solution is not viable since all the data would be kept in memory, and we are working with datasets that have more than 10k rows of data. Even if the memory could handle it, it would be slow to load, which for T9, isn’t really an option. This also pretty much excludes any solution that doesn't use a Cursor to look over data, which is why I am going in that direction.
Question
Am I missing something obvious? If I am, please point me in the right way. All of the things that I have tried don’t seem feasible to me, but I’m open to someone modifying them in order to make them worthwhile.

Large String Object in SQLite Database

I have a SQLite database which has a table (of course) named Object. In my application, I need to access that table and all of its fields. I am able to query the database and get all of the information I want from a cursor with no issues. The problem comes with deciding what to do with the cursor next. Right now I am thinking of creating a class called Object and it will have fields for every column in the table which will be set by the query. This just seems so... inefficient. I'm not sure how to do this without needing to write out every column that is in the table for the object to use, which seems to violate DRY. Are there any better ways to do this?
My end goal is to be able to access every row in the table and get whatever information I want for that row. For example I will be using this to populate a ListView. If this is too ambiguous let me know and I'll try to clarify.
Thanks!
Edit: I've found the library db4o and it seems to do what I want. The library seems to be kind of big though (40 MB) for a mobile application. Does anybody have experience with this? Everything I've read seems to indicate it is good. I'll post more if I find information.
Are there any better ways to do this?
This is a very broad question, and the answer depends on your requirements and on what the developer is most comfortable with. I use the approach you describe, and for me it is the best one available.
Generally speaking, what you want to create is an ORM (object-relational mapping). In my opinion it's a very clean and efficient approach. Of course, an ORM is sometimes not the best solution (I have never run into such a case, but I have heard of them). I almost always write my own ORM; it takes some time, but the results are significantly better than with ready-made frameworks.
Both have advantages and disadvantages. A hand-written ORM performs much better because it is designed and optimized for one concrete solution (mainly its queries).
I suggest you do what you mentioned: create an object that represents the database table, with properties matching the columns. I use this at work, and we have never had problems with performance or excessive battery consumption in our applications.
It's also much safer to show the user "copies" of the data held in objects rather than data straight from the database. Users can do whatever they want with the displayed results (they can add dangerous symbols and hacks), but you can easily check for this before you update the database with their changes.
Your source code will also look good: another developer won't get lost in it, everything will be clear, and future updates will be easy.
This is just my opinion, but I hope it helps you make a decision.
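As one possible take on the "object per row" idea that keeps the column list in a single place (the DRY concern from the question), here is a minimal sketch in Python with the standard `sqlite3` module. The `Item` class, its fields, and the `item` table are hypothetical, since the question does not list the real columns. The dataclass fields drive the `SELECT`, so each column is written out only once.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Item:
    # Hypothetical columns; the real table in the question is not shown.
    id: int
    title: str
    body: str


def load_all(conn, cls, table):
    """Build one `cls` instance per row, deriving the column list from the dataclass."""
    columns = ", ".join(f.name for f in fields(cls))
    # `table` and the field names come from code, never from user input,
    # so formatting them into the SQL string is acceptable in this sketch.
    cursor = conn.execute(f"SELECT {columns} FROM {table}")
    return [cls(*row) for row in cursor]
```

The resulting list of objects can then back a list view directly, and adding a column means adding one field to the class.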

Which is better? Database or xmlfile? [duplicate]

Closed 10 years ago.
I really like XML for saving data, but when does SQLite or another database become the better option? E.g., when the XML has more than x items or is greater than y MB?
I am coding an RSS reader and I believe I made the wrong choice in using XML over a SQLite database to store a cache of all the feeds' items. There are some feeds which have an XML file of ~1 MB after a month, another has over 700 items, while most only have ~30 items and are ~50 KB in size after several months.
I currently have no plans to implement a cap because I like to be able to search through everything.
So, my questions are:
When is the overhead of sqlite/databases justified over using xml?
Are the few large xml files justification enough for the database when there are a lot of small ones, though even the small ones will grow over time? (a long long time)
updated (more info)
Every time a feed is selected in the GUI, I reload all the items from that feed's XML file.
I also need to modify the read/unread status, which seems really hacky when I loop through all the nodes in the XML to find the item and then set it to read/unread.
Man do I have experience with this. I work on a project where we originally stored all of our data using XML, then moved to SQLite. There are many pros and cons to each technology, but it was performance that caused the switchover. Here is what we observed.
For small databases (a few meg or smaller), XML was much faster, and easier to deal with. Our data was naturally in a tree format, which made XML much more attractive, and XPath allowed us to do many queries in one simple line rather than having to walk down an ancestry tree.
We were programming in a Win32 environment, and used the standard Microsoft DOM library. We would load all the data into memory, parse it into a DOM tree and search, add, modify on the in memory copy. We would periodically save the data, and needed to rotate copies in case the machine crashed in the middle of a write.
We also needed to build up some "indexes" by hand using C++ tree maps. This, of course, would be trivial to do with SQL.
Note that the size of the data on the filesystem was a factor of 2-4 smaller than the "in memory" DOM tree.
By the time the data got to 10M-100M size, we started to have real problems. Interestingly enough, at all data sizes, XML processing was much faster than SQLite turned out to be (because it was in memory, not on the hard drive)! The problem was actually twofold- first, loadup time really started to get long. We would need to wait a minute or so before the data was in memory and the maps were built. Of course once loaded the program was very fast. The second problem was that all of this memory was tied up all the time. Systems with only a few hundred meg would be unresponsive in other apps even though we ran very fast.
We actually looked into using a filesystem-based XML database. There are a couple of open-source XML databases, and we tried them. I have never tried to use a commercial XML database, so I can't comment on them. Unfortunately, we could never get the XML databases to work well at all. Even the act of populating the database with hundreds of meg of XML took hours.... Perhaps we were using it incorrectly. Another problem was that these databases were pretty heavyweight. They required Java and had a full client-server architecture. We gave up on this idea.
We found SQLite then. It solved our problems, but at a price. When we initially plugged SQLite in, the memory and load time problems were gone. Unfortunately, since all processing was now done on the harddrive, the background processing load went way up. While earlier we never even noticed the CPU load, now the processor usage was way up. We needed to optimize the code, and still needed to keep some data in memory. We also needed to rewrite many simple XPath queries as complicated multiquery algorithms.
So here is a summary of what we learned.
For tree data, XML is much easier to query and modify using XPath.
For small datasets (less than 10M), XML blew away SQLite in performance.
For large datasets (greater than 10M-100M), XML load time and memory usage became a big problem, to the point that some computers became unusable.
We couldn't get any opensource XML database to fix the problems associated with large datasets.
SQLite doesn't have the memory problems of XML DOM, but it is generally slower in processing the data (it is on the hard drive, not in memory). (note- SQLite tables can be stored in memory, perhaps this would make it as fast.... We didn't try this because we wanted to get the data out of memory.)
Storing and querying tree data in a table is not enjoyable. However, managing transactions and indexing partially makes up for it.
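To make the XPath-versus-table contrast from this summary concrete, here is a small sketch in Python using the standard `xml.etree.ElementTree` and `sqlite3` modules. The sample document, the `node` table layout, and the function names are all invented for illustration, and the recursive query is just one way to walk a parent/child table, not what the answer's authors actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample tree: two nested folders and two items.
XML = (
    "<root><folder name='a'><folder name='b'><item name='x'/></folder></folder>"
    "<item name='y'/></root>"
)


def items_via_xpath(xml_text):
    """One path expression finds every <item> at any depth."""
    root = ET.fromstring(xml_text)
    return sorted(e.get("name") for e in root.findall(".//item"))


def build_node_table():
    """The same tree stored as rows that point at their parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, kind TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [
            (1, None, "root", "root"),
            (2, 1, "folder", "a"),
            (3, 2, "folder", "b"),
            (4, 3, "item", "x"),
            (5, 1, "item", "y"),
        ],
    )
    conn.commit()
    return conn


def items_via_sql(conn, root_id):
    """Walk the subtree under `root_id` with a recursive common table expression."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM node WHERE id = ?
            UNION ALL
            SELECT node.id FROM node JOIN subtree ON node.parent_id = subtree.id
        )
        SELECT name FROM node
        WHERE kind = 'item' AND id IN (SELECT id FROM subtree)
        ORDER BY name
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Both functions return the same item names for the whole tree, but the SQL version needs a schema and a noticeably longer query to express what `.//item` says in one line.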
I basically agree with Mitchel that this can be highly specific, depending on what you are going to do with XML and SQLite. For your case (a cache), it seems to me that using SQLite (or another embedded database) makes more sense.
First, I don't really think that SQLite will need more overhead than XML, and I mean both development-time overhead and runtime overhead. The only problem is that you have a dependency on the SQLite library. But since you would need some library for XML anyway, it doesn't matter (I assume the project is in C/C++).
Advantages of SQLite over XML:
everything in one file,
performance degrades less than with XML as the cache gets bigger,
you can keep feed metadata separate from cache itself (other table), but accessible in the same way,
SQL is probably easier to work with than XPath for most people.
Disadvantages of SQLite:
can be problematic with multiple processes accessing same database (probably not your case),
you should know at least basic SQL. Unless there will be hundreds of thousands of items in cache, I don't think you will need to optimize it much,
maybe in some way it can be more dangerous from security standpoint (SQL injection). On the other hand, you are not coding web app, so this should not happen.
Other things are on par for both solutions probably.
To sum it up, answers to your questions respectively:
You will not know, unless you test your specific application with both back ends. Otherwise it's always just a guess. Basic support for both caches should not be a problem to code. Then benchmark and compare.
Because of the way XML files are organized, SQLite searches should always be faster (barring some corner cases where it doesn't matter anyway because it's blazingly fast). Speeding up searches in XML would require index database anyway, in your case that would mean having cache for cache, not a particularly good idea. But with SQLite you can have indexing as part of database.
Don't forget that you have a great database at your fingertips: the filesystem!
Lots of programmers forget what a decent directory-file structure offers:
It's fast as hell
It's portable
It has a tiny runtime footprint
People are talking about splitting up XML files into multiple XML files... I would consider splitting your XML into multiple directories and multiple plaintext files.
Give it a go. It's refreshingly fast.
Use XML for data that the application should know - configuration, logging and what not.
Use databases (Oracle, SQL Server, etc.) for data that the user interacts with directly or indirectly - real data.
Use SQLite if the user data is more of a serialized collection - like a huge list of files and their content, or a collection of email items, etc. SQLite is good at that.
Depends on the kind and the size of the data.
I wouldn't use XML for storing RSS items. A feed reader makes constant updates as it receives data.
With XML, you need to load the data from file first, parse it, then store it for easy search/retrieval/update. Sounds like a database...
Also, what happens if your application crashes? If you use XML, what state is the data in the XML file in, compared with the data in memory? At least with SQLite you get atomicity, so you are assured that your application will start with the same state as when the last database write was committed.
XML is best used as an interchange format when you need to move data from your application to somewhere else or share information between applications. A database should be the preferred method of storage for almost any size application.
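The atomicity point in this answer can be illustrated with a short sketch using Python's standard `sqlite3` module. The `item` table and the helper names are assumptions, and the "failure" is simulated with an exception rather than a real crash. The point is only that a half-finished batch of updates leaves no trace.

```python
import sqlite3


def setup_items():
    """Three unread items: two in feed 1, one in feed 2 (invented sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, feed_id INTEGER, "
        "is_read INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO item (feed_id) VALUES (?)", [(1,), (1,), (2,)])
    conn.commit()
    return conn


def mark_feed_read(conn, feed_id, fail_midway=False):
    """Mark every item of a feed as read in one transaction; return True on success."""
    ids = [r[0] for r in conn.execute("SELECT id FROM item WHERE feed_id = ?", (feed_id,))]
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            for n, item_id in enumerate(ids):
                if fail_midway and n == 1:
                    raise RuntimeError("simulated failure halfway through")
                conn.execute("UPDATE item SET is_read = 1 WHERE id = ?", (item_id,))
    except RuntimeError:
        return False
    return True


def unread_count(conn, feed_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM item WHERE feed_id = ? AND is_read = 0", (feed_id,)
    ).fetchone()
    return row[0]
```

After a failed call, `unread_count` still reports the original number, because the first update was rolled back together with the rest of the transaction.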
When should XML be used for data persistence instead of a database? Almost never. XML is a data transport language. It is slow to parse and awkward to query. Parse the XML (don't shred it!) and convert the resulting data into domain objects. Then persist the domain objects. A major advantage of a database for persistence is SQL which means unstructured queries and access to common tools and optimization techniques.
I have made the switch to SQLite and I feel much better knowing it's in a database.
There are a lot of other benefits from this:
Adding new items is really simple
Sorting by multiple columns
Removing duplicates with a unique index
I've created 2 views, one for unread items and one for all items, not sure if this is the best use of views, but I really wanted to try using them.
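A rough sketch of the three benefits listed above (simple inserts, multi-column sorting, duplicate removal through a unique index) plus an unread view, using Python's standard `sqlite3` module. The schema, the column names such as `guid` and `published`, and the helper names are guesses for illustration, not the poster's actual design.

```python
import sqlite3


def open_cache():
    """Create a hypothetical feed-item cache with a unique index and an unread view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            feed_id INTEGER NOT NULL,
            guid TEXT NOT NULL,
            title TEXT NOT NULL,
            published TEXT NOT NULL,
            is_read INTEGER NOT NULL DEFAULT 0
        );
        CREATE UNIQUE INDEX idx_item_feed_guid ON item (feed_id, guid);
        CREATE VIEW unread_item AS SELECT * FROM item WHERE is_read = 0;
        """
    )
    return conn


def add_item(conn, feed_id, guid, title, published):
    # INSERT OR IGNORE silently skips rows that would violate the unique index,
    # so re-downloading a feed does not create duplicates.
    conn.execute(
        "INSERT OR IGNORE INTO item (feed_id, guid, title, published) VALUES (?, ?, ?, ?)",
        (feed_id, guid, title, published),
    )


def unread_titles(conn, feed_id):
    """Unread titles for one feed, newest first, ties broken by title."""
    rows = conn.execute(
        "SELECT title FROM unread_item WHERE feed_id = ? ORDER BY published DESC, title",
        (feed_id,),
    )
    return [row[0] for row in rows]
```

Marking an item as read is then a single `UPDATE item SET is_read = 1 WHERE ...`, and the view reflects the change on the next query.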
I also benchmarked the xml vs sqlite using the StopWatch class, and the sqlite is faster, although it could just be that my way of parsing xml files wasn't the fastest method.
Small # items and size (25 items, 30kb)
~1.5 ms sqlite
~8.0 ms xml
Large # of items (700 items, 350kb)
~20 ms sqlite
~25 ms xml
Large file size (850 items, 1024kb)
~45 ms sqlite
~60 ms xml
To me it really depends on what you are doing with them, how many users/processes need access to them at the same time etc.
I work with large XML files all the time, but they are single-process, import-style items where multi-user access and performance are not really requirements.
So really it is a balance.
If you will ever need to scale, use a database.
XML is good for storing data which is not completely structured and which you typically want to exchange with another application. I prefer to use a SQL database for data. XML is error-prone, as you can cause subtle errors due to typos or omissions in the data itself. Some open source application frameworks use too many XML files for configuration, data, etc. I prefer to have it in SQL.
Since you ask for a rule of thumb, I would say: use XML for application data, configuration, etc. if you are going to set it up once and not access or search it much. For active searches and updates, it's best to go with SQL.
For example, a web server stores application data in an XML file, and you don't really need to perform complex searches on the file or update it. The web server starts, reads the XML file, and that's that. So XML is perfect here. Suppose you use a framework like Struts. You need to use XML, and the action configurations don't change much once the application is developed and deployed. So again, the XML file is a good way. Now if your Struts application allows extensive searches, updates, and deletions, then SQL is the optimal way.
Of course, you will surely meet one or two developers in your organisation who will chant XML or SQL only and proclaim XML or SQL as the only way to go. Beware of such folks and do what 'feels' right for your application. Don't just follow a 'technology religion'.
Think of things like how often you need to update the data, how often you need to search the data. Then you will have your answer on what to use - XML or SQL.
I agree with @Bradley.
XML is very slow and not particularly useful as a storage format. Why bother? Will you be editing the data by hand using a text editor? If so, XML still isn't a very convenient format compared to something like YAML. With something like SQLite, queries are easier to write, and there's a well-defined API for getting your data in and out.
XML is fine if you need to send data around between programs. But in the name of efficiency, you should probably produce the XML at sending time, and parse it into "real data" at receive time.
All the above means that your question about "when the overhead of a database is justified" is kind of moot. XML has a way higher overhead, all the time, than SQLite does. (Full-on databases like MSSQL are heavier, especially in administrative overhead, but that's a totally different question.)
XML can be stored as text and as a binary file format.
If your primary goal is to let a computer read and write a file format efficiently, you should work with a binary file format.
Databases are an easy-to-use way of storing and maintaining data.
They are not the fastest way to store data; that would be a binary file format.
What can speed things up is using an in memory database / database type. Sqlite has this option.
And this sounds like the best way to do it for you.
My opinion is that you should use SQLite (or another appropriate embedded database) anytime you don't need a pure-text file format. Note, this is a pretty big exception. There are a lot of scenarios that require, or are benefited by, pure-text file formats.
As far as overhead goes, SQLite compiles to something like 250 k with normal flags. Many XML parsing libraries are larger than SQLite. You get no concurrency gains using XML. The SQLite binary file format is going to support much more efficient writes (largely because you can't append to the end of a well-formed XML file). And even reading data, most of which I assume is fairly random access, is going to be faster using SQLite.
And to top it all off, you get access to the benefits of SQL like transactions and indexes.
Edit: Forgot to mention. One benefit of SQLite (as opposed to many databases) is that it allows any type in any row in any column. Basically, with SQLite you get the same freedom you have with XML in terms of datatypes. This also means that you don't have to worry about putting limits on text columns.
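The flexible typing mentioned in this edit can be checked with a few lines of Python and the standard `sqlite3` module. The table `kv` and the function name are made up for the demonstration; the column is declared without a type so that SQLite stores each value with whatever storage class it arrives in.

```python
import sqlite3


def stored_types(values):
    """Insert arbitrary Python values into one untyped column and report how SQLite stored them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (value)")  # no declared type on purpose
    conn.executemany("INSERT INTO kv (value) VALUES (?)", [(v,) for v in values])
    return [row[0] for row in conn.execute("SELECT typeof(value) FROM kv ORDER BY rowid")]
```

For example, `stored_types([42, 1.5, "hello", b"\x00", None])` shows an integer, a real, a text value, a blob, and a null living side by side in the same column.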
You should note that many large Relational DBs (Oracle and SQLServer) have XML datatypes to store data within a database and use XPath within the SQL statement to gain access to that data.
Also, there are native XML databases which work very much like SQLite, in the sense that they are one binary file holding a collection of documents (which could roughly be a table); you can then run XPath/XQuery on a single document or on the whole collection. So with an XML database you can do things like store the day's data as a separate XML document in the collection... so you just need to use that one document when you're dealing with the data for today. But write an XQuery to figure out historical data on the collection of documents for that person. Slick.
I've used Berkeley DB XML (now backed by Oracle). There are others if you search Google for "Native XML Database". I've not seen a performance problem with storing/retrieving data in this manner.
XQuery is a different beast (but well worth learning), however you may be able to just use the XPaths you currently use with slight modifications.
A database is great as part of your program, if querying the data is part of your business logic.
XML is best as a file format, especially if your data is:
1, Hierarchical
2, Likely to change in the future in ways you can't guess
3, Going to live longer than the program
I say it's not a matter of data size, but of data type. If your data is structured, use a relational database. If your data is semi-structured, use XML or - if the data amounts really grow too large - an XML database.
If you're searching, go with a DB. You could split the XML files up into directories to ease seeking, but the managerial overhead easily gets quite heavy. You also get a lot more than just performance with a SQL DB...

Which is better, using a SQLite database or hardcoding data into a function?

I'm working on a trivia-like app and wondering what the best way is to store all of the questions and answers. Right now, I just have a random number and a whole lot of if statements. For example, if randomNum = 25, then the question is THIS and the choices are THIS. This seems to work fine, but my file is starting to get very large, and it seems like this should cause performance issues. Space is also starting to become an issue. I have started to look into just putting all of the data into a database and using a random number to retrieve a row. Does anybody have suggestions on which would be the best practice, or any other ways of doing this?
Sounds like it's a good time to start using the database. You can learn how to include a pre-populated database here.
...using a whole lot of if statements.
I have started to look into just putting all of the data into database and use a random number to just retrieve a row
I think you've kinda answered the question yourself.
What happens with your model if you have 10,000 questions? Are you going to use 10,000 'if' statements?
Even if you're never going to get to that many questions, using a SELECT on a DB where the question number equals a particular random number, is going to be far more extensible.
You should use the database.
It's not just a matter of maintainability and (ultimately) code simplicity, either; a database offers significant advantages.
Imagine if you want to be able to supply different packs of questions, for example. You could offer people the ability to download a trivia pack from a website, or load it from a file on their SD card. This simply would not work with masses of if statements.
Suppose you want to let people add their own trivia questions? Upload them to the website for voting and ultimate inclusion into crowd-sourced question packs.
So yeah: you should use a database.
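To show roughly what replaces the chain of if statements, here is a minimal sketch in Python with the standard `sqlite3` module. The `question` table, its columns, the `|`-separated choices format, and the function names are all assumptions made for the example. It picks a random row by offset, so it does not depend on the ids being contiguous.

```python
import random
import sqlite3


def open_question_db(questions):
    """Create a hypothetical question table from (text, choices) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT, choices TEXT)")
    conn.executemany("INSERT INTO question (text, choices) VALUES (?, ?)", questions)
    conn.commit()
    return conn


def pick_question(conn, rng=random):
    """Return one random (text, choices) row, or None if the table is empty."""
    count = conn.execute("SELECT COUNT(*) FROM question").fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # the random number now selects a row, not an if-branch
    return conn.execute(
        "SELECT text, choices FROM question ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
```

Adding a question then means inserting a row, and a downloadable question pack could simply be another database file with the same table.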
