Assume I have a database set up and a table named MyTable which contains a large number of records (tens of thousands). Assume a query as follows...
select * from MyTable where ColumnX = 'X'
... returns just a handful of records (< 10). Let's say I wanted to filter this result set further to only those records where ColumnY matches 'Y1' or 'Y2'. Is it better from a speed and memory perspective to simply modify the above query as follows...
select * from MyTable where ColumnX = 'X' and (ColumnY = 'Y1' or ColumnY = 'Y2')
... Or is it better to iterate over the (small) result set in code and filter out only those records where ColumnY matches 'Y1' or 'Y2'? The reason I ask is because I have been told that OR clauses are bad in database queries from a performance perspective (when dealing with large tables) and better avoided where possible.
Note: The scenario in which this applies for me is an Android application with a local SQLite database but I guess the question is a bit more generic than that.
SQLite's documentation describes multiple optimizations that can be done on queries with OR, and says:
For any given query, the fact that the OR-clause optimization described here can be used does not guarantee that it will be used. SQLite uses a cost-based query planner that estimates the CPU and disk I/O costs of various competing query plans and chooses the plan that it thinks will be the fastest. If there are many OR terms in the WHERE clause or if some of the indices on individual OR-clause subterms are not very selective, then SQLite might decide that it is faster to use a different query algorithm, or even a full-table scan. Application developers can use the EXPLAIN QUERY PLAN prefix on a statement to get a high-level overview of the chosen query strategy.
In any case, implementing the OR by hand in your code is very likely to be slower than letting the database do it, because the database has to read and return all rows that match on ColumnX, even those that will not match on ColumnY.
Furthermore, the database already has code to do this filtering; implementing it again just increases the complexity of your code and the chances of errors.
The statement that "OR clauses are bad in database queries from a performance perspective (when dealing with large tables) and better avoided where possible" is not quite true; if you need the OR, all alternatives are worse.
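For SQLite this is easy to check from code. The sketch below (table and column names taken from the question; the data and the index are made up) compares the in-database OR filter with the hand-rolled version, and prints the plan chosen via EXPLAIN QUERY PLAN:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ColumnX TEXT, ColumnY TEXT)")
con.execute("CREATE INDEX idx_x ON MyTable(ColumnX)")
con.executemany("INSERT INTO MyTable VALUES (?, ?)",
                [("X", "Y1"), ("X", "Y2"), ("X", "Y3"), ("Z", "Y1")])

# Let the database apply both conditions at once.
in_db = con.execute(
    "SELECT * FROM MyTable WHERE ColumnX = 'X' "
    "AND (ColumnY = 'Y1' OR ColumnY = 'Y2')").fetchall()

# The hand-rolled alternative: fetch everything matching ColumnX,
# then filter in application code.
in_code = [r for r in con.execute(
    "SELECT * FROM MyTable WHERE ColumnX = 'X'")
    if r[1] in ("Y1", "Y2")]

# Same rows either way; the database simply returned fewer of them.
assert sorted(in_db) == sorted(in_code)

# EXPLAIN QUERY PLAN shows the strategy the planner picked.
for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM MyTable "
        "WHERE ColumnX = 'X' AND (ColumnY = 'Y1' OR ColumnY = 'Y2')"):
    print(row)
```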
You can try an IN clause:
select * from MyTable where ColumnX = 'X' and ColumnY in ('Y1','Y2')
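A quick check (with made-up sample data) that the IN form returns the same rows as the OR form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ColumnX TEXT, ColumnY TEXT)")
con.executemany("INSERT INTO MyTable VALUES (?, ?)",
                [("X", "Y1"), ("X", "Y2"), ("X", "Y3")])

with_or = con.execute(
    "SELECT * FROM MyTable WHERE ColumnX = 'X' "
    "AND (ColumnY = 'Y1' OR ColumnY = 'Y2')").fetchall()
with_in = con.execute(
    "SELECT * FROM MyTable WHERE ColumnX = 'X' "
    "AND ColumnY IN ('Y1', 'Y2')").fetchall()

# The two predicates are equivalent.
assert with_or == with_in
```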
Yes, Ajay, you are right, thank you.
Another way to solve this is to use either a temp table or a WITH clause.
-- The following is for Oracle; for other database products, replace DUAL as needed
with YT (columnY) as (
    select 'Y1' from DUAL
    union
    select 'Y2' from DUAL
)
select MT.*
from MyTable MT,
     YT
where MT.ColumnX = 'X'
and MT.ColumnY = YT.columnY
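SQLite supports WITH directly and needs no DUAL, so on Android the same idea can be written with a VALUES list as the CTE body. A runnable sketch (sample data assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ColumnX TEXT, ColumnY TEXT)")
con.executemany("INSERT INTO MyTable VALUES (?, ?)",
                [("X", "Y1"), ("X", "Y2"), ("X", "Y3")])

# SQLite allows a bare VALUES list as the CTE body, so no DUAL is needed.
rows = con.execute("""
    WITH YT(columnY) AS (VALUES ('Y1'), ('Y2'))
    SELECT MT.*
    FROM MyTable AS MT
    JOIN YT ON MT.ColumnY = YT.columnY
    WHERE MT.ColumnX = 'X'
""").fetchall()
```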
I have a database on SQL Server which contains almost 40 tables. These tables have primary, foreign, and composite keys. Beyond that, at run time I use multiple complex queries with joins, functions, and other SQL features. I want to know whether all the queries which work on SQL Server are also applicable to SQLite running on Android? Thanks
I want to know whether all the queries which are applicable on SQL Server are also applicable to SQLite running on Android?
I'd suggest that the best way to ascertain this is to test using one of the available SQLite tools (e.g. DBeaver, Navicat for SQLite).
JOINS
As an example of incompatibility, SQLite has no RIGHT JOIN; all outer joins are LEFT joins, so you would need to rewrite a RIGHT JOIN the other way around.
as per https://sqlite.org/syntax/join-operator.html
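A sketch of that rewrite, with invented tables (newer SQLite releases have since added RIGHT JOIN, but the LEFT JOIN form below works on older Android devices too):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, client_id INTEGER);
    CREATE TABLE clients (id INTEGER, name TEXT);
    INSERT INTO clients VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1);
""")

# SQL Server form: SELECT c.name, o.id FROM orders o RIGHT JOIN clients c ...
# SQLite rewrite: swap the two tables around a LEFT JOIN.
rows = con.execute("""
    SELECT c.name, o.id
    FROM clients AS c
    LEFT JOIN orders AS o ON o.client_id = c.id
    ORDER BY c.id
""").fetchall()
# Bob has no orders, so his order id comes back as NULL (None in Python).
```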
Column Types
In some other respects SQLite is more than hospitable, such as column types: with one exception, any row/column can store any of the supported types (NULL, INTEGER, TEXT, REAL, BLOB, NUMERIC), and you can actually specify virtually any column type, e.g. any_old_column_type is valid as a column type.
NUMERIC is a catch-all: a column type that is not matched to any of the other types by some basic rules gets NUMERIC affinity; any_old_column_type, for example, would have a type affinity of NUMERIC. Note that type affinity is rarely a matter of concern.
example here at http://sqlfiddle.com/#!7/9eecb7/8070
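A small demonstration of this flexibility (the column type name is deliberately made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "any_old_column_type" is not a real type; SQLite accepts it anyway
# and assigns the column NUMERIC affinity by its naming rules.
con.execute("CREATE TABLE t (v any_old_column_type)")
con.executemany("INSERT INTO t VALUES (?)",
                [(1,), (1.5,), ("abc",), (None,)])

# Each row keeps the storage class of whatever was actually inserted.
types = [r[0] for r in con.execute("SELECT typeof(v) FROM t")]
```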
I believe SQL Server has a DATE type; SQLite doesn't, but it can handle dates stored as integers, strings, reals, or numerics.
see https://sqlite.org/lang_datefunc.html
You may have to be wary of NULLs: you should never use = null (as no null is equal to another null); instead use IS null or IS NOT null. (Not sure about SQL Server.)
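A quick demonstration of the NULL behaviour (table invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("a",), (None,)])

# `= NULL` never matches: NULL compares as unknown to everything,
# including another NULL.
eq_null = con.execute(
    "SELECT count(*) FROM t WHERE v = NULL").fetchone()[0]

# `IS NULL` is the correct test.
is_null = con.execute(
    "SELECT count(*) FROM t WHERE v IS NULL").fetchone()[0]
```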
You may wish to consider https://www.sqlite.org/datatype3.html
Auto Incrementing
Yet another difference that you may well encounter is with auto incrementing identifiers.
In short, in SQLite a column defined exactly as column_name INTEGER PRIMARY KEY (it must be INTEGER, not INT or some other name that merely has INTEGER affinity) will typically increment monotonically, and with a tweak (inserting a negative number) it can roll through to negative values, since INTEGER is a 64-bit signed type. BUT there is no guarantee that it remains monotonic.
Often the keyword AUTOINCREMENT is assumed to be what defines this behaviour. What AUTOINCREMENT actually guarantees is that a new id will be greater than any id ever used (or the insert fails with an SQLITE_FULL error should the id exceed 9223372036854775807). Without AUTOINCREMENT, a lower "free" id may be reused instead (which only matters once ids reach 9223372036854775807, and it is very unlikely that that much data could ever be stored).
The negative-number tweak cannot be used with AUTOINCREMENT.
ROWID: unless the table is defined WITHOUT ROWID, there is always such a column, but it is normally hidden; it can be referred to as rowid, _rowid_, or oid. column_name INTEGER PRIMARY KEY (with or without AUTOINCREMENT) makes the column an alias of the rowid.
see https://sqlite.org/autoinc.html
Perhaps note that AUTOINCREMENT is not recommended unless necessary, as it has overheads (a table named sqlite_sequence stores the highest id handed out, keyed by the table's name).
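The rowid alias and the sqlite_sequence bookkeeping can both be seen in a few lines (table names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE auto_inc (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

con.execute("INSERT INTO plain (name) VALUES ('a')")
con.execute("INSERT INTO auto_inc (name) VALUES ('a')")

# INTEGER PRIMARY KEY makes the column an alias of the rowid.
row = con.execute("SELECT id, rowid FROM plain").fetchone()
assert row == (1, 1)

# AUTOINCREMENT keeps its high-water mark in sqlite_sequence.
seq = con.execute(
    "SELECT name, seq FROM sqlite_sequence WHERE name = 'auto_inc'"
).fetchone()
```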
Android Version
Another consideration is that different versions of Android ship different versions of SQLite, and therefore the lowest Android API level targeted may determine what SQL can or can't be utilised.
One such concern could be window functions, which use the OVER keyword; only Android API 30+ has that functionality.
My app deals with several similar datasets. That is, they are stored in the same tables, but contain different data. The user may create more datasets. In any case, these datasets are guaranteed to be disjoint: there will never be any data in one dataset linked somehow to data in another dataset.
I was wondering, would it be better to have a dedicated database for each dataset instead of having all the data in one big database?
I would expect lookup times to improve if the user works on a smaller database. Is there a rule of thumb for how many entries a database (or table) can hold before I should worry about lookup times?
One drawback I can think of is that opening a database creates some overhead. However, I don't expect the user to switch datasets frequently.
Consider this example:
The database contains tables for companies, clients, products and orders. Companies never share clients or products; thus companies are the disjoint datasets. However, all products, clients and orders are in just one big table (one for each, respectively).
Queries to the database might include:
All orders for a particular client.
All products a particular client has ordered.
All clients who have ordered a particular product.
etc.
These queries have in common, that they will always be issued in the context of one single company. Yet since the database doesn't know about this logical partition, all clients, products and orders will be searched.
If I were to have several databases, for each company one, my logical partition would be reflected and only the relevant data would be searched. I'm not sure of the overhead of having that many databases though.
Since I'm new to database schema design, I want to throw this idea out there to see, if several databases really are a good idea or not.
Update:
In case this wasn't clear: the database will be on the Android Phone, not in the Cloud or something.
There's no rule of thumb. AFAIK, lookup time doesn't depend purely on the number of entries. It depends on several factors, such as but not limited to:
how wide the table is (number and size of columns)
table indexes
how the data is stored, e.g. boolean true/false vs. string YES/NO, in a table with 3 million records
hardware capacity
primary key/foreign key relationships (sort of connected to point 1 above)
As a general approach, one database is advisable. Servers nowadays are quite powerful, and there are multiple options for performance optimisation, such as:
cloud databases which give the flexibility to choose the size
BigData
In-memory databases
Analysis services such as SSAS
NoSQL databases which are horizontally scalable e.g. FireStore
Now, the biggest benefit of using one database is that your development and testing will be quick. What does that mean? Let's say you need to add/delete/modify one field in one table. If you have 10 different databases, you will need to make the exact same change in 10 different places and then test each one as well. If such changes are frequent, you might end up writing a generic script, and there is always a chance that this script breaks, e.g. on a database change or a patch update. In the case of one database, the effort is straight away 1/10th. Another benefit is that database administration/monitoring will be easy, e.g. adding indexes.
I had a similar requirement a few months back, wherein I had a similar application (mobile + web). The set-up is similar: different companies access the data, and a user from a particular company is only allowed to view data pertaining to his/her company. All I did was add one more column, ORGCODE, to almost every table. More than 12 clients are happily sharing the tables without any issues.
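That single-database layout can be sketched as follows; the ORGCODE idea comes from the answer above, while the table and the data are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE clients (
        id      INTEGER PRIMARY KEY,
        orgcode TEXT NOT NULL,   -- which company this row belongs to
        name    TEXT
    )
""")
# An index on the partition column keeps per-company lookups cheap.
con.execute("CREATE INDEX idx_clients_org ON clients(orgcode)")

con.executemany("INSERT INTO clients (orgcode, name) VALUES (?, ?)",
                [("ACME", "Alice"), ("ACME", "Bob"), ("GLOBEX", "Carol")])

# Every query is issued in the context of one company.
acme = con.execute(
    "SELECT name FROM clients WHERE orgcode = ? ORDER BY name",
    ("ACME",)).fetchall()
```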
Disclaimer: All of the above is quite generic without knowing your use-case and performance requirement.
Your question reminds me of some articles out there discussing the difference between relational databases and storing data as JSON or other NoSQL options. Without doing some studies on what you are trying to accomplish and the scale you might reach, it is hard to judge. However, from a maintenance perspective, your database schema and its flexibility to change would favor the single DB instance. You might go with multiple tables as well.
Well, this is a question of pure performance. You should know how big your database will be and how much of it would be taken up by the data you would store separately: if that amount is around 20% of the overall database and will only decrease, use one database; if it may grow to 50% or more of the overall database, you may consider separate ones.
The overall size of the database also matters. Modern devices can work relatively comfortably with databases up to 500 MB (~500,000 heavy rows). They will handle more, but that requires some modifications of the UX, UI, and schema in order to minimise calls (pagination, indexes, etc.). If you run such an application on a weak device, it will crash.
Also, knowing how SQLite works (virtual tables in RAM), it is highly dependent on the amount of RAM accessible to the app. It is best to keep the database under about 100 MB.
As you can see, there is no single approach; you have to choose based on your app's use cases and the predicted size of the database.
Hope this answer helps you somehow.
I would go for one database: less maintenance and less that can go wrong.
Make sure it's optimized and indexed.
The wording of my question comes from a comment at the end of the blog post Android Quick Tip: Using SQLite FTS Tables. As the title implies, the post tells how to create and query full text search virtual tables in your Android app. The comment by user Fer Raviola specifically reads
my question is why dont' we ALWAYS use FTS tables!. I mean, they ARE
faster
The blog author did not reply (at the time of this writing, anyway), but I thought it was an interesting question that deserves an answer. After all, FTS tables can be made for an entire table, not just a specific text column. At first look, it seems like it would both simplify and speed up queries.
One could also completely do away with the non-virtual table. That would eliminate having to keep the virtual and non-virtual tables in sync with triggers and external content tables. All the data would be stored in the virtual table.
@CL. says that this is not a good option, though, because "FTS tables cannot be efficiently queried for non-FTS searches." I assume that this has something to do with what the SQLite documentation says here:
-- The examples in this block assume the following FTS table:
CREATE VIRTUAL TABLE mail USING fts3(subject, body);
SELECT * FROM mail WHERE rowid = 15; -- Fast. Rowid lookup.
SELECT * FROM mail WHERE body MATCH 'sqlite'; -- Fast. Full-text query.
SELECT * FROM mail WHERE mail MATCH 'search'; -- Fast. Full-text query.
SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20; -- Slow. Linear scan.
SELECT * FROM mail WHERE subject = 'database'; -- Slow. Linear scan.
SELECT * FROM mail WHERE subject MATCH 'database'; -- Fast. Full-text query.
But are the slow queries really so much slower than if one were just doing a normal query on a normal table? If so, why?
Here are some general potential downsides that I can think of to only using a virtual FTS table in Android:
The table would be larger because of the size that the index takes.
Operations like INSERT, UPDATE, and DELETE would be slower because the index would have to be updated.
But as far as queries themselves go, I don't see what the problem would be.
Update
The Android documentation example Storing and Searching for Data uses only an FTS virtual table in its database. This seems to confirm that there are at least some viable options for FTS-only databases.
When the table is small, scanning all rows does not take much time.
For large tables, however, this can take a very long time. (The speed would be similar to a normal, unindexed table.)
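The fast/slow distinction can be made concrete with a small runnable version of the documentation's example, assuming the SQLite build includes FTS4 (standard Python builds do). The MATCH query uses the full-text index, while the equality query must scan every row, which only hurts once the table is large:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumes the SQLite library was compiled with FTS4 support.
con.execute("CREATE VIRTUAL TABLE mail USING fts4(subject, body)")
con.executemany("INSERT INTO mail (subject, body) VALUES (?, ?)",
                [("database tips", "sqlite is small"),
                 ("holiday", "beach and sun")])

# Fast: answered from the full-text index.
hits = con.execute(
    "SELECT subject FROM mail WHERE body MATCH 'sqlite'").fetchall()

# Works, but scans every row: FTS keeps no ordinary index on its columns.
scan = con.execute(
    "SELECT subject FROM mail WHERE subject = 'holiday'").fetchall()
```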
I am developing a constitution app for Android. I want to know the best way to store the large text data. The features I want are for it to be searchable and fast. If I am to use sqlite3, how am I supposed to create the file, and where do I put it in the app? If I am to use XML to store it, will it be searchable? Speed will matter.
1) sqlite can create FTS (Full Text Search) tables. Basically it's a table with a built-in index on every word. It allows you to find data with queries like
SELECT * FROM chapter WHERE chapter MATCH 'someword'
or
SELECT * FROM chapter WHERE chapter MATCH 'someword NEAR/6 otherword'
meaning 'where someword and otherword are separated by less than 6 words'.
2) As for how to include the database in your application, I think you have 2 options:
Build the database beforehand (e.g. using Python, which has built-in sqlite bindings) and include it in your application's assets.
Include text file(s) assets and create the database on first launch.
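The first option might look like this (the file, table, and sample text are made up); the database file produced would then be shipped in assets/ and copied out on first run:

```python
import sqlite3

# Build the database ahead of time on a desktop machine; the file
# produced here would be shipped in the app's assets.
con = sqlite3.connect(":memory:")   # use a real path, e.g. "constitution.db"
con.execute("CREATE VIRTUAL TABLE chapter USING fts4(title, content)")

chapters = [
    ("Preamble", "We the people ..."),
    ("Article I", "All legislative powers ..."),
]
con.executemany("INSERT INTO chapter (title, content) VALUES (?, ?)",
                chapters)
con.commit()

# Sanity-check that the full-text index answers a query.
found = con.execute(
    "SELECT title FROM chapter WHERE chapter MATCH 'legislative'").fetchall()
```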
You can separate the text file into lines and store the lines in an SQLite database.
You can also add data related to the lines organizing them in chapters, associating related topics, tags, etc...
Then you can query the text by "full scanning" the text table: select line_number from table where line like '%word%';
Based on the lines meeting the search, you could bring up the corresponding page or chapter.
This is a simple example, you can build your app to generate more fine tuned queries.
But this approach should have a better cost/benefit ratio when you consider the search time and what you will do with the results after searching.
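A minimal runnable version of that approach (the schema and sample lines are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE lines (line_number INTEGER, chapter TEXT, line TEXT)")
con.executemany("INSERT INTO lines VALUES (?, ?, ?)",
                [(1, "I", "We the people"),
                 (2, "I", "in order to form"),
                 (3, "II", "a more perfect union")])

# A full scan with LIKE: simple, but touches every row on each search.
hits = con.execute(
    "SELECT line_number, chapter FROM lines WHERE line LIKE ?",
    ("%perfect%",)).fetchall()
# The chapter in each hit can then be used to bring up the right page.
```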
I'm planning on generating queries for SQLite that will involve many joins over 12 tables, repeated enough to surpass SQLite's 64-table join limit (~250 table joins, or possibly more). This will eventually run on Android. The purpose behind this is to have X amount of user-defined fields in the result set, depending on the report being generated.
Unfortunately I'm not a DBA and I do not know of an optimal way to achieve this.
So far I think the options are:
Use 2 temp tables to juggle the result set while joining the max amount possible. (My previous solution in SQL Server; fairly slow.)
Produce result sets of a few columns and a key to join on and store them in n temp tables. (Where n is less than 64) Then join all the temp tables on their common key.
Create a single temp table and fill it up one insert or update at a time.
Don't do a big join, perform many selects instead and fill up some sort of data container.
Is there something else I should consider?
Per your comment on Mike's response, "the query to generate the report needs to join and rejoin many many times".
Frequently, when dealing with reports, you'll want to split your query into bite-size chunks, and store intermediary results in temporary tables where applicable.
Also, your question makes it sound like you have an entity-attribute-value (EAV) store and are trying to pivot the whole thing. If so, you may want to revisit this design anti-pattern, since it is probably the source of your problem.
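The temp-table approach can be sketched like this (the schema and data are invented); each individual query stays far below the 64-table limit, and each stage's result is reused by the next:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE attr_a (entity_id INTEGER, value TEXT);
    CREATE TABLE attr_b (entity_id INTEGER, value TEXT);
    INSERT INTO entity VALUES (1), (2);
    INSERT INTO attr_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO attr_b VALUES (1, 'b1');
""")

# Step 1: join a first batch of tables into a temp table.
con.execute("""
    CREATE TEMP TABLE stage1 AS
    SELECT e.id, a.value AS a_value
    FROM entity e JOIN attr_a a ON a.entity_id = e.id
""")

# Step 2: join the temp table against the next batch, and so on,
# keeping every individual statement well under the join limit.
report = con.execute("""
    SELECT s.id, s.a_value, b.value
    FROM stage1 s LEFT JOIN attr_b b ON b.entity_id = s.id
    ORDER BY s.id
""").fetchall()
```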
I don't think you can get "fast" on any relational database platform when you're trying to join that many tables - any kind of built-in optimisation is going to give up the ghost. I would be likely to review my design when I saw as many as ten tables in a query.
I think your schema design needs to be revisited. 250+ tables in a schema (on a phone!) doesn't make sense to me - I run several enterprise apps in a single DB with 200+GB of data and there are still only 84 tables. And I never join all of them. Do all your tables have different columns? Really different? Could you post a few entries from sqlite_master?
Since your app is running on an Android device, I would guess it syncs with an enterprise-class database on a server somewhere. The real solution is to generate a de-normalized representation of the server data on the device database, so it can be more readily accessed.