The Android SDK documentation for SQLite provides an update method which takes four parameters: table, values, whereClause, whereArgs. The first three make complete sense. However, it is not clear to me that using whereArgs with a whereClause containing ?s, as opposed to passing a fully prepared whereClause, offers any benefit, either in terms of security (there is no suggestion that this somehow helps to sanitize the SQL) or speed. So what are the benefits of going down that route instead of simply passing a full where string and a null whereArgs?
The docs say:
String: You may include ?s in the where clause, which will be replaced by the values from whereArgs. The values will be bound as Strings.
This is slightly misleading. No "replacement" takes place actually. Instead the ?s are variables and the whereArgs are values that are bound to those variables, and this binding happens inside the sqlite SQL program.
Using variable binding avoids issues such as SQL injection without the need to sanitize inputs.
A similar mechanism is beneficial for performance when you execute the same SQL program over and over again with different values for the variables: you only need to compile the SQL once. The Android SQLite mechanism for that is SQLiteStatement (see the bind...() methods in its SQLiteProgram superclass).
Security is definitely an issue. If you use string concatenation, you are vulnerable to SQL injection. Using ? and whereArgs binds the input as a value instead of splicing it into the SQL text, so you are safe.
There is also the case of prepared statements: you compile them only once and then bind different values to each argument placeholder. This gives you a performance benefit that you can't get with your approach.
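To make the difference concrete, here is a minimal plain-Java sketch. The class and method names are made up for illustration and are not part of the Android API. It only shows what text ends up in the where clause in each approach. With concatenation a hostile value changes the structure of the SQL, while with a ? placeholder the clause stays constant and the value travels separately.

```java
import java.util.Arrays;

class WhereClauseDemo {
    // Concatenation: the value becomes part of the SQL text itself.
    static String concatenated(String id) {
        return "id='" + id + "'";
    }

    // Parameterized: the clause is a constant, whatever the value is.
    static final String PARAMETERIZED = "id=?";

    // The value is handed over separately, to be bound to the ? by SQLite.
    static String[] argsFor(String id) {
        return new String[] { id };
    }

    public static void main(String[] args) {
        String hostile = "1' OR '1'='1";
        // prints: id='1' OR '1'='1'   (the condition now matches every row)
        System.out.println(concatenated(hostile));
        // prints: id=? [1' OR '1'='1] (the clause is unchanged)
        System.out.println(PARAMETERIZED + " " + Arrays.toString(argsFor(hostile)));
    }
}
```

On Android, the second form corresponds to a call such as db.update(table, values, "id=?", new String[]{ id }).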
I am creating a notes app. The user can input a note, and it will get saved to the database/displayed on the screen.
I have a DatabaseHelper class which includes all the CRUD methods (Create, Read, Update, Delete).
When I update a specific note in a table, I have to do it like this:
db.update("Note", contentValues, "id='"+id+"'", null)
with '' surrounding the user's id.
However, when I am querying to read a note, I don't have to include the '':
String fetchOneNote = "SELECT * FROM Note WHERE id="+id;
Is there a specific reason for this? It seems like in both, I am referring to the database, so why do I need the ''?
Thanks!
There is no need to enclose a numeric literal in single quotes but single quotes are required for a string literal.
So IF id is numeric there is no need to enclose it in single quotes. However, it doesn't hurt to enclose a numeric literal in quotes.
As such ASSUMING that id is numeric then:-
db.update("Note",contentValues,"id=" + id,null)
will effectively work the same as :-
db.update("Note", contentValues, "id='"+id+"'", null)
However, the recommended use of the SQLiteDatabase update method is to pass the where clause values through the 4th parameter, which protects against SQL injection. As such it would be better to use :-
db.update("Note",contentValues,"id=?",new String[]{String.valueOf(id)});
SQLite then binds the value to the ? as a parameter rather than parsing it as part of the SQL text, which is what protects against SQL injection.
See https://sqlite.org/lang_expr.html#literal_values_constants_ and also https://sqlite.org/lang_expr.html#parameters
Although the explanation regarding binding parameters includes:-
But because it is easy to miscount the question marks, the use of this parameter format is discouraged. Programmers are encouraged to use one of the symbolic formats below or the ?NNN format above instead.
The ? is commonly used. This is what the update method (and other methods) expect.
It does mean that on occasion you may have to pass the same value twice in the arguments array, once for each ? that refers to it (just as you would have to repeat it in the SQL if you were not using bound parameters).
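Because miscounting the ?s is the risk the SQLite documentation warns about, a small sanity check can help. The following is only a sketch in plain Java. PlaceholderCheck and its methods are hypothetical helpers, not Android or SQLite API, and the counting assumes the clause contains no ? inside a quoted string literal.

```java
class PlaceholderCheck {
    // Counts positional ? placeholders.
    // Assumption: no '?' characters appear inside string literals in the clause.
    static int countPlaceholders(String clause) {
        int count = 0;
        for (int i = 0; i < clause.length(); i++) {
            if (clause.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    // Returns the args unchanged if their number matches the ?s, else throws.
    static String[] checkedArgs(String clause, String... args) {
        int expected = countPlaceholders(clause);
        if (expected != args.length) {
            throw new IllegalArgumentException(
                    "clause has " + expected + " placeholders but " + args.length + " args");
        }
        return args;
    }

    public static void main(String[] args) {
        String pattern = "%milk%";
        // The same value is needed twice, so it is passed twice.
        String[] bound = checkedArgs("title LIKE ? OR body LIKE ?", pattern, pattern);
        System.out.println(bound.length); // prints: 2
    }
}
```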
My app's data is managed by a Content Provider using CursorLoaders and is stored in an SQLite database.
According to a Veracode Static Scan report, it is prone to SQL injection.
But according to docs,
To avoid this problem, use a selection clause that uses ? as a replaceable parameter and a separate array of selection arguments. When you do this, the user input is bound directly to the query rather than being interpreted as part of an SQL statement. Because it's not treated as SQL, the user input can't inject malicious SQL.
public Loader<Cursor> onCreateLoader(int id, Bundle b) {
    return new CursorLoader(getActivity(),
            NewsFeedTable.CONTENT_URI,
            NewsFeedTable.PROJECTION,
            "_id = ?",
            new String[]{tid},
            null);
}
As shown in the code above, I am doing it in a similar way.
I also read the same in The Mobile Application Hacker's Book
If this is not a sufficient measure to prevent SQL injection, how do I sanitize the SQL query against special characters?
Everything I read suggests using parameterized PreparedStatements.
Is that not the default with Content Providers?
An alternative to SQLiteStatement is to use the query, insert, update, and delete methods on SQLiteDatabase as they offer parameterized statements via their use of string arrays.
I found this as a solution :
But then I read docs from here that
StringEscapeUtils.escapeSql
This was a misleading method, only handling the simplest of possible SQL cases. As SQL is not Lang's focus, it didn't make sense to maintain this method.
Adding the code snippet. Report points at Line 307 where SQL Injection flaw is detected:
How should I do input validation for the special characters?
Please help, to make me understand it better.
Values in selectionArgs parameters do not need to be escaped, and they must not be escaped because the escape characters would end up in the database.
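To see why escaping is harmful here, consider this plain-Java sketch. EscapeDemo and escapeForLiteral are hypothetical names used only for illustration. Doubling single quotes is the usual escaping for a value pasted into SQL text. A bound argument is never parsed as SQL, so the doubled quote would be stored exactly as passed.

```java
class EscapeDemo {
    // Naive escaping for values that are concatenated into SQL text:
    // every single quote is doubled.
    static String escapeForLiteral(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        String name = "O'Brien";
        // Passed via selectionArgs, the database receives the value as-is: O'Brien
        System.out.println(name);
        // Escaped first and then bound, the database would receive: O''Brien
        System.out.println(escapeForLiteral(name));
    }
}
```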
There are three different cases of SQL code seen by Veracode:
values that cannot be user input (such as string literals in the source code);
values that are user input (because they come directly from, e.g., some edit box);
values that might be user input, because the tool cannot determine the source.
For marketing reasons, paid-for tools tend to inflate the problem numbers as much as possible. So Veracode reports all instances of the third case as problems.
In this case, Veracode does not know where selection comes from, so it complains. If that value is constructed by your program and never contains any user input (i.e., all user-input values are moved to ? parameters), then this is a false positive, and you must tell Veracode to shut up.
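One way to keep the selection string free of user input is to build it only from constants and to route every user-supplied value into the arguments array. The following is a rough plain-Java sketch of that idea. SelectionBuilder is a hypothetical helper, not part of the Android API, and it assumes the column names passed in are constants from your own code.

```java
import java.util.ArrayList;
import java.util.List;

class SelectionBuilder {
    private final StringBuilder clause = new StringBuilder();
    private final List<String> args = new ArrayList<>();

    // column must be a constant from the program, never user input;
    // userValue may be anything, because it is only ever bound to a ?.
    SelectionBuilder andEquals(String column, String userValue) {
        if (clause.length() > 0) {
            clause.append(" AND ");
        }
        clause.append(column).append(" = ?");
        args.add(userValue);
        return this;
    }

    // The text to pass as the selection parameter.
    String selection() {
        return clause.toString();
    }

    // The values to pass as the selectionArgs parameter.
    String[] selectionArgs() {
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        SelectionBuilder b = new SelectionBuilder()
                .andEquals("_id", "42")
                .andEquals("author", "x' OR 1=1 --");
        System.out.println(b.selection()); // prints: _id = ? AND author = ?
    }
}
```

The two results would then be passed as the selection and selectionArgs parameters of query(), or of a CursorLoader as in the question.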
Checking the rawQuery(sql, selectionArgs) docs, I don't see any explicit statement about selectionArgs escaping.
So should I sanitize the selectionArgs beforehand, or is it safe to assume that rawQuery will use any selectionArgs values in a safe way, avoiding SQL injection?
I'm interested in Android API 14+ implementations; I don't care about any security bugs in old 1.x and 2.x ROMs.
(I'm not talking about the first String sql parameter, that one looks obviously vulnerable, but I would expect the args to be sanitized, yet there's no documentation about it on developer.android.com)
Selection args use sqlite's variable binding and they are never part of the SQL itself. No sanitization is needed.
The Android documentation is actually misleading here:
You may include ?s in where clause in the query, which will be replaced by the values from selectionArgs. The values will be bound as Strings.
The first sentence would indicate some sort of string replacement happening while there is no string replacement going on actually. The latter sentence refers to (variable) binding which is correct, and for which there's e.g. SQLiteProgram and its bindAllArgsAsStrings() between SQLiteDatabase and the sqlite C API.
I would like to use SQLiteStatement in my ContentProvider instead of the rawQuery or one of the other standard methods. I think using SQLiteStatement would give a more natural, native, efficient and less error prone approach to doing queries.
The problem is that I don't see a way to generate and return a Cursor. I realize I can use "call" and return a Bundle, but that approach requires that I cache and return all selected rows at the same time - this could be huge.
I will start looking at Android source code - I presume that "query" ultimately uses SQLiteStatement and somehow generates a Cursor. However, if anyone has any pointers or knowledge of this, I would greatly appreciate your sharing.
I would like to use SQLiteStatement in my ContentProvider instead of the rawQuery or one of the other standard methods. I think using SQLiteStatement would give a more natural, native, efficient and less error prone approach to doing queries.
Quoting the documentation for SQLiteStatement:
The statement cannot return multiple rows or columns, but single value (1 x 1) result sets are supported.
I fail to see why you would bother with a ContentProvider for single row, single column results, but, hey, it's your app...
The problem is that I don't see a way to generate and return a Cursor
Create a MatrixCursor and fill in the single result.
We have about 7-8 tables in our Android application each having about 8 columns on an average. Both read and write operations are performed on the database and I am experimenting and trying to find ways to enhance the performance of the DataAccess layer. So, far I have tried the following:
Use positional arguments in where clauses (Reason: so that sqlite makes use of the same execution plan)
Enclose inserts and update with transactions(Reason: every db operation is enclosed within a transaction by default. Doing this will remove that overhead)
Indexing: I have not created any explicit index other than those created by default on the primary key and unique keys columns.(Reason: indexing will improve seek time)
I have mentioned my assumptions in parentheses; please correct me if I am wrong.
Questions:
Can I add anything else to this list? I read somewhere that avoiding the use of db-journal can improve the performance of updates. Is this a myth or fact? How can this be done, if recommended?
Are nested transactions allowed in SQLite3? How do they affect performance?
The thing is I have a function which runs an update in a loop, so I have enclosed the loop within a transaction block. Sometimes this function is called from another loop inside some other function. The calling function also encloses its loop within a transaction block. How does such nesting of transactions affect performance?
The where clauses in my queries use more than one column to build the predicate. These columns might not necessarily be primary key or unique columns. Should I create indices on these columns too? Is it a good idea to create multiple indices for such a table?
Pin down exactly which queries you need to optimize. Grab a copy of a typical database and use the REPL to time queries. Use this to benchmark any gains as you optimize.
Use ANALYZE to allow SQLite's query planner to work more efficiently.
For SELECTs and UPDATEs, indexes can speed things up, but only if the indexes you create can actually be used by the queries that you need speeding up. Use EXPLAIN QUERY PLAN on your queries to see which index would be used or if the query requires a full table scan. For large tables, a full table scan is bad and you probably want an index. Only one index will be used on any given query. If you have multiple predicates, then the index that will be used is the one that is expected to reduce the result set the most (based on ANALYZE). You can have indexes that contain multiple columns (to assist queries with multiple predicates). If you have indexes with multiple columns, they are usable only if the predicates fit the index from left to right with no gaps (but unused columns at the end are fine). If you use an ordering predicate (<, <=, > etc.) then that needs to be on the last used column of the index. WHERE predicates and ORDER BY each need an index, and SQLite can only use one, so that can be a point where performance suffers. The more indexes you have, the slower your INSERTs will be, so you will have to work out the best trade-off for your situation.
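The "left to right with no gaps" rule can be illustrated with a small plain-Java model. This is only a simplified sketch of the rule as described above, not how SQLite's planner is implemented, and the class and method names are hypothetical. It reports how many leading columns of a multi-column index a set of predicates could use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexPrefix {
    // Walks the index columns from left to right.
    // An equality predicate lets the walk continue; a range predicate
    // (<, <=, > ...) can be used but must be the last used column;
    // a column with no predicate is a gap and stops the walk.
    static int usableColumns(List<String> indexColumns,
                             Set<String> equalityColumns,
                             Set<String> rangeColumns) {
        int used = 0;
        for (String column : indexColumns) {
            if (equalityColumns.contains(column)) {
                used++;
            } else if (rangeColumns.contains(column)) {
                used++;
                break;
            } else {
                break;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("a", "b", "c");
        Set<String> none = Collections.emptySet();
        // WHERE a = ? AND c = ?  -> gap at b, only a is usable
        System.out.println(usableColumns(index, new HashSet<>(Arrays.asList("a", "c")), none)); // prints: 1
        // WHERE a = ? AND b > ?  -> a and b are usable
        System.out.println(usableColumns(index, new HashSet<>(Arrays.asList("a")),
                new HashSet<>(Arrays.asList("b")))); // prints: 2
    }
}
```

Treat this only as a way to reason about candidate indexes. EXPLAIN QUERY PLAN remains the authority on what SQLite actually does.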
If you have more complex queries that can't make use of any indexes that you might create, you can de-normalize your schema, structuring your data in such a way that the queries are simpler and can be answered using indexes.
If you are doing a large number of INSERTs, try dropping indexes and recreating them at the end. You will need to benchmark this.
SQLite does support nested transactions using savepoints, but I'm not sure that you'll gain anything there performance-wise.
You can gain lots of speed by compromising on data integrity. If you can recover from database corruption yourself, then this might work for you. You could perhaps only do this when you're doing intensive operations that you can recover from manually.
I'm not sure how much of this you can get to from an Android application. There is a more detailed guide for optimizing SQLite in general in the SQLite documentation.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app. I'm starting with an SQLiteOpenHelper dbHelper and an SQLiteQueryBuilder qb.
// Build the SQL text of the query that is about to run.
String sql = qb.buildQuery(projection, selection, selectionArgs, groupBy, having, sortOrder, limit);
android.util.Log.d("EXPLAIN", sql + "; " + java.util.Arrays.toString(selectionArgs));
// Ask SQLite for its plan; the same selectionArgs are bound to the same ?s.
Cursor c = dbHelper.getReadableDatabase().rawQuery("EXPLAIN QUERY PLAN " + sql, selectionArgs);
try {
    // Log one line per plan row as "column:value" pairs.
    while (c.moveToNext()) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < c.getColumnCount(); i++) {
            sb.append(c.getColumnName(i)).append(":").append(c.getString(i)).append(", ");
        }
        android.util.Log.d("EXPLAIN", sb.toString());
    }
} finally {
    c.close(); // release the cursor even if logging throws
}
I dropped this into my ContentProvider.query() and now I can see exactly how all the queries are getting performed. (In my case it looks like the problem is too many queries rather than poor use of indexing; but maybe this will help someone else...)
I would add these :
Using rawQuery() instead of building statements with ContentValues can speed things up in certain cases. Of course, it is a little tedious to write raw queries.
If you have a lot of string / text type data, consider creating virtual tables using full text search (FTS3), which can run queries faster. You can search online for the exact speed improvements.
A minor point to add to Robie's otherwise comprehensive answer: the VFS in SQLite (which is mostly concerned with locking) can be swapped out for alternatives. You may find one of the alternatives like unix-excl or unix-none to be faster but heed the warnings on the SQLite VFS page!
Normalization (of table structures) is also worth considering (if you haven't already) simply because it tends to provide the smallest representation of the data in the database; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases (the sort I'm most familiar with), but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.