Our current iPhone application makes quite heavy use of SQLite. We persist the model in the database using a Data Access Object pattern built on top of my own JDBC like SQLite layer and it is all still quite lightweight. CoreData whould have been prefereable but we have to be compatible with iPhone SDK versions prior to 3.0. We also use the database to rank trends using indexed and weighted keywords and some additional temporal factors. Thankfully the text data we search through on the device is pre-indexed so we don't have to worry too much about getting suitable data into the tables. However we do need the database to run queries against this data and we spotted early on that optimization would be beneficial here.
I had some ideas of where we could improve and with a little Googling I found a sparse but useful SQLite optimization FAQ compiled by Jim Lyon. I quickly put together a hit list of optimizations that I'd try.
Use a between construct instead of LIKE
The FAQ explains that a LIKE cannot use an index and thus where possible it should be replaced with the > and < operators. For example:
word LIKE 'dog%'
Can be replaced with the more efficient:
word > 'dog' AND word < 'doh'
It's not a trivial replacement. Firstly the strings you are comparing must be of one case only. I also ran into trouble with international characters which of course don't fit nicely into the byte ranges. Furthermore you must write a little logic to generate the last character(s) of the 'upper limit term' (in this case 'dog'). Incrementing the last character is all well and good - but what about when you want to perform:
LIKE 'oz%'
I simply appended a low value character to the string to obtain the 'upper limit term' and ended up with something like: 'oz!' Thankfully I could work within these limitations for our use-cases and performance was much improved.
Move members of IN clauses into temporary tables
Many of our queries used variable length IN clauses. This made it unfeasible to prepare and cache the resultant statements and they would be prepared fresh each time. This sucked up and incredible amount of time - we might spend a second just preparing the statement. A typical clause is shown here:
AND id IN (32, 45, 67, 68, 80)
I decided that if I moved these values into a temporary table I could use a sub-select within the IN clause and hence end up with a static statement that I could prepare once and cache:
AND id IN (SELECT id FROM temporary_ids)
In addition to this I hoped to use a PRAGMA directive described in the FAQ to move the temporary tables off of the flash disk and into memory:
PRAGMA temp_store MEMORY
However, this setting does not seem to take effect on the iPhone version of SQLite which was somewhat disappointing. That said, the restructuring of my IN clauses did provided yet another significant performance improvement. I wouldn't be surprised if the row inserts into the temp table actually take longer than the execution of a given IN clause. But in this instance I am avoiding the costly preparation of a statement on each call.
Order sub-queries so that smaller results are returned first
The FAQ suggests in section 5.3 that sub-queries or criteria should appear in an order such that the criteria that will exclude the most rows should appear first. I take this to mean that this (poor) example query:
SELECT o.id
FROM owner o, pet p
WHERE o.age > 12 AND p.name = 'nathan' AND p.id = o.pet_id
Should be rewritten as:
SELECT o.id
FROM owner o, pet p
WHERE p.name = 'nathan' AND o.age > 12 AND p.id = o.pet_id
Because the p.name criteria is far more selective than that using o.age. Okay, so it's not a great example query. However, in our queries it was quite clear which criteria would be the most selective.
Other practices
Prior to implementing these optimizations we were using many of the best practices recommended in the FAQ including:
- Batching commands into transactions
- Using indexes where appropriate and justifying the indexes with the EXPLAIN command
- VACUUMing the database file before making release builds - this had a noticeable effect on the database file size but I couldn't say that it improved the performance.