SQLite - Blunders
What is SQLite:
SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day. More Info: SQLite
My Learning:
SQLite is a fantastic embedded database - lightweight, easy to use, and almost zero configuration. I would prefer to call it the ZMQ of databases 😉
Blunder:
As part of our design, we had to persist a bunch of parametric information per user session (a few MBs). This information is not required outside the user session, but it is reused when the user comes back. The session runs locally on a Windows box. We decided to use SQLite as our backend persistence mechanism, as there was no installation or specific configuration required to set SQLite up.
Since parallel reads and writes were happening, we had to use the WAL (Write-Ahead Logging) journal mode.
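For reference, enabling WAL is a one-line pragma, and the setting is persisted in the database file itself. A minimal Python sketch (the `session.db` path and `actions` table here are made up for illustration, not our actual schema):

```python
import sqlite3

# Hypothetical session database; path and schema are illustrative only.
conn = sqlite3.connect("session.db")

# WAL lets readers and writers proceed concurrently. The journal mode
# is stored in the database file, so it persists across connections.
conn.execute("PRAGMA journal_mode=WAL;")

conn.execute("""CREATE TABLE IF NOT EXISTS actions (
    id      INTEGER PRIMARY KEY,
    payload TEXT
)""")
conn.commit()
conn.close()
```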
We made a couple of blunders:
Blunder #1 - Insert ONE at a time
- When an action occurred - we opened the db connection, inserted the record and closed it.
- We didn’t want to keep the connection open throughout the session - as there were chances of a software crash, which could potentially corrupt the database.
- But as time went by, the number of actions became HUGE (order of millions - automated). At this point, opening and closing the connection for every action started to hurt us (high latency and low throughput), because the time taken to open and close a connection (~300ms) was adding up.
Solution: Microbatching. We batched about 500 actions together - opened the connection once, inserted all 500 actions in one shot inside a single transaction, and closed it. Worst case, we would lose fewer than 500 actions, and that tradeoff was acceptable.
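A minimal sketch of the microbatching idea, reusing the hypothetical `actions` table from above (the batch size and helper names are illustrative):

```python
import sqlite3

BATCH_SIZE = 500  # worst case, one batch of actions is lost on a crash
pending = []      # actions accumulated since the last flush

def flush_batch(db_path, batch):
    """Open once, insert the whole batch in a single transaction, close."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.executemany("INSERT INTO actions (payload) VALUES (?)", batch)
    finally:
        conn.close()

def record_action(payload):
    pending.append((payload,))
    if len(pending) >= BATCH_SIZE:
        flush_batch("session.db", pending)
        pending.clear()
```

Besides amortizing the connection open/close cost, one transaction per batch also means one commit (and one fsync) per 500 rows instead of one per row, which is typically where most of the per-insert time goes.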
Blunder #2 - Mounted Drive:
- The db had to be persisted on a mounted Linux drive (there are reasons for this, I won’t go into details).
- As the session started - the app created an SQLite DB directly on the mounted drive and started inserting.
- Since we were using WAL mode - there were temporary files getting created and deleted on the mounted drive during every transaction.
- As time went by - the reliability of the drive/driver went down, and retries didn’t help keep up. Throughput was significantly impacted and multiple errors started showing up.
Solution: During the session, create the db on a local (temp) drive, and at the end of the session copy it to the mounted drive. Similarly, when the session restarts - copy from the mounted drive to a new temp location and consume it; and copy it back on completion.
Never EVER read or write an SQLite database from a mounted drive. It usually works, but when it doesn’t - it’s a nightmare.
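A sketch of the copy-local pattern (paths are hypothetical). One detail worth noting: close every connection before copying, since SQLite checkpoints and removes the -wal/-shm files when the last connection closes cleanly, leaving a single .db file to copy.

```python
import os
import shutil
import tempfile

MOUNTED_DB = "/mnt/share/session.db"  # hypothetical mounted-drive path

def start_session():
    """Copy the DB (if any) from the mounted drive to a local temp dir."""
    local_db = os.path.join(tempfile.mkdtemp(), "session.db")
    if os.path.exists(MOUNTED_DB):
        shutil.copy2(MOUNTED_DB, local_db)
    return local_db  # every read/write during the session uses this path

def end_session(local_db):
    """Copy the finished DB back to the mounted drive in one shot.
    All connections must be closed first, so the WAL has already been
    checkpointed back into the main .db file."""
    shutil.copy2(local_db, MOUNTED_DB)
```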
Blunder #3 - Lazy Load:
- When specific information was required - we opened the db connection, executed the “select” query and returned the results.
- Again: we didn’t want to keep the connection open throughout the session - as there were chances of a software crash, which could potentially corrupt the database.
- The read was implemented in a “Lazy - On demand” fashion - so that only the minimum required information was read on demand.
- Reading only the required info is an ideal design - why would we ever read more than what is required? GraphQL is a good example of how to design a minimalistic API.
- But the challenge in our (specific) case was the number of reads - about 600 reads in less than 50 milliseconds, and each one had to open and close the connection (which takes ~50ms).
- Since it was a high throughput system, the overhead of lazy reads was slowing down the entire software.
Solution: Prefetch Required Information
- Since we knew the sequence in which the actions were performed, we redesigned the query to read “more” records than required.
- Then we created an in-memory cache [index/map] to hold the data and filter based on the client query.
- Read from DB would occur only if there is a cache miss.
- The cache was a rolling buffer with LRU (Least Recently Used) eviction and a watermark (memory or count), so that the RAM didn’t explode.
- When a new read from the DB was required and the cache was full - we deleted the oldest item, read from the DB and added the result to the cache (see the sketch after this list).
- No change to the Client API. Only the Data Access Layer changed. 😉
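A simplified sketch of such a prefetching LRU cache, assuming sequential integer keys and the illustrative `actions` table from earlier; connections stay short-lived, consistent with our crash concerns:

```python
import sqlite3
from collections import OrderedDict

class PrefetchCache:
    """Rolling LRU cache in front of SQLite: on a miss, prefetch a
    window of neighbouring rows instead of a single record."""

    def __init__(self, db_path, max_items=10_000, window=512):
        self.db_path = db_path
        self.cache = OrderedDict()   # key -> payload, ordered by recency
        self.max_items = max_items   # watermark so RAM doesn't explode
        self.window = window         # rows fetched per DB round trip

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        # Cache miss: read "more" than required, in one short-lived connection.
        conn = sqlite3.connect(self.db_path)
        try:
            rows = conn.execute(
                "SELECT id, payload FROM actions WHERE id >= ?"
                " ORDER BY id LIMIT ?", (key, self.window)).fetchall()
        finally:
            conn.close()
        for rid, payload in rows:
            self.cache[rid] = payload
            self.cache.move_to_end(rid)
        while len(self.cache) > self.max_items:
            self.cache.popitem(last=False)  # evict least recently used
        return self.cache.get(key)
```

Client code still asks for one record at a time; only the data access layer knows about the prefetch window and the eviction policy.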
Conclusion:
As always in hindsight, the solution looks trivial and obvious. Since we were using SQLite for the first time and initially didn’t realize the data could grow exponentially, we had to learn (adapt) the hard way. Especially when these systems are running in production, you need schemes to manage old vs new versions of the software (read: backward compatibility).