Ecto & sqlite3 database lock issues

Hi,

I was wondering if anyone else has experienced database locks when using Ecto with sqlite3 via ecto_sqlite3. I have a Phoenix app with very light user traffic (at the moment) but a fairly consistent data ingest volume hitting the database. After several days of happy processing I’m suddenly getting Repo crashes due to database locks:

GenServer #PID<0.2515.0> terminating
** (MatchError) no match of right hand side value: {:error, "database is locked"}

Looking into the details, I saw that the db-wal file has grown quite large (16 GB, against a 32 GB db file). Using the sqlite3 shell and running PRAGMA wal_checkpoint(TRUNCATE); syncs the database file, resets the db-wal to zero and clears the lock, letting the Phoenix app start up happily again.

Is there a way to do this using Ecto?

I have a hunch the lock may be caused by a large delete that is taking too long. I’ve also experienced this after a sudo systemctl stop app, where I suspect the SIGKILL signal isn’t helping. Am I missing a more graceful shutdown of the Repo in response to the signal?

At any rate, just hoping there may be others out there with more experience of this than I currently have!

Thanks

Al

Interesting. I’ve never encountered that (my SQLite files probably aren’t big enough). I’m not sure how to address the underlying problem (whether it is in your Elixir code or in some setting of the SQLite db), but as a workaround until you find the issue you can run the checkpoint from Elixir, e.g. a GenServer that runs once every hour, or you can use Quantum for job scheduling (Job scheduling in Elixir Phoenix using Quantum | VictorBjorklund.com).

Ecto.Adapters.SQL.query(Repo, "PRAGMA wal_checkpoint(TRUNCATE);")

That will run the command in sqlite.
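If it helps, here is a rough sketch of the hourly GenServer idea. The module name, MyApp.Repo and the one-hour interval are placeholders, not something from your app:

defmodule MyApp.WalCheckpointer do
  use GenServer

  # Interval is an assumption; tune it to whatever cadence suits your ingest.
  @interval :timer.hours(1)

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    schedule()
    {:ok, %{}}
  end

  @impl true
  def handle_info(:checkpoint, state) do
    # TRUNCATE checkpoints the WAL and resets the -wal file to zero bytes.
    Ecto.Adapters.SQL.query(MyApp.Repo, "PRAGMA wal_checkpoint(TRUNCATE);")
    schedule()
    {:noreply, state}
  end

  defp schedule do
    Process.send_after(self(), :checkpoint, @interval)
  end
end

Add it to your application’s supervision tree next to the Repo and it will keep issuing the checkpoint periodically.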

2 Likes

Not sure if this can be done through Ecto queries, but you can run arbitrary SQL statements through Exqlite directly, or via Ecto.Adapters.SQL — Ecto SQL v3.11.1.

However, if you can spend the time, it is probably best to understand the query pattern causing the lock before messing around with the WAL files. If a delete is causing this, are you wrapping the ingest in a single transaction when it’s a bulk insert? Or are there lots of tiny inserts?
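For example, a bulk insert wrapped in one transaction might look roughly like this. MyApp.HourlySummary and the shape of entities are assumptions, not your actual schema:

defmodule MyApp.Ingest do
  alias MyApp.Repo

  # `entities` is assumed to be a list of structs/maps with :id and :state.
  def persist(entities) do
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    rows =
      Enum.map(entities, fn e ->
        %{entity_id: e.id, state: e.state, inserted_at: now, updated_at: now}
      end)

    # One transaction with chunked insert_all is far cheaper than
    # thousands of individual autocommit inserts.
    Repo.transaction(fn ->
      rows
      |> Enum.chunk_every(500)
      |> Enum.each(&Repo.insert_all(MyApp.HourlySummary, &1))
    end)
  end
end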

Edit: @victorbjorklund beat me to it.

1 Like

I’m tracking state for a large set of entities (>200,000), building an in-memory summary of state history over the course of an hour within a GenServer, then serializing the entity state to the SQLite db every hour to reduce memory usage within the GenServer. Once a day (at midnight) all the hourly state segments get merged into a daily summary, which is stored in a separate db table; then the hourly summaries are deleted and we go again for the following day.

My hunch is that deleting the hourly summary rows (which can run to millions of entries) takes a long time, causing the WAL to grow and leaving the database in a locked state if there’s a crash. I’m not familiar enough with the internals of SQLite to say why a restarted app would see the db as already locked.

I may separate the hourly summary data into its own db file and just delete the file each day to see if that solves the problem for me.
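If the big delete does turn out to be the culprit, another thing I may try is deleting in bounded batches instead, roughly like this. MyApp.HourlySummary, the :day field and the 1,000 batch size are just placeholders for my actual schema:

defmodule MyApp.HourlyCleanup do
  import Ecto.Query
  alias MyApp.Repo

  @batch_size 1_000

  # Deletes the hourly summaries for `day` in small chunks, so no single
  # transaction (and the WAL growth it causes) runs for minutes.
  def delete_in_batches(day) do
    ids =
      from(h in MyApp.HourlySummary,
        where: h.day == ^day,
        select: h.id,
        limit: @batch_size
      )
      |> Repo.all()

    case ids do
      [] ->
        :ok

      ids ->
        from(h in MyApp.HourlySummary, where: h.id in ^ids)
        |> Repo.delete_all()

        delete_in_batches(day)
    end
  end
end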

2 Likes

As a first measure you can also use the recently added wal2 mode, which adds another layer of transaction logging.

I don’t know your business requirements, but accumulating a full hour of changes seems like a use case SQLite’s creators didn’t optimize for. Why not add one more layer below and make 5-minute aggregations, then the 1h and the 24h ones?

I would try increasing the ecto_sqlite3 :busy_timeout option. Its default is 2 seconds, which may not be enough if you are doing a lot of writes all on the hour. I would also consider doing the daily delete at 00:30:00 instead of on the hour, to reduce congestion.
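Something like this in your config, assuming your app and Repo names (the path and the 15-second value are placeholders):

import Config

config :my_app, MyApp.Repo,
  database: "priv/app.db",
  # Milliseconds to wait on a busy database before giving up (default 2_000).
  busy_timeout: 15_000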
I’m no expert on SQLite locking, but I assume the locking state is written to the db files on disk, so that different processes accessing the database know whether a table is locked. If so, and your delete had locked the db, it would remain locked after an application restart.

1 Like

I would always start by tuning the database. SQLite is powerful but by default the settings are optimized for low-spec (embedded) devices.

The biggest problem with SQLite: The SQLITE_BUSY error

SQLITE_BUSY (or database is locked, depending on your programming language) is certainly the error that you will encounter the most often when starting to use SQLite with servers, so it’s important to understand its meaning and why it happens.

https://kerkour.com/sqlite-for-servers

3 Likes

Maybe use DROP TABLE instead of DELETE for the partitioning?

  • partition the data into 24 tables for the hourly summary (hourly_0, hourly_1, …, hourly_23)
  • each hour, drop and recreate that hour’s table. SQLite doesn’t have a TRUNCATE statement, so drop/create is the closest equivalent.

You could also just create an hourly table with a parseable date in the name (hourly_2024_01_01__14) and drop the oldest ones. You may also need to run VACUUM afterwards.
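A rough sketch of the drop/recreate step through Ecto; the table naming and columns are placeholders for whatever your hourly schema actually looks like:

defmodule MyApp.HourlyRotation do
  alias MyApp.Repo

  # Drops and recreates the table for the given hour; this is cheap
  # compared to deleting millions of rows.
  def rotate(hour) when hour in 0..23 do
    table = "hourly_#{hour}"

    Ecto.Adapters.SQL.query!(Repo, "DROP TABLE IF EXISTS #{table}")

    Ecto.Adapters.SQL.query!(Repo, """
    CREATE TABLE #{table} (
      id INTEGER PRIMARY KEY,
      entity_id INTEGER NOT NULL,
      state TEXT,
      inserted_at TEXT
    )
    """)
  end
end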