Help with SQLite reliability (idle time & timeout errors)

(TLDR at the end)

I spent the past two days trying to optimize one of the SQLite DBs and operations on them for what would be a rather extreme scenario for us, though not impossible if we assume bugs (my thinking is if it works under these conditions, prod will be a breeze). Unfortunately my efforts resulted in little to no success…

I searched the forum and found a few good posts and articles (e.g., Ecto & sqlite3 database lock issues - #2 by victorbjorklund, and Optimizing SQLite for servers from Ecto & sqlite3 database lock issues - #7 by BartOtten). I’ve skimmed the SQLite docs (especially the PRAGMAs) plus the ecto_sqlite3 and exqlite docs too. I’ve tried a few different things (configs, DB access patterns, indexes, …), but I can’t get much more out of the DB.

So I’ve resigned from that battle for now.

However, there are still two things bothering me. One is that the idle time is sometimes unreasonably high (e.g. 1608.5ms vs 12.9ms db time). The other is that when a given statement takes more time than the :timeout, an error is logged (suggesting the execution was canceled), but the statement actually runs to completion and even returns, because no error is raised. Here are logs showing this (I used timeout: 100 to force the error):

# First DB call
[info] MachineManager synchronizing... Current version=0, target=999992
[error] Exqlite.Connection (#PID<0.400.0> ("db_conn_2")) disconnected: ** (DBConnection.ConnectionError) client #PID<0.415.0> (Foo.Office.MachineAggregator) timed out because it queued and checked out the connection for longer than 100ms

#PID<0.415.0> (Foo.Office.MachineAggregator) was at location:

    (exqlite 0.35.0) lib/exqlite/sqlite3.ex:274: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.35.0) lib/exqlite/sqlite3.ex:312: Exqlite.Sqlite3.try_fetch_all/3
    (exqlite 0.35.0) lib/exqlite/sqlite3.ex:306: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.35.0) lib/exqlite/connection.ex:695: Exqlite.Connection.get_rows/2
    (exqlite 0.35.0) lib/exqlite/connection.ex:641: Exqlite.Connection.execute/4
    (db_connection 2.9.0) lib/db_connection/holder.ex:356: DBConnection.Holder.holder_apply/4
    (db_connection 2.9.0) lib/db_connection.ex:1539: DBConnection.run_execute/5
    (db_connection 2.9.0) lib/db_connection.ex:1587: DBConnection.run_with_retries/5
    (db_connection 2.9.0) lib/db_connection.ex:848: DBConnection.execute/4
    (ecto_sqlite3 0.22.0) lib/ecto/adapters/sqlite3/connection.ex:90: Ecto.Adapters.SQLite3.Connection.query/4
    (ecto_sql 3.13.5) lib/ecto/adapters/sql.ex:1073: Ecto.Adapters.SQL.execute!/5
    (ecto_sql 3.13.5) lib/ecto/adapters/sql.ex:1011: Ecto.Adapters.SQL.execute/6
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:241: Ecto.Repo.Queryable.execute/4
    (ecto 3.13.5) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    (foo 0.1.0) lib/foo/office/machine_aggregator.ex:172: Foo.Office.MachineAggregator.synchronize/2
    (foo 0.1.0) lib/foo/office/machine_aggregator.ex:149: Foo.Office.MachineAggregator.handle_cast/2
    (stdlib 7.3) gen_server.erl:2460: :gen_server.try_handle_cast/3
    (stdlib 7.3) gen_server.erl:2418: :gen_server.handle_msg/3
    (stdlib 7.3) proc_lib.erl:333: :proc_lib.init_p_do_apply/3

# Result of the first DB call
[debug] QUERY OK db=1346.3ms idle=37.3ms
SELECT s0."id", s0."type", s0."arguments", s0."emitted_at", s0."inserted_at", s0."updated_at", s0."device_id", s0."row_number" FROM (SELECT ss0."id" AS "id", ss0."type" AS "type", ss0."arguments" AS "arguments", ss0."emitted_at" AS "emitted_at", ss0."inserted_at" AS "inserted_at", ss0."updated_at" AS "updated_at", ss0."device_id" AS "device_id", row_number() OVER "device_id_window" AS "row_number" FROM (SELECT sse0."id" AS "id", sse0."type" AS "type", sse0."arguments" AS "arguments", sse0."emitted_at" AS "emitted_at", sse0."inserted_at" AS "inserted_at", sse0."updated_at" AS "updated_at", coalesce(json_extract(sse0."arguments", '$.context.device_id'), json_extract(sse0."arguments", '$.arguments.device_id')) AS "device_id" FROM "events" AS sse0 WHERE (sse0."type" IN (?,?,?,?,?)) AND (sse0."id" > ?) ORDER BY sse0."emitted_at") AS ss0 WHERE (NOT (ss0."device_id" IS NULL)) WINDOW "device_id_window" AS (PARTITION BY ss0."device_id" ORDER BY ss0."emitted_at")) AS s0 WHERE (s0."row_number" <= ?) ["v0:machine:pinged", "v0:machine:reported", "v0:machine:added_pending_action", "v0:machine:canceled_pending_action", "v0:machine:sent_package", 0, 50]
↳ Foo.Office.MachineAggregator.synchronize/2, at: lib/foo/office/machine_aggregator.ex:172

# Second DB call (we can see the first really did succeed because "Current version" is now >0)
# Even stranger, this second call doesn't log any timeout error
[info] MachineManager synchronizing... Current version=999866, target=999992
# Result of the second DB call
[debug] QUERY OK db=12.9ms idle=1608.5ms
SELECT s0."id", s0."type", s0."arguments", s0."emitted_at", s0."inserted_at", s0."updated_at", s0."device_id", s0."row_number" FROM (SELECT ss0."id" AS "id", ss0."type" AS "type", ss0."arguments" AS "arguments", ss0."emitted_at" AS "emitted_at", ss0."inserted_at" AS "inserted_at", ss0."updated_at" AS "updated_at", ss0."device_id" AS "device_id", row_number() OVER "device_id_window" AS "row_number" FROM (SELECT sse0."id" AS "id", sse0."type" AS "type", sse0."arguments" AS "arguments", sse0."emitted_at" AS "emitted_at", sse0."inserted_at" AS "inserted_at", sse0."updated_at" AS "updated_at", coalesce(json_extract(sse0."arguments", '$.context.device_id'), json_extract(sse0."arguments", '$.arguments.device_id')) AS "device_id" FROM "events" AS sse0 WHERE (sse0."type" IN (?,?,?,?,?)) AND (sse0."id" > ?) ORDER BY sse0."emitted_at") AS ss0 WHERE (NOT (ss0."device_id" IS NULL)) WINDOW "device_id_window" AS (PARTITION BY ss0."device_id" ORDER BY ss0."emitted_at")) AS s0 WHERE (s0."row_number" <= ?) ["v0:machine:pinged", "v0:machine:reported", "v0:machine:added_pending_action", "v0:machine:canceled_pending_action", "v0:machine:sent_package", 999866, 50]
↳ Foo.Office.MachineAggregator.synchronize/2, at: lib/foo/office/machine_aggregator.ex:172

I wanted to know what exactly the db/idle times mean. I found where the log is produced, and the meaning of each value (db is the :connection_time):

  • :connection_time - The length of time using the connection (if a connection was used)
  • :idle_time - The amount of time the connection was idle before use

If I understand it correctly, then: db is the time using the connection; and idle is the time between a process checking out a connection and starting to use it?
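To see the raw numbers behind that log line, one option is a telemetry handler on the repo's query event. A minimal sketch, assuming the repo is Foo.Repo (so the default event name is [:foo, :repo, :query]) and that the :telemetry library is available, as it is in any Ecto app:

```elixir
# Measurements arrive in native time units; convert to ms for readability.
:telemetry.attach(
  "log-query-times",
  [:foo, :repo, :query],
  fn _event, measurements, _metadata, _config ->
    to_ms = fn
      nil -> nil
      native -> System.convert_time_unit(native, :native, :millisecond)
    end

    # :query_time is the "db" number from the log line; :idle_time is how
    # long the connection sat unused in the pool before this checkout.
    IO.inspect(%{
      idle_ms: to_ms.(measurements[:idle_time]),
      queue_ms: to_ms.(measurements[:queue_time]),
      query_ms: to_ms.(measurements[:query_time])
    })
  end,
  nil
)
```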

I also wanted to know what the timeout error meant. I found the log, but I’m having a hard time figuring out where that timer is started so I can continue investigating – there are two :erlang.start_timer/4 calls, but neither uses a 4-tuple message?


TLDR: The questions I’m left with:

  1. Which time(s) count(s) for the timeout error? All of them? Only db? This could explain why the second DB call doesn’t time out; and actually further experiments point in this direction: with timeout: 10 both calls show the error log.
  2. What influences the idle time? Is it even worrying? (maybe not if only db time counts for the timeout?)
  3. Why is the timeout error logged if no error is raised? Is there any scenario where the error is raised (and thus the calling process dies if it doesn’t catch)?
  4. If these timeout errors are worrying, how can I prevent them from happening (other than setting a higher :timeout)?
  5. Do people using other DBs (e.g. PostgreSQL) have similar experiences? Or is this mostly only applicable/common on SQLite?

Thanks a lot in advance!


The current exqlite implementation cannot stop queries. See Replace DirtyNIF execution model with a different mechanism · Issue #192 · elixir-sqlite/exqlite · GitHub

This is for judging how busy your pool of connections is. Small idle values are the problematic ones, not large ones.


I ran into similar issues recently. A lot depends on your SQLite configuration - could you share yours?

You should keep in mind that only one connection can write to the database at a time; multiple concurrent reads are possible, but only under certain settings.

For example, if you rely on a :memory: db, you may need to set read_uncommitted: 1 so that reads are not blocked by writes (though be aware that reads may then see uncommitted changes); otherwise every single write will block all concurrent read operations.

    database: "file:db?mode=memory&cache=shared",
    pool_size: 50,
    # read_uncommitted allows reads to proceed without waiting for
    # write locks, improving read concurrency at the cost of
    # potentially reading uncommitted data
    custom_pragmas: [read_uncommitted: 1]

If you’re using a file-based database, you should set the journal_mode to WAL (which is usually the default).

    database: "priv/data/sqlite.db",
    journal_mode: :wal,
    pool_size: 50,

Another important aspect is caching. By caching the output of a SQLite query for just 1 second, I was able to increase throughput from about 1,000 reads/sec to around 2,000,000 reads/sec :sweat_smile:
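A minimal sketch of that kind of short-TTL read-through cache, using ETS (the module and table names are made up for illustration; in a real app the expensive function would be something like fn -> Repo.all(query) end):

```elixir
defmodule QueryCache do
  @moduledoc "Minimal 1-second read-through cache sketch (ETS-based)."
  @ttl_ms 1_000

  def start do
    :ets.new(:query_cache, [:named_table, :set, :public, read_concurrency: true])
  end

  # Returns the cached value when it is younger than @ttl_ms,
  # otherwise runs `fun` and caches its result.
  def fetch(key, fun) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(:query_cache, key) do
      [{^key, value, inserted_at}] when now - inserted_at < @ttl_ms ->
        value

      _ ->
        value = fun.()
        :ets.insert(:query_cache, {key, value, now})
        value
    end
  end
end
```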


I’d seen that but hadn’t quite connected it to these questions! If queries can’t be stopped anyway, what would I lose by setting a default of timeout: :infinity? In practice, from the DB’s point of view, that’s already the reality.
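For reference, the timeout can be raised either per call or repo-wide; a sketch (the repo name and query variable are placeholders):

```elixir
# Per-call override for a single expensive query:
Foo.Repo.all(query, timeout: :infinity)

# Or as a repo-wide default, in config/config.exs:
config :foo, Foo.Repo,
  timeout: :infinity
```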

Oh, can you confirm then that is incorrect?

In that case, in the logs I showed, the idle time is so high because the connection timed out, and it was only used again by the following query (which was of course executed after the first one finished). Is this correct?

(Thanks a lot btw, your replies are usually as enlightening as they are short! : )

Thanks for the tips, I’d missed the shared cache thing, and the read_uncommitted PRAGMA.

In dev I’m using:

    database: Path.expand(filename, databases_directory),
    pool_size: 5,
    stacktrace: true,
    journal_mode: :wal,
    temp_store: :memory,
    synchronous: :normal,
    foreign_keys: :on,
    cache_size: -64000,
    auto_vacuum: :incremental,
    chunk_size: 100,
    custom_pragmas: [
      # wal_checkpoint: "TRUNCATE",
      # optimize: 0x10002
    ],
    show_sensitive_data_on_connection_error: true

(The :custom_pragmas are commented out because they result in errors and I haven’t taken the time to investigate further)

Yeah, that I’m already doing – it’s actually the operation in the logs, which fetches the last few rows per partition. :slight_smile: It would be good to get more out of this, but right now it’s acceptable because it’s run only once on startup (or process crash).

More problematic is the most costly operation, which is cleanup (scheduled every few hours). One of the steps uses ROW_NUMBER(), I think that’s the biggest reason for the few seconds it takes…

Thanks for the tips, I’d missed the shared cache thing, and the read_uncommitted PRAGMA.

I see you are using a file-based db, so you don’t need them.

More problematic is the most costly operation, which is cleanup

To address the issue with the long cleanup query, you could try optimizing it or splitting it into several smaller queries, and slightly increasing the timeout and pool values.

    pool_size: 50,
    busy_timeout: 5_000,

I’d argue that for the exqlite driver this is indeed a reasonable choice.


Is there any more information about this ‘idle time’? I’ve also been struggling to interpret this number. I only have a small-scale deployment so I haven’t encountered any issues with SQLite, but it would be nice to understand these metrics a little better. What are (problematically) small values?

Is there any downside to drastically increasing the pool_size? Currently I use 5 (which I think is the default) – any downsides to going to 50? Soon I expect some small bursts of activity for my app, so it would be nice to fine-tune in advance so I don’t need to worry about it!

It initially had a connection checked out to run the three steps; now it doesn’t. I have also tried splitting the select from the delete (SELECT id ... and DELETE WHERE id IN ... as two separate DB operations), but it makes things worse ^^’ (selecting alone takes about as much time as the whole select+delete). Since two of the steps are based on time (“delete older than …”), maybe deleting, say, everything older than 30d and then everything older than 20d would help; I have to try that.
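That batched idea could look roughly like this (a sketch only: the events table name, field name, repo name, and cutoffs are assumptions, and type/2 is used because the query is schemaless):

```elixir
import Ecto.Query

# Delete in time-sliced batches instead of one big DELETE, so each
# statement holds the write lock for a shorter period.
for days <- [30, 25, 20] do
  cutoff = DateTime.add(DateTime.utc_now(), -days * 86_400, :second)

  from(e in "events", where: e.emitted_at < type(^cutoff, :utc_datetime))
  |> Foo.Repo.delete_all(timeout: 60_000)
end
```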

The pool is used both for reads and writes, right? Is there no way to have a pool of 2 procs for writes, and N for reads?
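(One pattern I've seen suggested for exactly this is two Ecto repos pointing at the same database file: a write repo with pool_size: 1 and a read-only repo with a bigger pool. A sketch, where the module names and path are illustrative:)

```elixir
defmodule Foo.Repo do
  # All writes go through this repo; a pool of 1 serializes them.
  use Ecto.Repo, otp_app: :foo, adapter: Ecto.Adapters.SQLite3
end

defmodule Foo.ReadRepo do
  # Reads only; Ecto's :read_only flag rejects accidental writes.
  use Ecto.Repo, otp_app: :foo, adapter: Ecto.Adapters.SQLite3, read_only: true
end

# config/config.exs
config :foo, Foo.Repo,
  database: "priv/data/sqlite.db",
  pool_size: 1

config :foo, Foo.ReadRepo,
  database: "priv/data/sqlite.db",
  pool_size: 20
```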

The busy timeout is another thing I don’t yet understand. Doesn’t it just make it wait for longer? What’s the advantage?

The DBConnection pool focuses on traditional databases where additional connections have real cost, so you want to find the sweet spot of just enough: connections that are well utilized but not overloaded. Hence :idle is an interesting metric to track. For SQLite, you can read the issue I posted above for all the reasons why this is not necessarily the best approach. While it was probably reasonable to seek Ecto integration via DBConnection, it would likely be better to have a distinct integration layer between Ecto and an SQLite driver – that’s just also the more-work option that nobody seems to have taken on, or maintained to a state where it would be an alternative.

Busy timeout is just a retry at the SQLite level, and a really basic one. There’s no queue or anything like that – just a plain wait (afaik the intervals are variable) and retry, for up to the busy-timeout limit.
