Ecto_sqlite3 - an Ecto3 SQLite3 adapter

kevinlang · March 23, 2021, 1:22am

Hey all,

We have made an Ecto3 Adapter for SQLite3, ecto_sqlite3!

We have successfully on-boarded the full suite of integration tests (320+) that ecto and ecto_sql provide, as well as writing a good amount of our own integration tests and functional tests for the query-generation logic, giving us a great degree of confidence that this adapter is stable and ready for everyday usage. Due to its newness, it hasn’t seen much production use, but we are eager to have people start trying it out and let us know any issues they hit.

This adapter is mainly possible thanks to the exqlite SQLite3 driver, which is an Elixir-written successor to the Erlang SQLite3 NIF esqlite. It leverages Dirty NIF functionality to make the code easier to reason about and maintain.

Lastly, we have begun work on adding a --database sqlite3 option to Phoenix to make this adapter easy to use for new projects. Of course, even without those changes, switching from e.g., Postgres to SQLite3 is usually only a couple of lines of config changes

Feel free to file any issues you encounter in the Github repo.

cc @warmwaffles , who lead most of the development

dimitarvp · March 23, 2021, 12:41pm

Awesome! You guys beat me to it but I’ll be happy to steal your tests for my library.

Thanks for the hard work!

warmwaffles · March 24, 2021, 3:03pm

Heh, @dimitarvp don’t be shy, I actually used your repo for a little while. I was trying to figure out how I was going to implement the adapter.

Something worth noting, the ecto_sqlite3 ecto adapter doesn’t necessarily depend on the exqlite library. It is entirely possible that we could flip implementations in and out. As long as the driver library implements db_connection it should in theory just be a shoe in replacement. This could also mean we make a behaviour library that we just implement for the drivers and ecto_sqlite3 can be completely agnostic to what backend would be preferrable.

dimitarvp · March 24, 2021, 3:05pm

That’s what I was thinking.

Exactly.

Thanks, man. For the kind words as well.

dijoen · March 24, 2021, 9:03pm

Hi,

this is marvelous! I am going to develop an application for an embedded device with limited memory, so sqlite is ideal here.

Many thanks for your work!

FreedomBen · March 25, 2021, 12:54am

This is absolutely wonderful! Thanks so much for doing this. I’ve been wishing for this for quite some time.

the_wildgoose · March 28, 2021, 4:27pm

Wow! This is amazing - Only a few weeks back I really wanted something like this!

Now at risk of coming across as greedy… Can I introduce you to “DuckDB”… It can be compiled against the sqlite C API, so seems like there is some chance of out of the box chance of getting something functioning. That said, I believe it’s SQL dialect is postgresql, so I suspect it would take work to make ecto integration work

My use case (probably) doesn’t require ecto integration though, I just have an urgent need to store some time series data in an embedded appliance where disk space is very minimal. I really just want sqlite with columnar storage for efficiency…

I would be willing to sponsor some work on this if someone wanted to pick this up?

warmwaffles · March 30, 2021, 1:01pm

I’m actually going to take a look at what is necessary to get a separate driver for DuckDB built. It’s a data store that also interests me greatly. Originally what started me down the sqlite path was using an ORC file to store timeseries data, but it was a massive pain to have to keep rewriting the file everytime I wanted to append data.

At minimum, I’ll probably build a really simple interface similar to Exqlite where you can build prepared statements and run those statements against the database.

As for getting an ecto adapter in there, that may prove a little difficult. But I believe we could shim in the ability to specify the driver for the adapter as well via a quick config.

use Ecto.Repo,
  otp_app: :my_timeseries_app,
  adapter: Ecto.Adapters.SQLite3,
  driver: DuckDB

Although, I am unsure if DuckDB provides the ability to use the RETURNING clause that SQLite3 just added.

Anyways, at minimum a driver in elixir is really easy to do.

Sebb · April 1, 2021, 6:23am

Note, that SQLite is also very resilient. If your hardware does not break and you don’t do anything stupid, your data is safe. This is very important for embedded devices, because normally you have to expect a power failure any time.

https://www.sqlite.org/howtocorrupt.html

the_wildgoose · April 1, 2021, 4:06pm

Hi Sebb, Thanks for your thoughts. Yes, sqlite is an incredible database! This is an amazing integration, thanks to both of these projects. I will definitely be looking to use ecto+sqlite on an upcoming project

WRT embedded, there are lots of challenges here. Another is that storage is not infinite and usually quite slow (and ram in short supply). A typical sql database is going to store something like: timestamp (4 bytes), tag (umpteen bytes), value (say 8 bytes)

In contrast RRD will just store basically the value… However, they have trouble with non square datasets, ie varying tags

Dbs like influxdb can store the values as increments from the previous value and handle varying tags. So data usage can be only a few bits per sample sometimes.

So notwithstanding your point about data security, storage requirements for the data are also quite important. This is a difficult problem. I have pondered a simple DETS or cubdb db. However, I am fairly sure that I will reverse into writing a whole timeseries db, hence looking for a real one!

Sebb · April 1, 2021, 4:18pm

The requirements depend on the data you are storing.
It may be OK to lose some samples in a time series, but you can’t lose vital config data.
DETS will not help you in case of a power failure at the wrong time.

Exadra37 · April 1, 2021, 4:32pm

If you tweak the default it uses to write to disk(2 seconds or 64kb), then you can make it safer, but at cost of increasing disk IO.

Sebb · April 1, 2021, 5:39pm

Yes, but that also depends on the memory you are writing to.
Its about risk management. What is the risk of losing data, how likely is a power failure.
If the data must not be lost, even SQLite may not be enough (I think).
Data is eg very safe on a microcontroller that can prevent writes/erases while power failure by brown-out-detection / Vdd monitoring.

Exadra37 · April 1, 2021, 5:45pm

But DETS is disk only. It writes directly to disk, or don’t?

Sebb · April 1, 2021, 5:51pm

Sure, I mean the type of disk. You can’t say, I only write one byte at a time, that’s so fast, surely nothing will happen - because you always have to write a whole page to the flash, that just takes its time. If you write smaller chunks more often at some point this will likely make things worse.

lud · April 1, 2021, 11:38pm

I have an app that monitors multiple Gitlab/Github projects and need to store some computed data about them. I tried to use SQLite but it was cumbersome. Now that there is an Ecto driver I may want to give it another go.

The problem is that, if I remember correctly, Ecto connection is configured project-wide. Our app is designed to have custom data tables for each monitored project. That means a different SQLite DB for each monitored project. On the other hand, performance requirements are very low and a single connection is enough for each DB, no pools needed (the current DB implementation is held by a GenServer anyway).

Would dynamic databases be possible with Ecto?

gregvaughn · April 2, 2021, 12:20am

Ecto connections are managed by a connection pool per Repo. You can have multiple Repos per app. Ecto has recently added the idea of dynamic repos, but I haven’t explored that.

webuhu · April 2, 2021, 7:19am

Haven’t tried, but it seem to be possible:
› Underjord | Ecto & Multi-tenancy - Dynamic Repos - Part 1 - Getting started

dimitarvp · April 2, 2021, 1:25pm

My upcoming library will hide that from you because it will do pooling on the Rust side – the pool size will be configurable either as a project-wide config or when opening the DB.

I’ve done extensive tests (although I am not sure they are very scientific just yet). Opening 50 separate sqlite3 databases each with 20 “connections” (as we know, they are not really connections but file descriptors) and the test program barely consumed a single digit MBs of RAM.

The main challenge for such scenarios would be environments with severely reduced file descriptor limit (part of all Linux installs and a good chunk of the container cloud hosting platforms) so one has to be very judicious in picking a pool size with sqlite3. I suggest you put a pool size equaling the average expected amount of parallel tasks that will read from your DB. But honestly, for most projects a size of 2-3 should be quite fine – sqlite3 is extremely fast so taking a “connection” from the pool even if all “connections” are busy should be a matter of just a few milliseconds.

(Sorry that I haven’t pinged you to help me with the library yet; I’ll be at a lot of doctors in the next 1-2 months so I am keeping quiet since I barely have any free time.)

It’s quite interesting and I’ve used it once – it works just fine (which is something I find myself automatically taking for granted in the Elixir libraries! ).

kevinlang · April 2, 2021, 2:45pm

For this talk of data resiliency, SQLite3 itself provides pretty great guarantees as mentioned. Of course, there is always the possibility of losing the entire host machine in some disaster, and thus the single-node aspect of SQLite3 can work against you. Of course, this applies equally to simple Postgres setups as well.

One library I am eagerly following GitHub - benbjohnson/litestream: Streaming S3 replication for SQLite., which can more or less tail the WAL into S3 or any other object storage, allowing one to recover nearly all data in the case of a total node failure, only losing a couple of seconds worth of data at most. This is much better than hourly or daily backups.

I haven’t played with it yet, but it seems like it is as simple as setting up a systemd service with the S3 storage config and the location of the local database file and it “just works”.