Ecto_sqlite3 - an Ecto3 SQLite3 adapter

This may work for SQLite, but I think it will not work for other storage formats, like Postgres WAL logs. If I am not mistaken, in Postgres you also need point-in-time recovery to be able to restore from a WAL backup.

This repo supports Postgres and now also has beta support for others, but I don’t see SQLite there :frowning:

WAL-G is an archival restoration tool for Postgres (beta for MySQL, MariaDB, and MongoDB).

From the official SQLite docs it seems that they have a native API to allow for continuous backup:

The Online Backup API was created to address these concerns. The online backup API allows the contents of one database to be copied into another database, overwriting the original contents of the target database. The copy operation may be done incrementally, in which case the source database does not need to be locked for the duration of the copy, only for the brief periods of time when it is actually being read from. This allows other database users to continue uninterrupted while a backup of an online database is made.
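
exqlite doesn’t expose that C-level backup API as far as I know, but a similar online copy can be made from Elixir with SQLite’s VACUUM INTO statement (available since SQLite 3.27). A minimal sketch, assuming exqlite’s low-level Exqlite.Sqlite3 module and a database file named data.db:

# Copy the live database into a standalone backup file. VACUUM INTO runs
# inside a read transaction, so readers are unaffected (and writers too under WAL).
{:ok, conn} = Exqlite.Sqlite3.open("data.db")
:ok = Exqlite.Sqlite3.execute(conn, "VACUUM INTO 'backup.db'")
:ok = Exqlite.Sqlite3.close(conn)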

Well we have a GenServer on top of the connection so there can be only a single task on each DB at any time, and so the best pool size would be 1 I guess.

Take your time and take care :slight_smile: In the meantime I reverted back to CubDB for the data we know the shape of.

1 Like

I originally thought I’d do that as well from the Elixir side, but it’s IMO hugely faster to have the pool stuff controlled from the Rust side (and my semi-scientific measurements back last summer confirmed it).

I can only talk about it right now, but when I get to the first release I’d recommend just blindly using the handle you’re given from the open(...) call and not worrying about a thing. I designed the library that way. It automatically serializes access when the pool size is only 1 anyway.

2 Likes

I’m happy to report that phx.new support for the sqlite3 adapter is now merged into Phoenix!

Starting with the next Phoenix release, you will now be able to generate a sqlite3 repo for your Phoenix application by specifying the --database sqlite3 flag in your call to phx.new. :smiley:

15 Likes

Hi @kevinlang I just tried out the sqlite3 support in phx.new, very nice!

I was able to get the tests to run in async: true mode using :memory: databases initialized with a schema dump and put_dynamic_repo.

Repo module
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.SQLite3

  @doc "Create an in-memory database and make it the current dynamic repo"
  def put_dynamic_memory_repo do
    {:ok, pid} =
      start_link(Keyword.merge(config(), name: nil, database: ":memory:", pool_size: 1))

    put_dynamic_repo(pid)

    # Load the schema from the structure dump produced by `mix ecto.dump`
    :my_app
    |> :code.priv_dir()
    |> Path.join("repo/structure.sql")
    |> File.read!()
    |> String.split(";\n", trim: true)
    |> Enum.each(&query!/1)
  end
end
DataCase setup
  setup tags do
    if tags[:async], do: MyApp.Repo.put_dynamic_memory_repo()
    pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
    on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
    :ok
  end
Mix aliases
defp aliases do
  [
    setup: ["deps.get", "ecto.setup"],
    "ecto.migrate": ["ecto.migrate", "ecto.dump"],
    "ecto.setup": ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"],
    "ecto.reset": ["ecto.drop", "ecto.setup"],
    test: ["ecto.create --quiet", "ecto.migrate --quiet", "test"]
  ]
end
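
For completeness, a hypothetical async test built on top of the pieces above could look like the following (MyApp.DataCase and the MyApp.User schema are assumed names for illustration, not part of the adapter):

defmodule MyApp.UsersTest do
  # async: true is safe here because DataCase gives each async test its own
  # in-memory database via MyApp.Repo.put_dynamic_memory_repo/0.
  use MyApp.DataCase, async: true

  test "inserts and reads back a row" do
    {:ok, user} = MyApp.Repo.insert(%MyApp.User{name: "alice"})
    assert MyApp.Repo.get!(MyApp.User, user.id).name == "alice"
  end
end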

Are there any plans to support async tests out of the box in ecto_sqlite3?

5 Likes

Nice!

We discussed it at one point and decided against it in the ecto_sqlite3 code repository itself (at the time), primarily so we could more easily detect issues around handling of database connections and transaction errors. If we had made each test in our code repository async via this mechanism, we would likely have missed some issues while we were ironing out trickier edge cases around inter-connection issues (e.g., not rolling back a failed write transaction correctly, locking the database for the next test!).

However, for downstream uses, I don’t see any reason why this cannot be supported, eventually. I think at the very least having some documentation on the adapter for this approach could be very useful, if you would like to open a PR adding some :slight_smile:

Some other thoughts:

  1. We may be able to remove the need for the structure dump if we update the exqlite driver to expose the SQLite3 de/serialization APIs.
  2. We may be able to eliminate the need for this extra code in MyApp.Repo by creating a new SQLite3-flavored sandbox, Ecto.Adapters.SQLite3.Sandbox, that does this (or the de/serialization approach) under the hood, forking the connection into an in-memory version. This would let downstream users choose their sandbox strategy by specifying either Ecto.Adapters.SQL.Sandbox or Ecto.Adapters.SQLite3.Sandbox for the transaction-isolation or connection-isolation approach, respectively (see the sketch after this list).
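
To make option 2 concrete, a downstream test configuration might eventually look like this sketch; Ecto.Adapters.SQLite3.Sandbox does not exist today and is only shown as the proposed alternative:

# config/test.exs
config :my_app, MyApp.Repo,
  database: "priv/my_app_test.db",
  # Today: transaction-based isolation.
  pool: Ecto.Adapters.SQL.Sandbox
  # Proposed: connection-based isolation, forking the DB into memory per test.
  # pool: Ecto.Adapters.SQLite3.Sandbox
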
6 Likes

This would be really amazing with litestream.io. We’ve been experimenting with using sqlite + litestream to run lightweight, single VM applications that persist their state somewhat safely. It works great for apps that can pause their writes for a minute or two while a new version comes up.

The biggest issue is that DBConnection defines the interface for pools and is how ecto_sql obtains connections.

I see. Well, I stumbled upon this curious problem while I was deciding on pooling: what happens with anonymous databases? If you just pass :memory: for a pool, then it’ll return a different DB for every connection that’s checked out. That’s why I moved the logic to the Rust code, because it could handle that by internally generating names, e.g. through UUIDs.

It’ll be a bit painful to replicate that code in Elixir but when I get the free time and energy I’ll definitely do it.

I was under the impression that even in C or Rust you could not duplicate a connection into the same :memory: database. Are you talking about sharing the same pointer reference?

No, not the same pointer reference. I gave up on that, it’s too much trouble and internally sqlite will use a mutex which will serialize access anyway – it is almost a completely pointless exercise to use the so-called multi-threaded handle mode.

I was saying that if you setup the connection like this:

config :myapp, MyApp.Repo,
  ...,
  database: ":memory:",
  ...

Then every newly checked out connection will point to a brand new anonymous sqlite in-memory DB, which is not what one would want – you expect to get a “connection” to the same DB when the connections are checked out from the same pool. Hence I resorted to generating UUIDs internally, so even if you use the configuration from above, under the hood the DB will actually be named file:<UUID_GOES_HERE>?mode=memory&cache=shared.

That way, checking out a connection will produce an independent, serial (not multithreaded) sqlite DB handle that still points at the same anonymous DB, which allows you to fully utilize sqlite’s ability to read from the same DB from multiple threads at the same time.
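
An Elixir-side sketch of the same trick (the module and function are hypothetical, just to illustrate the naming scheme):

defmodule MemoryDatabaseName do
  # Hypothetical helper: rewrite ":memory:" into a named, shared-cache in-memory
  # URI so that every connection in the pool opens the *same* anonymous database.
  def resolve(":memory:") do
    # A random hex token stands in for a UUID here.
    token = Base.encode16(:crypto.strong_rand_bytes(16), case: :lower)
    "file:#{token}?mode=memory&cache=shared"
  end

  def resolve(path) when is_binary(path), do: path
end

# Important: generate the name once per pool (e.g. at repo/pool init) and reuse
# it for every connection, otherwise each connection still gets its own DB.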

I looked at this months ago and thought to myself finally I’m totally going to use this in production one day since I have a ton of read-only data that never gets updated.

And the day is coming pretty soon. I just ported over typeahead locations from postgres and FTS5 is working pretty well and ecto is doing its thing. Before flipping the switch, I wanted to understand the limitations from a system level.

I started reading the sqlite docs on open, locking, and memory, but everyone in the thread seems pretty knowledgeable, so I thought I’d ask what is actually happening under the covers.

What I’ve deduced so far is the following:

  1. The sqlite adapter does an fopen on the database. This will return a new fd. This fd will be used by each of the processes of the Repo pool, of which there are 5 by default. I’m just not sure if it’s a single fd for the whole BEAM or an fd for each process of the pool.

  2. It looks like locking is set to “NORMAL”, which means that the database is unlocked after every transaction. This would mean that each of the 5 repo processes can only do queries sequentially on that single filehandle. Of course, if there are 5 different fds, I still don’t quite understand the locking. It looks like a SHARED lock is an internal tracking state for sqlite, but opening a read-only copy seems to be possible with sqlite3_open_v2 and sqlite shared-cache mode, which they have subsequently said not to use.

  3. BEAM should know nothing about the memory management of this since it only knows about the fd and sqlite manages its memory dynamically. But since we’ve opened up the FD inside the BEAM process, the memory should be allocated to the BEAM process? I just couldn’t find any major changes in the resident memory or via observer, but it could just be my database isn’t big enough. Linux should handle all the page allocations and what not, so nothing fancy is needed.

  4. Reading further, it does appear that there is a new FD for each Repo pool process? At least that’s what appears to be happening, judging from Ecto.Adapters.SQL.Connection — Ecto SQL v3.6.2. So I’m just a bit confused as to how the memory is managed here. Does each sqlite open put its memory on the data/stack of the Elixir pid?

Considering that I’m only looking to use this for readonly data, it seems to be pragmatic to open up a new fd per elixir pid and open with a shared lock across all of them.

So instead of writing all of that, I should have probably just asked:

  1. Does each Elixir pid have its own fd? If so, how is memory and locking managed for a read-only workload? Are we dynamically loading the sqlite index into the memory of each pid?
  2. If each Elixir pid doesn’t have its own fd, how would we stretch the read-only performance?

Help?

1 Like
  1. Number of handles = number of pool connections, more or less. Locking is handled by the underlying SQLite3 C library, a level “below” exqlite and ecto_sqlite3. The amount of SQLite3 data the SQLite3 library keeps in memory is determined by the cache_size setting, which we set to -2000, i.e. about 2 MiB. On top of that there is of course all of the overhead of the exqlite library – the NIF code, the Elixir code, the Elixir data objects and so on.
  2. Read up on WAL (Write-Ahead Logging), which is enabled by default for ecto_sqlite3. TL;DR: reads happen concurrently, so there should be no scaling concern. You can confirm both settings from IEx, as in the sketch below.
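
The PRAGMA statements are plain SQL, so they should pass straight through the adapter (MyApp.Repo is a placeholder for your repo module):

# Expected journal mode is "wal"; cache_size is reported in pages, or negative KiB.
Ecto.Adapters.SQL.query!(MyApp.Repo, "PRAGMA journal_mode")
Ecto.Adapters.SQL.query!(MyApp.Repo, "PRAGMA cache_size")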

We have some benchmark files in the ecto_sqlite3 repo you can play around with to get an idea of the performance. In my experience, SQLite3 out-performs Postgres by a noticeable margin, even on insert-heavy workloads (iff using WAL journal mode); the benchmarks also reflect that. Of course, the tradeoff is that it has way fewer features and is limited to a single node. Not having any of the client/server overhead goes a long way for performance.

4 Likes

Cool, this makes sense. At that point, sqlite is probably using a mutex/semaphore underneath, so we shouldn’t need to set anything manually. It is competing with fopen after all.

I’m not quite understanding this yet, but perhaps this has more to do with sqlite. I’m assuming it isn’t trying to mmap the indexes. When it reads from disk, those pages should be cached by Linux, so that memory should show up in buffers/caches and not in the BEAM.

Then there’s the pragma cache limit, which appears to be separate. I suppose the NIF is doing its own malloc, as it would need to have some memory when doing large queries. I was playing around with observer last night and there were hardly any memory changes that I found. Does using malloc mean that the memory doesn’t show up in observer/beam, or did I just miss it?

If my understanding is correct, the memory usage should be:

  • per pool connection - up to 2MB sqlite cache + general elixir overhead + any malloc overhead for being able to execute the queries
  • linux virtual memory - caching/flushing the pages of the database file in memory which is shared across all fd’s

Does that sound right?

Interesting results. I didn’t think the networking overhead would cost that much for the inserts. Before doing the port of my data, I quickly checked the performance of the entire phoenix call between postgres and sqlite and it was between 50-100ms for both of them. This may seem unusually high, but I’m testing on a 10 year old laptop in dev mode. In prod, we’re looking at under 10ms for postgres.

Not entirely sure, tbh. I haven’t had much exposure to observer before. NIFs are definitely quite different from other sorts of processes in the BEAM, especially dirty NIFs, a relatively new type that exqlite uses. They run on their own separate schedulers and so on.

Are you using exqlite directly or using ecto_sqlite3 in your experiments? In either case you can modify most of the underlying PRAGMAs we set at the beginning of the connection. For example, you could try increasing the cache_size to a much larger value and see if it makes a difference in your metrics.
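
For example, something like this in config (the option names here are an assumption based on the ecto_sqlite3 docs of that era; double-check them against your version):

config :my_app, MyApp.Repo,
  database: "priv/my_app.db",
  journal_mode: :wal,
  # Negative values are KiB: -64000 is roughly a 64 MB page cache.
  cache_size: -64000,
  pool_size: 5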

It may also be useful to create a simple C program using SQLite3 API to verify your assumptions before digging deeper into the NIF specifics, which as you see makes things a whole bunch more complicated! Unfortunately the amount of collective knowledge and understanding around NIFs, especially dirty NIFs, is pretty lacking. The NIF docs may also give you more insight into what you are looking for.

Good luck!

3 Likes

Dirty nifs are pretty fantastic. Been using them in production since 2019 I think. Haven’t run into an issue with them yet.

I don’t think that will be necessary anymore, as I found some changing metrics. I can’t tell if it’s a memory leak, but it does seem rather unexpected. The resident memory of the BEAM keeps increasing. It’s not detected in observer, so it must be the sqlite malloc or cache. It appears to be increasing by a few MB on every query. Using ecto_sqlite3, Erlang 24.02 on dev, and Erlang 23.2.3 on the test server; exqlite is at 0.6.1 and ecto_sqlite3 is on master.

I pushed this to a user-test server earlier this evening. Since the test server only has 512 MB of RAM, it OOM’d pretty quickly. I’ve tried this on my laptop, but I cannot get it to increase past 850 MB of resident memory. So it behaves like a memory leak up to a point, but after that it is completely stable. I didn’t see anything unusual in the NIF code, but I didn’t look too closely.

I’ll have to do some more testing in the next few days since I don’t really understand how any of this is possible. The only thing that might be out of the ordinary is that I’m using FTS5, but that’s a standard built-in extension. I suppose it could be the pragma cache…anyway, will need to do some more digging in the next few days.

Has anyone else run into issues with the beam resident memory increasing on every query to sqlite that stabilizes after some time? Any strangeness at all?

Check out this issue.

2 Likes

OK, so the memory usage on the 512 MB system was expected behaviour due to the cache setting. However, from my testing, there is a potential memory leak which I haven’t yet tracked down how to fix. But I’m going to put it into production anyway, with a reduced cache_size as a mitigation.

Performance

Tested against:

Generated random full-text search queries that I knew would hit, based on bi-grams + wildcards (“ba*”, etc.).


CPU Information: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
Number of Available Cores: 4
Available memory: 7.48 GB
Elixir 1.11.4
Erlang 23.2.1

Name                   ips        average  deviation         median         99th %
exsqlite             86.12       11.61 ms    ±85.74%        9.51 ms       49.56 ms
raw_sqlite           83.64       11.96 ms    ±89.03%        9.28 ms       51.83 ms
ecto_sqlite3         43.93       22.76 ms    ±74.73%       19.87 ms       72.09 ms

Comparison: 
exsqlite             86.12
raw_sqlite           83.64 - 1.03x slower +0.34 ms
ecto_sqlite3         43.93 - 1.96x slower +11.15 ms

The difference between raw_sqlite and exsqlite is due to using a prepared statement for exqlite instead of a raw statement.
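
Roughly, the two code paths compare like this sketch against exqlite’s low-level API (the table/file names are made up, and the bind/step arities are from the 0.6-era docs, so treat them as assumptions):

{:ok, conn} = Exqlite.Sqlite3.open("places.db")

# "Raw" path: the SQL text is parsed and planned on every call.
:ok = Exqlite.Sqlite3.execute(conn, "INSERT INTO places (name) VALUES ('bali')")

# Prepared path: parse/plan once, then bind and step as many times as needed.
{:ok, stmt} = Exqlite.Sqlite3.prepare(conn, "SELECT name FROM places WHERE name MATCH ?1")
:ok = Exqlite.Sqlite3.bind(conn, stmt, ["ba*"])
{:row, row} = Exqlite.Sqlite3.step(conn, stmt)  # :done when there are no more rows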

Memory

When tested from IEx, all the sqlite NIFs used heap memory space (eheap_alloc). The size of the heap memory used was based on the size of the query result and was allocated against the iex process.

When tested with benchee, for ecto, I couldn’t find the memory being allocated in observer at all. eheap_alloc stayed at 13 MB the entire time, but resident memory shot up to 500-600 MB. Running :erlang.garbage_collect did nothing. The resident memory remained consistent at around 590 MB over multiple benchee runs.

But I couldn’t find where that memory lived. To test further, I killed all the sqlite Repo processes, and the “Elixir.DBConnection.Connection:init/1” processes were all re-spawned. I ran garbage collection again, and nothing was collected. I re-ran benchee and the resident memory doubled to 1 GB.

Conclusion

The behaviour I was seeing of consistent memory growth on low-memory systems was not a bug - just 10 workers, each consuming up to 64 MB of cache. I suggest you put that into the documentation, otherwise people on their Raspberry Pis might have a bad time.

However, memory may leak in other ways (unless I went bug-eyed and lost my mind during the testing process). The exqlite connection sets a default of cache_size: -64000. In some cases this memory does not seem to be allocated to any Erlang process, and then it can’t be garbage collected if the process is killed. PRAGMA shrink_memory, if we could call it, might be able to reclaim it; or potentially some other sqlite setting could ensure that the sqlite cache lives on the Erlang process heap and is not somehow allocated somewhere else. This can also be mitigated by reducing the cache_size.
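
If anyone wants to experiment with that, the PRAGMA can be issued through the repo like any other statement (MyApp.Repo is a placeholder; note it only affects whichever pooled connection the query happens to run on):

# Ask SQLite to release as much unused cache memory as it can on this connection.
Ecto.Adapters.SQL.query!(MyApp.Repo, "PRAGMA shrink_memory")

# Alternatively, cap SQLite's own heap usage (in bytes) on this connection.
Ecto.Adapters.SQL.query!(MyApp.Repo, "PRAGMA soft_heap_limit = 67108864")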

4 Likes

Wow, thanks for looking into this so thoroughly!

Is this a fair summary?

  1. ~600 MB memory is expected, because of overly large default cache size
  2. However, it did balloon above that when killing all Repo processes. Seems like on kill we do not cleanly close the DB connection, which would explain it.
  3. Our NIF memory usage does not show in observer. Imo this may be due to the fact that SQLite3 does its own memory management and does not use the NIF alloc functions (see below).

This is done in the ecto_sqlite3 library defaults. We have that opinionated default in the documentation, but it was decided somewhat arbitrarily. See this issue. I created a new issue to consider lowering it, here; 2MB definitely seems like a fine enough default.

If you could share snippets of your benchmarking code, or just the specific process or calls you run in IEx (for those with no experience in these things, like myself), that would be greatly appreciated! I could use that info to try debugging further.

Yes, this could perhaps be done. As mentioned above, I think the fact that SQLite3 uses its own malloc is why this happens. It may be possible to provide the Erlang memory management functions at DB init time, which could incidentally fix the memory leak issue we saw.

See this SQLite3 documentation. Not sure if the NIF memory functions provided can fit this interface entirely, though.

2 Likes

Before we change anything around that default cache size, can you do a similar test, lowering the value to something like -2000, and check the results? I originally picked -64000 because that’s what I saw suggested for applications with a heavy read load on an sqlite database. But now that I think about it, yeah, it’s a bad default for Pis or any sort of embedded device with very limited memory.

As for a memory leak, someone did report one, but I am unable to reproduce it. Right now, with how the NIF is structured, the prepared statements are destroyed when the handle’s ref count goes to zero.

Could you provide your benchmark setup? I’d love to put that in a repository to be able to run.

2 Likes