Ecto_sqlite3 - an Ecto3 SQLite3 adapter

Actually, we do have a destructor registered, which should be called…

I wonder if this may be due to the destructor being called in a delayed manner. I see some messages on mailing lists about something like that. It may be worth adding some enif_fprintf calls in the relevant destructors to verify.

Yah, I’ve been running for a few hours on that value and it is stable.

Was that Performance issues / possible memory leak · Issue #153 · elixir-sqlite/exqlite · GitHub? I read through that one. The memory increase is indeed linear, so it smells like a memory leak. But it’s only 10 MB over 30 minutes, and it’s not a strict benchmark, so it makes me wonder if it is just general operation.

The “VM lockup” does not smell like a memory leak. I wonder if the OP ran into GCP limits on IOPS. Usually, the cloud providers give you an IOPS budget, and once you exhaust it, you have to wait a while.

The current dataset is on the order of gigabytes. I’ll have to winnow it down a bit and make sure I can reproduce the issues with the smaller dataset.

Yes.

Is it possible to dynamically set the default based on the core count / memory?
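
For illustration, a heuristic like the one below could go in config/runtime.exs. This is just a sketch of the idea; the cap of 8 and the app/repo names are assumptions, not anything the adapter ships with:

import Config

# Hypothetical heuristic: scale the repo's pool with the scheduler count,
# capped so a large machine doesn't open an excessive number of handles.
config :app, App.Repo,
  database: "db/app.db",
  pool_size: min(System.schedulers_online(), 8)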

I’ll put one together in the next day or two. An easy way is to lower the worker count to 1; if your table is larger than 64 MB, the snippet below should suffice to fill the cache on a worker.

IEx.configure(inspect: [limit: 2])
import Ecto.Query, warn: false
q = from l in App.Something, where: l.id > 0
App.Repo.all(q)

If you do this in IEx, you should see the heap increase in observer. In the process list, you’ll see the IEx process gain the memory (or maybe it’s some other memory). When I ran Benchee, no process gained memory, so maybe the IEx memory gain was just a red herring.
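
If observer isn’t already open, these standard calls (nothing exqlite-specific) show the numbers:

:observer.start()        # GUI with per-process memory in the Processes tab
:erlang.memory(:total)   # total bytes currently allocated by the VM
:erlang.memory(:binary)  # binary allocations, where large result sets often land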

I was going to type more out, but it’s probably faster for me to make a PR at this point with a bench.
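
A minimal Benchee harness along those lines might look like the following, reusing the query from the snippet above (the 10 s / 2 s durations are arbitrary):

# Run the large select repeatedly and have Benchee report memory
# statistics alongside run times.
Benchee.run(
  %{"select_all" => fn -> App.Repo.all(q) end},
  time: 10,
  memory_time: 2
)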

That does sound believable to me.


If it’s called in a delayed manner, I have no idea how it would deallocate correctly. The information about the address where the memory was allocated is gone by then, right?

I’m planning to use SQLite for my next Elixir project so I googled a bit. Found this thread that might be of interest.
From 2020 titled Are NIF resources meant to be destroyed immediately on GC?
http://erlang.org/pipermail/erlang-questions/2020-November/100131.html
Quote from there

Jesper Louis Andersen
As a general rule: never use finalizers for the primary resource
reclamation path. You don’t know when they’ll run. You don’t know what
changes are made to a GC in the future, that alters the rule set slightly.
Use them as a safety measure if something goes out of scope. Resources such
as DB connections can be very limited in scope (less than 50 for a typical
non pg_bouncer postgres setup). So you have to manage them quite
explicitly. Memory is far more abundant, and it is also fungible to a large
extent, so you just need a block somewhere.

It seems that starting from OTP 22.0, destructor calls are scheduled/queued and won’t happen immediately.


Well there are two possible things we could do:

  1. “Manually” clean up prepared statements ourselves after executing and reading the results from a query. We currently do not do this, and it seems like it should be easy enough (see the sketch after this list). Filed this issue for that.
  2. Improve how we clean up the database connection resource. This will be much harder. I don’t have a clear idea on how this would be done.
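
For the first item, a rough sketch of what explicit cleanup looks like at the exqlite level, using the public Exqlite.Sqlite3 API (the database path and query are placeholders):

{:ok, conn} = Exqlite.Sqlite3.open("db/app.db")
{:ok, stmt} = Exqlite.Sqlite3.prepare(conn, "SELECT id FROM something")
{:row, _first} = Exqlite.Sqlite3.step(conn, stmt)
# ...step until :done, then release the statement immediately rather than
# waiting for the NIF destructor to fire on some later GC pass:
:ok = Exqlite.Sqlite3.release(conn, stmt)
:ok = Exqlite.Sqlite3.close(conn)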

Are SQLite connections being closed? I’ve used SQLite in the past, and my understanding is that you can just keep reusing the same connections.

One of the benefits of my library, if it ever gets to a working state, is that the Rust crate it builds on maintains a small pool of prepared statements (size configurable) and aggressively prunes them in LRU fashion.

I believe you guys can do the same with the poolboy library. It’s super easy to use.
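
For reference, the poolboy pattern in Elixir is roughly this; the pool name and worker module are placeholders, not real project code:

# Hypothetical pool of workers that each own one SQLite connection.
pool_config = [
  name: {:local, :sqlite_pool},
  worker_module: MyApp.SqliteWorker,
  size: 5,
  max_overflow: 0
]

children = [:poolboy.child_spec(:sqlite_pool, pool_config)]
Supervisor.start_link(children, strategy: :one_for_one)

# Check a worker out, use it, and check it back in:
:poolboy.transaction(:sqlite_pool, fn pid ->
  GenServer.call(pid, {:query, "SELECT 1"})
end)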

Yep, we reuse the same pool of connections. I mean in the case where the process crashes or is killed: from reading the above, it seems that because we rely on NIF destructor behavior in that case, memory may stay around for a while waiting to be pruned. If we could somehow detect the crash and clean up more aggressively, that would be an improvement, but I’m not sure how to do that. A connection that is cleaned up in a non-crashing manner should work as expected and free the memory more promptly, since we explicitly call the sqlite3 close function.
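
One standard-OTP way to detect the crash would be a small janitor process that monitors whoever owns the raw connection and closes it on :DOWN. A sketch with invented names, not something the adapter does today:

defmodule App.ConnectionJanitor do
  # Hypothetical: watch the owner of a raw connection and close it eagerly
  # if the owner dies, instead of waiting on the NIF destructor.
  use GenServer

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  def init({owner_pid, conn}) do
    ref = Process.monitor(owner_pid)
    {:ok, %{ref: ref, conn: conn}}
  end

  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref, conn: conn} = state) do
    Exqlite.Sqlite3.close(conn)
    {:stop, :normal, state}
  end
end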

I think for cases like a process crash or kill, just relying on GC should be fine, because it’s not the primary resource reclamation path.


The issue appears to be that at certain times the memory is allocated outside of the processes, which means the GC won’t reclaim it. To get around that, a destructor was added and called, but since destructors are queued… I’m not really sure they will still have access to the memory they were supposed to free. Not an expert, though.

I think they are reference counted. From here: Are NIF resources meant to be destroyed immediately on GC?

Sverker Eriksson
I fully agree with Jesper about not relying on GC for resource reclamation. An additional risk is that the resource does not get GC’ed at all if its reference has “leaked” away onto the heap of another process, like for example a logger or an io-server process.

So something could be keeping those references around.


@wanton7 and @tj0 I recently went through and fixed the memory leak issue, or rather the perceived leak issue.

I could not reproduce the leak, but after reading through mailing lists and whatnot, the issue seemed to stem from the garbage collector not destructing the prepared statements in a timely manner. So instead, when Ecto closes the connection, we also release the underlying resource (the prepared statement); the destructor then runs later and frees the last remaining bytes, which are just the pointer to the prepared statement.
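
In DBConnection terms, the shape of that fix is roughly the following. This is a simplified sketch, not the actual Exqlite.Connection code, and the state field names are assumptions:

defmodule App.SimplifiedSqliteConnection do
  # Simplified sketch of a DBConnection-style disconnect callback: release
  # any cached prepared statements before closing the sqlite3 handle, so
  # the NIF destructor has almost nothing left to free later.
  def disconnect(_err, %{db: db, queries: queries}) do
    Enum.each(queries, fn stmt -> Exqlite.Sqlite3.release(db, stmt) end)
    Exqlite.Sqlite3.close(db)
    :ok
  end
end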

Overall this should suffice. I’d love for anyone who was seeing memory grow like crazy under high load to give the latest release a try.
