High Memory usage from Ecto Repo ETS

nirev · April 25, 2023, 6:12pm

Hello, everyone!

I have an application that is seeing increasing memory usage and trying to pinpoint what’s causing it.
The app makes extensive use of Oban, having just a single endpoint to receive jobs with a payload and a callback URL. Other than that it has some auxiliary DB tables for holding access tokens to external services.

Here’s memory usage since last deployment on the 21st:

Seems like everyday around 3am (when we get an influx of periodic nightly jobs) memory goes up and it doesn’t go down.

I’ve added live dashboard to help me figure out this as I don’t have shell access to prod, and it mentions ETS as the highest mem usage:

And looking at ETS, I find this MyApp.Repo ETS table that’s using the most amount of memory:

I remember reading that Ecto caches Query plans in ETS, but before I go hunt down this, I’d like to know if someone has faced this before and what should I look for. At this point I don’t even know what data is ecto caching, if that has any expiration or how what else.

Any help is appreciated!
Thank you!

benwilson512 · April 25, 2023, 7:12pm

That is indeed very high. Given that we may be looking into something ets related, can you post the output of:

mix deps|grep ecto
mix deps|grep postgrex

nirev · April 25, 2023, 7:20pm

Should’ve added that in the initial post, but here it is.

It’s not the latest, but almost there and I didn’t see anything relevant in the CHANGELOG for any of those libs.

$)─> mix deps | grep -E "ecto|postgrex"

* ecto (Hex package) (mix)
  locked at 3.9.4 (ecto) de5f988c
* ecto_sql 3.9.2 (Hex package) (mix)
  locked at 3.9.2 (ecto_sql) 1eb5eeb4
* opentelemetry_ecto 1.1.0 (Hex package) (mix)
  locked at 1.1.0 (opentelemetry_ecto) 9c65ae4b
* phoenix_ecto 4.4.0 (Hex package) (mix)
  locked at 4.4.0 (phoenix_ecto) 09864e55
* postgrex 0.16.5 (Hex package) (mix)
  locked at 0.16.5 (postgrex) edead639

l3nz · April 26, 2023, 5:15am

Can you try dumping the ETS table to see what is there?

benwilson512 · April 27, 2023, 6:34pm

Good suggestion, @nirev you should be able to use Erlang -- ets or match/3 with the limit option to select a sane number of rows.

joey_the_snake · April 27, 2023, 8:38pm

Ecto caches your query string and the key is a combination of the contents of your query.

I don’t know if this is the issue but if you are not using query parameters then I think this could potentially increase the size of the cache dramatically. For example if you are issuing queries like this:

select * from table where column = 1;
select * from table where column = 2;
select * from table where column = 3;
select * from table where column = 4;
...

Instead of using a query parameter for the column value:

select * from table where column = $1

Then I think this could cause your issue. I believe the former are all saved as different cache entries while the latter is saved as one cache entry and the query parameter are substituted into it.

joey_the_snake · April 27, 2023, 8:59pm

There is also an ets table in the drivers. It kind of smells like part of the query result is being put into ets but it is referencing a large binary and preventing garbage collection.

Postgrex gives you the option to copy or reference large binaries. If you are defining something like this it might cause your issue

Postgrex.Types.define(MyApp.PostgrexTypes, [MyExtension], [decode_binary: :reference])