Is there a way to order tests by RAM consumption?

dimamik · March 26, 2025, 10:17am

We have a few thousand tests in our suite, and we started getting OOM errors from our CI. Is there a way to order tests by RAM consumption? Has anyone done anything similar before?

D4no0 · March 26, 2025, 10:32am

I don’t think there is a way to determine that without building a special tool to collect that data while the tests run, and even then you would have no guarantees once the code changes.

One widely used option is partitioning, but that is a fix for when your tests take too long to execute. I would guess there is an option to set how much concurrency async tests run with, but I am not sure where you would look for it, as mix test doesn’t seem to provide one.

ruslandoga · March 26, 2025, 12:49pm

I would try using tprof (tools v4.1.1) in memory mode in setup blocks, or find a way to run the test scripts under mix profile.tprof (Mix v1.18.1). Note that you would need to make sure that all processes are accounted for, not just the test process.

Like

$  MIX_ENV=test mix profile.tprof -e "Mix.Tasks.Test.run([])" --type memory

but for each test file individually.
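
A rough sketch of what "for each test file individually" could look like, in case it helps. The module name TprofPerFile, the tprof_reports directory and the test/**/*_test.exs pattern are all assumptions made for illustration; only the mix profile.tprof invocation itself comes from the command above. I have not checked how the profiler output behaves when a file has failing tests.

```elixir
defmodule TprofPerFile do
  # Hypothetical helper (not part of Mix): runs the tprof command from this
  # post once per test file and saves each run's output to its own report.
  @out_dir "tprof_reports"

  # Builds the argument list for `mix`, passing a single test file to the test task.
  def args(file) do
    ["profile.tprof", "-e", "Mix.Tasks.Test.run([#{inspect(file)}])", "--type", "memory"]
  end

  # Profiles every file matching `pattern`, one OS process per file.
  def run(pattern \\ "test/**/*_test.exs") do
    File.mkdir_p!(@out_dir)

    for file <- Path.wildcard(pattern) do
      {output, status} =
        System.cmd("mix", args(file), env: [{"MIX_ENV", "test"}], stderr_to_stdout: true)

      # Flatten the path so every test file gets a distinct report name.
      report = Path.join(@out_dir, String.replace(file, "/", "__") <> ".txt")
      File.write!(report, output)
      {file, status, report}
    end
  end
end
```

Saved as, say, scripts/tprof_per_file.exs (a made-up path), it could be run from the project root with elixir -r scripts/tprof_per_file.exs -e "TprofPerFile.run()". Starting a fresh mix process per file is slow, but it keeps one file's allocations from leaking into the next file's numbers.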

dimamik · March 26, 2025, 1:26pm

I thought there might be an existing solution for that, but it seems there isn’t.

Regarding the partitioning of the tests, this might be a good temporary fix (besides bumping the container size), but I’d want to find the exact test bloating the memory.

Regarding tprof, this looks promising and I’ll give it a try. Really appreciate your help, guys!

D4no0 · March 26, 2025, 1:29pm

The time you spend on the investigation will most probably cost far more than just adding a few more gigabytes of RAM.

This is usually why nobody bothers with this, even in production.

ypconstante · March 26, 2025, 2:07pm

You can try reducing the max number of parallel tests. Elixir by default uses System.schedulers_online() * 2 (source).
In the LiveView project I work on, we had frequent timeouts with this default, so we added an alias to change it to System.schedulers_online() * 1.5. That solved the timeout issues, and the total test time didn’t change. It should also reduce the memory usage.

test: "test --warnings-as-errors --max-cases #{round(System.schedulers_online() * 1.5)}",

dimamik · March 26, 2025, 3:43pm

Yes, this is definitely a valid concern. But sooner or later we’d need to find what makes our memory bloat, and sometimes it pays off to start early.

dimamik · March 26, 2025, 3:44pm

Yes, we’re already limiting these, but ideally I’d want to find the offender(s). But thanks for the answer!

garrison · March 26, 2025, 5:59pm

I suppose you could run one test at a time and measure the total BEAM memory usage, which might be easier if this is a one-time thing.
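
A minimal sketch of that idea, assuming a custom ExUnit formatter is an acceptable place to take the measurement. MemoryFormatter is a hypothetical module name, and the numbers are only a rough signal: formatter events arrive asynchronously, and garbage collection timing can move memory from one test's delta into another's.

```elixir
defmodule MemoryFormatter do
  # Hypothetical ExUnit formatter (not part of ExUnit): records how much total
  # BEAM memory changed between a test starting and finishing, then prints
  # the 20 largest deltas when the suite ends.
  use GenServer

  @impl true
  def init(_opts), do: {:ok, %{started: %{}, results: []}}

  @impl true
  def handle_cast({:test_started, %{module: module, name: name}}, state) do
    started = Map.put(state.started, {module, name}, :erlang.memory(:total))
    {:noreply, %{state | started: started}}
  end

  def handle_cast({:test_finished, %{module: module, name: name}}, state) do
    case Map.pop(state.started, {module, name}) do
      {nil, _started} ->
        # No start event was recorded for this test, so there is nothing to compare.
        {:noreply, state}

      {before, started} ->
        delta = :erlang.memory(:total) - before
        results = [{{module, name}, delta} | state.results]
        {:noreply, %{state | started: started, results: results}}
    end
  end

  def handle_cast({:suite_finished, _times}, state) do
    state.results
    |> Enum.sort_by(fn {_test, delta} -> delta end, :desc)
    |> Enum.take(20)
    |> Enum.each(fn {{module, name}, delta} ->
      IO.puts("#{delta} bytes  #{inspect(module)} #{name}")
    end)

    {:noreply, state}
  end

  # Ignore every other formatter event.
  def handle_cast(_event, state), do: {:noreply, state}
end
```

With the module compiled in the test environment (for example under test/support, if your elixirc_paths include it), something like mix test --max-cases 1 --formatter MemoryFormatter --formatter ExUnit.CLIFormatter should run the tests serially and print the list after the usual output. Running with --max-cases 1 matters here, since with concurrent tests the total memory cannot be attributed to any single test.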