Help speeding up Elixir tests in CI pipeline

Hey all, this is my first post in the forum so I apologize upfront if my post doesn’t comply with the policies.

I’m a DevOps engineer and one of the projects I support is written in Elixir, a language I’m no expert in.
I’ve been working on speeding up the CI pipelines for this project for a while now. I got the total time to production down from 40m to ~18m by (1) properly caching dependencies, (2) fixing some duplication issues with our dependencies, (3) changing the machine type from E2 to N2 (GCP) and (4) allocating different amounts of resources to different stages of the pipeline.
However, our main pain point now is the tests. We run ~3570 tests split across 4 parallel runners (containers), and each runner takes around 6-10m (of which 2m go to setting up the database and running db migrations). If we run all tests in a single runner, it takes around 20m.

What I have tried so far:

  • I tried allocating more CPU and RAM to the test containers, but they never use more than 1 CPU and 1.5 of RAM
  • I tried using --max-cases, but this changed the order in which some tests ran and caused failures (some tests depend on the output of others)
  • I used --slowest and got a list of slow tests, but there are not many: around 7 tests take 10 seconds each, while the other tests take ~10-120ms.

Do you guys have any tips on how I can speed up this process?
Or perhaps how I can ensure that mix uses all the resources I’m giving it?

Our pipelines are on GitLab and run inside Google’s Kubernetes Engine.


Hi @michell, do you know which tests are the slowest and why they are slow?

I guess DB access and network calls are the main suspects. In that case I’d try using test doubles to avoid real calls to the infrastructure (none of this is specific to Elixir, sorry).

The 2m spent on database setup and migrations can be reduced if the migrations are squashed into a single DB structure file that is loaded instead. It’s healthy for any project to do this every now and then.
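One way this could look, as a rough sketch: `mix ecto.dump` writes the current schema to a structure file (by default `priv/repo/structure.sql`), and `mix ecto.load` restores it in one step. The alias below is a hypothetical `mix.exs` fragment, not this project's actual configuration. It assumes Ecto with a SQL adapter, a reasonably recent `ecto_sql` (for `--skip-if-loaded`), and that the database client tools (`pg_dump`/`psql` for PostgreSQL) are available in the CI image.

```elixir
# mix.exs (fragment). Hypothetical aliases, adjust to the project's own setup.
# Assumes `mix ecto.dump` was run once and the structure file is committed.
defp aliases do
  [
    test: [
      "ecto.create --quiet",
      # Load the squashed schema instead of replaying every migration.
      "ecto.load --quiet --skip-if-loaded",
      # Apply only the migrations added after the last dump.
      "ecto.migrate --quiet",
      "test"
    ]
  ]
end
```

The structure file has to be regenerated with `mix ecto.dump` whenever the team wants to fold newer migrations into it.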

Are there any *seeds*.exs files in the project? If so, somebody should take a look for bottlenecks there. It’s amazing how much awful code a lot of Elixir programmers allow in there.

Tests that depend on the output of other tests are an extreme anti-pattern and shouldn’t happen. Have any of the programmers explained why they opted for this, let’s be generous and not savage for a minute here, ahem, suboptimal practice?
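For illustration, here is a minimal sketch of the alternative: every test builds the state it needs in a `setup` block, so the order of execution stops mattering. `Cart` and `CartTest` are hypothetical names invented for this example and have nothing to do with the project in question.

```elixir
# Standalone script: run with `elixir isolation_sketch.exs`.
# In a Mix project, test/test_helper.exs already calls ExUnit.start().
ExUnit.start()

defmodule Cart do
  # Hypothetical module standing in for the code under test.
  def new, do: %{items: []}

  def add(cart, name, price) do
    %{cart | items: [{name, price} | cart.items]}
  end

  def total(cart) do
    cart.items
    |> Enum.map(fn {_name, price} -> price end)
    |> Enum.sum()
  end
end

defmodule CartTest do
  use ExUnit.Case

  # Runs before every test, so no test relies on state left behind by another.
  setup do
    cart = Cart.new() |> Cart.add("book", 10)
    %{cart: cart}
  end

  test "adding an item increases the total", %{cart: cart} do
    assert cart |> Cart.add("pen", 2) |> Cart.total() == 12
  end

  test "the starting cart has only the item from setup", %{cart: cart} do
    assert Cart.total(cart) == 10
  end
end
```

ExUnit shuffles test order by default using a random seed, so order-dependent tests tend to fail intermittently even without `--max-cases`. Running `mix test --seed 0` disables the shuffling, which can help while tracking down which tests depend on each other.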

Good job tracking down the slow tests with --slowest and, you guessed it, tests that take 10 seconds each are another bad pattern.

I’d say enlist at least one programmer from the team and have them improve the code. Cliche advice but there’s nothing else you can do. Allocating more system resources will never help code in any language if it just sits around waiting for stuff.

Alternatively, and if that’s an option, you can show us some code and we can try to help.


That, plus the low memory and CPU usage, also suggests to me that few (if any) of the tests are run with async: true. Note: you can’t just enable that if the tests and/or codebase are full of global data mutation. All this suggests a rather … junior team.
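As a hedged sketch of what an async-safe test module can look like: the module opts in with `async: true`, and each test gets its own process via `start_supervised!/1` instead of touching a globally named one. `Counter` is a hypothetical component made up for this example. Tests that hit the database would additionally need something like the Ecto SQL sandbox, which is not shown here.

```elixir
# Standalone script: run with `elixir async_sketch.exs`.
# In a Mix project, test/test_helper.exs already calls ExUnit.start().
ExUnit.start()

defmodule Counter do
  # Hypothetical stateful component. It is started without a registered
  # name, so several instances can coexist without sharing state.
  use Agent

  def start_link(initial), do: Agent.start_link(fn -> initial end)
  def increment(pid), do: Agent.update(pid, &(&1 + 1))
  def value(pid), do: Agent.get(pid, & &1)
end

defmodule CounterTest do
  # async: true lets this module run concurrently with other async modules.
  # Tests inside a single module still run one after another.
  use ExUnit.Case, async: true

  setup do
    # A fresh counter per test, stopped automatically when the test ends.
    pid = start_supervised!({Counter, 0})
    %{counter: pid}
  end

  test "starts at the initial value", %{counter: counter} do
    assert Counter.value(counter) == 0
  end

  test "increment adds one", %{counter: counter} do
    Counter.increment(counter)
    assert Counter.value(counter) == 1
  end
end
```

Once enough modules are async-safe, `--max-cases` (which defaults to twice the number of cores) controls how many of them run at the same time, and that is the point where extra CPU in the test containers would start to be used.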

At this point, I’d start making a case based on economics, or time specifically. How many hours of the department’s time are lost per week compared with a more reasonable target for test execution time (plus the delayed feedback when developers don’t run the tests locally)? Then make the case to allocate a certain number of developer hours to improving the tests (and/or core code). This is tech debt that needs to be paid down.
