How to speed up the start of `mix test`?

josevalim · March 30, 2024, 5:38pm

Good catch @al2o3cr. This may also happen when the tests have hardcoded IDs or hardcoded unique columns. For example, if your user has unique emails, you want the email to be email-#{System.unique_integer()}@example.com, so each user has a distinct email.

Also make sure that your test_helper.exs are not racing with each other (for example, trying to insert the same data), since you are running multiple partitions.

benwilson512 · March 30, 2024, 7:39pm

Yes this is a very good suggestion. DB metrics can help uncover this by logging locks and by logging long queries. Simple inserts will show 100ms or even multi second times.

shamshirz · April 12, 2024, 7:32pm

Updates!

Thanks so much for the ideas y’all! Interesting turn of events, I went digging into the DB side of this because we have addressed issues like transaction locks in the past and implemented unique values for constrained columns, but I think I misled us!

Red Herring - DB timeout

My current hypothesis is that the DBConnection.ConnectionError) owner #PID<0.726.0> timed out because it owned the connection for longer than 120000ms timeout is a redherring! It is timing out and worth looking into, but the DB isn’t the cause of the timeout, it’s just reporting that it’s been blocked for quite a while with an open connection(? I think).

I re-ran tests locally to discover that the cover portion of the test command is using most of the unaccounted-for time!

### Just Test (21s)
> time MIX_DEBUG=1 mix test test/crowbar/scan/vciso_card_test.exs > tmp.txt && elixir ./analyze.exs

### Partitioned (21s)
> time MIX_TEST_PARTITION=1 MIX_DEBUG=1 mix test test/crowbar/scan/vciso_card_test.exs --partitions 2 > tmp.txt && elixir ./analyze.exs

### Cover (61s)
> time MIX_DEBUG=1 mix test test/crowbar/scan/vciso_card_test.exs --cover > tmp.txt && elixir ./analyze.exs

### Partition & Cover (73s)
> time MIX_TEST_PARTITION=1 MIX_DEBUG=1 mix test test/crowbar/scan/vciso_card_test.exs --partitions 2 --cover > tmp.txt && elixir ./analyze.exs

I put the analyze.exs into a Gist here if you want to recreate. Should be code base agnostic.

New Hypothesis: ExCoveralls is using the time!

I annotated ExCoveralls locally to reproduce and I found the time here

-> Running mix test test/crowbar/scan/vciso_card_test.exs --cover (inside Crowbar.Mixfile)
-> Running ExCoveralls Cover.compile
<- Ran ExCoveralls Cover.compile in 25757ms
…
Finished in 4.9 seconds (4.9s async, 0.00s sync)
…
-> Running ExCoveralls Cover.execute
<- Ran ExCoveralls Cover.execute in 13249ms
<- Ran mix test in 43973ms

This example just ran 1 test file, but the time breakdown is clear
44s total = 26s Cover.compile + 13s Cover.execute + 5s tests

note: These time are all much faster because it’s local on a multicore laptop vs. CI with the default github action runner

Mystery solved and if y’all happen to have any thoughts or experience on how to address things like this in CI, I’d love to hear your thoughts!

My likely next step is to make a --cover Action for the end of CI, so we can run the tests and get feedback as fast as possible without coverage.

josevalim · June 3, 2024, 1:06pm

Yes, --cover has to annotate code and it is slower because it has to run the annotated version of the code. However, from Erlang/OTP 27 (I believe), cover was integrated into JIT, and it should be much faster. You will be able to use Erlang/OTP 27 alongside Elixir v1.17.