How to build performance regression testing into a CI and release cycle

In a recent release of one of my packages I introduced a material performance regression, which was only detected by a consumer of the package. Probably the worst place ever to discover such a regression!

Now I want to build some level of performance regression testing into my test and release cycles. Benchee is a great tool for benchmarking, but I can’t see a straightforward way to use it for this purpose.

Any suggestions or best practices would be much appreciated!

As bare-bones as it is, :timer.tc should be enough to detect pathological cases.

Edit: I used it to test one of my libraries this way:

  test "usleep/1 sleeps for at leat `timeout` µs" do
    check all timeout <- positive_integer(), timeout < 1_000 do
      {elapsed_time, :ok} =
        :timer.tc(fn ->
          MicroTimer.usleep(timeout)
        end)

      assert elapsed_time >= timeout
    end
  end
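
For the regression case in the original question, the same primitive can also act as an upper-bound guard. A minimal sketch, assuming a hypothetical MyLib.hot_path/1 and a deliberately generous ceiling so scheduler jitter alone won’t fail the build:

  test "hot_path/1 stays within its time budget" do
    # MyLib.hot_path/1 and the 50_000 µs ceiling are placeholders;
    # pick a bound well above the typical runtime so only a real
    # regression, not machine noise, trips the assertion
    {elapsed_us, _result} = :timer.tc(fn -> MyLib.hot_path(1_000) end)

    assert elapsed_us < 50_000
  end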

If you can identify core hot paths whose performance you want to observe over time, Benchee allows you to save and load historical results as part of the profiling run, and those historical results are included in its output for comparison’s sake.

I’d have to check how feasible it is to make this comparison machine-readable enough for CI to assert against it, but the storage format is Erlang term_to_binary or similar, so it should be very possible.
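
For illustration, a rough sketch of such a setup, assuming Benchee ≥ 1.0. The benchmarked function, file paths and the GIT_REF variable are placeholders; save:/load: are regular Benchee options, and Benchee.run/2 returns the suite, so a CI script could inspect its statistics directly:

  # bench/hot_path.exs — run via `mix run bench/hot_path.exs`
  suite =
    Benchee.run(
      %{
        # the function under measurement is a placeholder
        "hot_path" => fn -> MyLib.hot_path(1_000) end
      },
      # persist this run under a tag and load earlier runs so Benchee's
      # console output includes a comparison against them
      save: [
        path: "bench/results/hot_path.benchee",
        tag: System.get_env("GIT_REF", "local")
      ],
      load: "bench/results/*.benchee"
    )

  # each %Benchee.Scenario{} carries its statistics, so a script could
  # compare averages across tags and fail the build past a threshold
  for scenario <- suite.scenarios do
    IO.inspect({scenario.name, scenario.run_time_data.statistics.average})
  end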


Thanks to you both. I think for now I’ll go the :timer.tc route for this one pathological case, but I will explore how to better use Benchee. I’m familiar with the historic data storage, which has been very useful for specific benchmark experiments, but I haven’t worked out how to fit it into a test pattern. Something more to ponder over the weekend :slight_smile:
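
One shape I could imagine the test pattern taking (the module name and budget below are placeholders, not a worked-out solution): keep the performance checks behind an ExUnit tag that is excluded by default, and have a dedicated CI step run them explicitly with `mix test --only benchmark`.

  # test/test_helper.exs — performance checks are opt-in
  ExUnit.start(exclude: [:benchmark])

  # test/perf/hot_path_perf_test.exs
  defmodule MyLib.HotPathPerfTest do
    use ExUnit.Case, async: false

    @moduletag :benchmark

    test "hot_path/1 has not regressed past its budget" do
      # a :timer.tc budget check (or a Benchee run) goes here;
      # the function and bound are placeholders
      {elapsed_us, _} = :timer.tc(fn -> MyLib.hot_path(1_000) end)
      assert elapsed_us < 50_000
    end
  end

That way the normal suite stays fast, and only a known CI runner executes the timing-sensitive tests.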

One of the main reasons I built AppDoctor was for API regression testing. Although this is more integration testing, it may be worthwhile to take a look as a second layer of safety around regressions.

Thanks for the pointer! I’ll check it out.

One of the things I realised immediately is that, of course, timing is sensitive to the machine it’s running on. So really I need a system that manages performance history on a per-machine basis; otherwise package users who run the tests would get bogus results.
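
One idea, if I stick with Benchee’s save/load: key the stored results by hostname so each machine only ever compares against its own past runs. A sketch (the directory layout and benchmarked function are just assumptions):

  # key stored results by hostname so runs from different hardware
  # are never compared against each other
  {:ok, host} = :inet.gethostname()
  results_dir = Path.join("bench/results", to_string(host))
  File.mkdir_p!(results_dir)

  Benchee.run(
    %{"hot_path" => fn -> MyLib.hot_path(1_000) end},
    save: [
      path: Path.join(results_dir, "#{System.os_time(:second)}.benchee"),
      tag: "run-#{System.os_time(:second)}"
    ],
    load: Path.join(results_dir, "*.benchee")
  )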

More research required!