How to build performance regression testing into a CI and release cycle

In a recent release of one of my packages I introduced a material performance regression, which was only detected by a consumer of the package. Probably the worst place ever to discover such a regression!

Now I want to build some level of performance regression testing into my test and release cycles. Benchee is a great tool for benchmarking, but I can’t see a straightforward way to use it for this purpose.

Any suggestions or best practices would be much appreciated!

As bare-bones as it is, :timer.tc should be enough to detect pathological cases.

Edit: I used it to test one of my libraries this way:

  test "usleep/1 sleeps for at leat `timeout` µs" do
    check all timeout <- positive_integer(), timeout < 1_000 do
      {elapsed_time, :ok} =
        :timer.tc(fn ->
          MicroTimer.usleep(timeout)
        end)

      assert elapsed_time >= timeout
    end
  end
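
For the regression case in the original question, the same primitive can also act as an upper-bound guard. A minimal sketch, assuming a hypothetical MyLib.hot_path/1 and a deliberately generous ceiling so scheduler jitter alone won’t fail the build:

  test "hot_path/1 stays within its time budget" do
    # MyLib.hot_path/1 and the 50_000 µs ceiling are placeholders;
    # pick a bound well above the typical runtime so only a real
    # regression, not machine noise, trips the assertion
    {elapsed_us, _result} = :timer.tc(fn -> MyLib.hot_path(1_000) end)

    assert elapsed_us < 50_000
  end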

If you can identify core hot paths whose performance you want to observe over time, Benchee allows you to save and load historical results as part of the profiling run, and those historical results are included in its output for comparison’s sake.

I’d have to check how feasible it is to make this comparison machine-readable enough for CI to assert against it, but the storage format is Erlang term_to_binary or similar, so it should be very possible.
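
For illustration, a rough sketch of such a setup, assuming Benchee ≥ 1.0. The benchmarked function, file paths and the GIT_REF variable are placeholders; save:/load: are regular Benchee options, and Benchee.run/2 returns the suite, so a CI script could inspect its statistics directly:

  # bench/hot_path.exs — run via `mix run bench/hot_path.exs`
  suite =
    Benchee.run(
      %{
        # the function under measurement is a placeholder
        "hot_path" => fn -> MyLib.hot_path(1_000) end
      },
      # persist this run under a tag and load earlier runs so Benchee's
      # console output includes a comparison against them
      save: [
        path: "bench/results/hot_path.benchee",
        tag: System.get_env("GIT_REF", "local")
      ],
      load: "bench/results/*.benchee"
    )

  # each %Benchee.Scenario{} carries its statistics, so a script could
  # compare averages across tags and fail the build past a threshold
  for scenario <- suite.scenarios do
    IO.inspect({scenario.name, scenario.run_time_data.statistics.average})
  end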


Thanks to you both. I think for now I’ll go the :timer.tc route for this one pathological case, but I will explore how to better use Benchee. I’m familiar with the historic data storage, which has been very useful for specific benchmark experiments, but I haven’t worked out how to fit it into a test pattern. Something more to ponder over the weekend :slight_smile:
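
One shape I could imagine the test pattern taking (the module name and budget below are placeholders, not a worked-out solution): keep the performance checks behind an ExUnit tag that is excluded by default, and have a dedicated CI step run them explicitly with `mix test --only benchmark`.

  # test/test_helper.exs — performance checks are opt-in
  ExUnit.start(exclude: [:benchmark])

  # test/perf/hot_path_perf_test.exs
  defmodule MyLib.HotPathPerfTest do
    use ExUnit.Case, async: false

    @moduletag :benchmark

    test "hot_path/1 has not regressed past its budget" do
      # a :timer.tc budget check (or a Benchee run) goes here;
      # the function and bound are placeholders
      {elapsed_us, _} = :timer.tc(fn -> MyLib.hot_path(1_000) end)
      assert elapsed_us < 50_000
    end
  end

That way the normal suite stays fast, and only a known CI runner executes the timing-sensitive tests.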

One of the main reasons I built AppDoctor was for API regression testing. Although this is more integration testing, it may be worthwhile to take a look as a second layer of safety around regressions.

Thanks for the pointer! I’ll check it out.

One of the things I realised immediately is that, of course, timing is sensitive to the machine it’s running on. So really I need a system that manages performance history on a per-machine basis; otherwise package users who run the tests would get bogus results.
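
One idea, if I stick with Benchee’s save/load: key the stored results by hostname so each machine only ever compares against its own past runs. A sketch (the directory layout and benchmarked function are just assumptions):

  # key stored results by hostname so runs from different hardware
  # are never compared against each other
  {:ok, host} = :inet.gethostname()
  results_dir = Path.join("bench/results", to_string(host))
  File.mkdir_p!(results_dir)

  Benchee.run(
    %{"hot_path" => fn -> MyLib.hot_path(1_000) end},
    save: [
      path: Path.join(results_dir, "#{System.os_time(:second)}.benchee"),
      tag: "run-#{System.os_time(:second)}"
    ],
    load: Path.join(results_dir, "*.benchee")
  )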

More research required!