Benchee & formatters - easy and extensible (micro) benchmarking

PragTob · June 7, 2016, 7:14pm

Hi all,

I just built and released my first hex.pm package. It’s a (micro) benchmarking tool somewhat inspired by Ruby’s benchmark-ips. Its goal is to be easy to use, produce nice output, be extensible through plugins and provide you with statistics, so you can better see how reliable your results are. So far the statistics provided are average, iterations per second, standard deviation and median.
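
To make the four statistics concrete, here is a minimal sketch of how they could be computed from a list of measured run times. It is not benchee's actual implementation: the module name `StatsSketch`, the microsecond unit and the choice of the sample standard deviation (dividing by n - 1) are all assumptions made for illustration.

```elixir
defmodule StatsSketch do
  # Hypothetical helper, not part of benchee.
  # Expects a non-empty list of run times in microseconds.
  def summarize(run_times) when run_times != [] do
    n = length(run_times)
    average = Enum.sum(run_times) / n

    # Sample standard deviation (divides by n - 1); 0.0 for a single sample.
    std_dev =
      if n > 1 do
        squared_diffs = Enum.map(run_times, fn t -> (t - average) * (t - average) end)
        :math.sqrt(Enum.sum(squared_diffs) / (n - 1))
      else
        0.0
      end

    %{
      average: average,
      # One second is 1_000_000 microseconds.
      ips: 1_000_000 / average,
      std_dev: std_dev,
      median: median(run_times)
    }
  end

  defp median(run_times) do
    sorted = Enum.sort(run_times)
    n = length(sorted)
    mid = div(n, 2)

    if rem(n, 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end

stats = StatsSketch.summarize([100, 110, 90, 100, 400])
IO.inspect(stats)
```

In this made-up sample the single 400 µs outlier pulls the average up to 160 µs while the median stays at 100 µs, which is the kind of gap that hints at unreliable measurements.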

It also comes with the first plugin, BencheeCSV to format output as CSV for easy usage with spreadsheet tools so you can make pretty graphs etc.

Announcement blog post:

benchee

github https://github.com/PragTob/benchee
hex https://hex.pm/packages/benchee

benchee_csv

github https://github.com/PragTob/benchee_csv
hex https://hex.pm/packages/benchee_csv

Thanks, enjoy benchmarking and feedback welcome

Tobi

PragTob · December 1, 2016, 4:30pm

Hi everyone,

I didn’t want to “spam” the forum with release announcements after my initial post about benchee back in June, but I figured with all the changes and new features now might be a good time to write something again! If it’s too much, please tell me.

I just released new versions of my benchmarking library benchee along with benchee_csv, also introducing new formatters benchee_json and finally benchee_html to create nice HTML reports, with 4 different graphs that can also be exported as PNG images! And of course, there also is a blog post about all of it.

benchee has come a long way and I’m particularly excited about it supporting running your suite with different inputs as different implementations may behave differently depending on input size or structure. Also I changed the API of the main interface after a short but good discussion in this very forum. And I gotta say it looks way more elixir now

alias Benchee.Formatters.{Console, HTML}
map_fun = fn(i) -> i + 1 end
inputs = %{
  "Small (10 Thousand)"    => Enum.to_list(1..10_000),
  "Middle (100 Thousand)" => Enum.to_list(1..100_000),
  "Big (1 Million)"       => Enum.to_list(1..1_000_000),
}

Benchee.run %{
  "tail-recursive" =>
    fn(list) -> MyMap.map_tco(list, map_fun) end,
  "stdlib map" =>
    fn(list) -> Enum.map(list, map_fun) end,
  "body-recursive" =>
    fn(list) -> MyMap.map_body(list, map_fun) end,
  "tail-rec arg-order" =>
    fn(list) -> MyMap.map_tco_arg_order(list, map_fun) end
}, time: 10, warmup: 10, inputs: inputs,
   formatters: [&Console.output/1, &HTML.output/1],
   html: [file: "bench/output/tco_detailed.html"]
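
The `MyMap` functions called above are not shown in the post. A minimal sketch of what they might look like follows; these bodies are assumptions for illustration only, including the guess that "arg-order" means the same tail-recursive algorithm with its helper's arguments in a different order.

```elixir
defmodule MyMap do
  # Hypothetical implementations; the post does not show the real MyMap module.

  # Body-recursive: the new list cell is built after the recursive call returns.
  def map_body([], _fun), do: []
  def map_body([head | tail], fun), do: [fun.(head) | map_body(tail, fun)]

  # Tail-recursive: accumulate results in reverse, then reverse once at the end.
  def map_tco(list, fun), do: do_map_tco(list, fun, [])

  defp do_map_tco([], _fun, acc), do: Enum.reverse(acc)

  defp do_map_tco([head | tail], fun, acc) do
    do_map_tco(tail, fun, [fun.(head) | acc])
  end

  # Same algorithm, but the helper takes the accumulator as its first argument.
  def map_tco_arg_order(list, fun), do: do_map_tco_arg_order([], list, fun)

  defp do_map_tco_arg_order(acc, [], _fun), do: Enum.reverse(acc)

  defp do_map_tco_arg_order(acc, [head | tail], fun) do
    do_map_tco_arg_order([fun.(head) | acc], tail, fun)
  end
end

map_fun = fn i -> i + 1 end
IO.inspect(MyMap.map_tco([1, 2, 3], map_fun))
```

All three variants should return the same list as `Enum.map/2` for the same input, so any difference the benchmark reports comes from how the recursion is structured.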

Thanks to the HTML formatter, this then produces output like what you can see in this example report, or get a preview with this image:

So yeah, I hope you like it. Would be great to hear what you like, or even better what you’re missing, what you don’t like, or any bugs you run into, so that I can improve and extend benchee and its associated libraries.

Thanks!
Tobi

OvermindDL1 · December 1, 2016, 4:31pm

Ooo, awesome! I love benchee and it just keeps getting better! ^.^

aleandros · December 1, 2016, 4:54pm

It looks great!

I’m very interested in learning about benchmarking, since AFAIK it is a subject with a lot of depth to it. Is there anything in particular where you need help from contributors?

PragTob · December 2, 2016, 8:56am

Thanks, that means a lot

Oh, there is a ton of depth to it, and I hope the folks at ElixirLive will agree and find it as fascinating as I do

I haven’t yet read too many papers in depth about it. There’s tons of room for contributions. I usually keep a good backlog of issues/features that have come to my mind so that I don’t forget them and so that possible contributors have a place to get started.

Off the top of my head a couple of particularly interesting/important ones to my mind:

reduce the effect of garbage collection/try to avoid it optionally
measuring memory consumption
a new formatter that produces graphs directly; Luis Ferreira has written a library for that (I’m on a train right now, so I can’t go looking for it)
documentation/fixes for using benchee from Erlang would be great - I still need to dust up my Erlang but that’s probably something the Erlangers here could maybe help with
on the statistics side benchmarking until a confidence value is reached would be great
benchee_html has a bunch of smaller design fixes and bigger changes to deal with performance problems - open issues

Of course there is much more, new statistics, providing more system information… there’s lots to do

PragTob · February 27, 2017, 5:08pm

In case you’re interested and from the greater Hamburg (Germany) area, I’ll be at hh.ex tomorrow talking about benchmarking and benchee with the goal to also hack on benchee together to implement some tiny features (some are especially tagged on github for this).

So if you like, here is the meetup page

PragTob · April 24, 2017, 5:29pm

New tiny releases, benchee 0.7, benchee_html 0.2 and benchee_json 0.2, made their way to hex yesterday evening

The biggest feature is that the benchee_html report is now properly split up. Other fixes include goodies like relaxing the Poison dependency as well as adjusting some outputs and parallel statistics generation.

More cool stuff to come, of course. Benchmarking talks will also be given:

In case anyone wants to hang out and hack

michalmuskala · April 24, 2017, 5:54pm

I’ve used Benchee a couple of times, and it’s definitely a solid solution - probably my “go to” one for all my benchmarking needs.

That said, one thing that bothers me each time I look at the README is that the example benchmark is not inside a module. Code that is not inside a module is not compiled but interpreted. This gives vastly different performance characteristics and makes benchmarks pretty much useless.
This is not a huge problem in the example, since the functions immediately call a module (so only the initial, anonymous function call is interpreted), but can lead to false results with more complex things.

PragTob · April 24, 2017, 6:38pm

Hey! Thanks for the input.

Yes, definitely - you should call functions that are defined in modules somewhere and compiled, and that’s how I do it; maybe it should be made clearer. E.g. I usually just have my benchmarks call functions I have defined elsewhere or a couple of Ecto functions

Still, I’d like for the whole suite to be properly compiled as being close to production systems is super important and I can see how people would create erroneous benchmarks through this.

Still - I’d like to know more about how Elixir and the BEAM VM work internally, but information seems to be really sparse (I read the ELI5 for BEAM and some usual gotchas but it’s… not much). Pointers are very welcome.

My understanding is that .exs files are interpreted while the .ex are compiled - correct?

The only way that jumps to mind, assuming my knowledge is correct, is that I’d have people define their code in a module in a .ex file and then have a script or some executable call it. Something like:

defmodule MyBenchmark do
  def benchmark do
    Benchee.run(...)
  end
end

And then in a script file just do:

MyBenchmark.benchmark()

Is that the only way? Is there a better way? Your input and/or pointers would be highly appreciated @michalmuskala (+ of course everyone else!)

michalmuskala · April 25, 2017, 3:50am

Modules are compiled, no matter if in .exs or .ex file - the only difference is if there’s a compilation artefact (in the form of a .beam file produced). Code outside modules is not compiled.
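
A minimal sketch of what this means in practice: a single script file can define a module, and the code in that module is compiled even though no .beam file is written. The file name, the module name `CompiledBench` and the use of `:timer.tc/1` in place of a real `Benchee.run` call are assumptions chosen to keep the example free of dependencies.

```elixir
# bench_sketch.exs (hypothetical file name), run with `elixir bench_sketch.exs`

defmodule CompiledBench do
  # Defined inside a module, so this is compiled even though the file is a script.
  def sum_squares(list), do: Enum.reduce(list, 0, fn i, acc -> acc + i * i end)

  # Returns {microseconds, result} for a single call.
  def time_once(list), do: :timer.tc(fn -> sum_squares(list) end)
end

list = Enum.to_list(1..100_000)

# Top-level code like the lines below is evaluated rather than compiled,
# so only the thin call into the module lives here.
{micros, result} = CompiledBench.time_once(list)
IO.puts("sum_squares took #{micros} µs, result #{result}")
```

The same layout should work with a benchmark suite: keep the functions under test, and ideally the call that starts the suite, inside the module, and leave only a one-line call at the top level of the script.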

When it comes to some reference on BEAM internals, I guess the newly released “BEAM Book” is the best comprehensive source out there https://happi.github.io/theBeamBook/ - it’s not complete, but it’s an awesome resource nonetheless.

PragTob · April 25, 2017, 9:19pm

Thanks a ton! Will try to get some of that information into the README of benchee and/or the wiki

Also thanks for the link to the book - I had missed that until now. Looking forward to carving out the time to read it!

PragTob · June 8, 2017, 10:29am

Benchee 0.9.0 has been released (Changelog). The main features are gathering more system data and some compatibility for calling benchee from Erlang (sample project).

As a side note, I’m at Erlang User Conference atm and will also talk about benchmarking tomorrow. Feel free to come by and chat, love to do that (usually green t-shirt and/or Liefery pullover).

PragTob · October 26, 2017, 9:52am

Hello again beloved elixirforum! After quite the break, 0.10 is here with hooks and other great improvements especially in the HTML formatter!

PragTob · April 1, 2018, 11:14am

After long discussions about what the API of benchee ought to look like, we decided on a drastic change - we need to change the name to bunny!

You can check out the bunny repository or get it straight from hex

doomspork · April 1, 2018, 2:43pm

~~@PragTob will you be updating the Elixir School lesson on Benchee to reflect these changes? That would be wonderful~~

Realizing the date now.

Perhaps it would have been better to make a new post rather than to edit an existing post about a real package as a prank.

PragTob · April 1, 2018, 3:11pm

Oh, sorry definitely will! Learners shouldn’t miss out on bunnytastic material

Maybe I can coax @devonestes into doing it though?

Devon, pleeasseeeee!

axelson · April 1, 2018, 4:50pm

@PragTob is there an overview of the differences between the two APIs? They seem nearly identical to me.

From Benchee:

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten end
}, time: 10)

From Bunny:

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Bunny.eat(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten end
})

Or is the difference in functions other than the main benchmarking function?

tmbb · April 1, 2018, 5:54pm

This is a very obvious April Fools’ prank…

EDIT: it got me more or less until the code example that shows Bunny.eat(...).

axelson · April 1, 2018, 7:33pm

Haha, maybe I shouldn’t be reading the forum so early

PragTob · April 2, 2018, 10:24am

Sorry if I really fooled anyone
I wanted to make it really clear through hyperbole, version numbers (1.4.2018) and great new features like “bunny assistant” that it’s a joke, so that nobody ends up changing all their code to use bunny.

If someone wants to continue using bunny - it’s totally usable. It’s only a thin wrapper around benchee with a couple of defdelegate's and I don’t expect to break it.
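
For readers unfamiliar with `defdelegate`, here is a minimal sketch of the thin-wrapper pattern. It is not bunny's source: `Runner` is a hypothetical stand-in for the wrapped library and `Wrapper` for the wrapping one, so that the example runs without any dependencies.

```elixir
defmodule Runner do
  # Stand-in for the wrapped library; hypothetical, not benchee itself.
  def run(jobs, opts), do: {:ran, Map.keys(jobs), opts}
end

defmodule Wrapper do
  # Wrapper.eat/2 forwards its arguments unchanged to Runner.run/2.
  defdelegate eat(jobs, opts), to: Runner, as: :run
end

IO.inspect(Wrapper.eat(%{"noop" => fn -> :ok end}, time: 1))
```

Because the wrapper holds no logic of its own, it should only break if the function it delegates to changes its name or arity.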

To be doubly sure, I’m adding “this was a prank” notices everywhere applicable

@doomspork I thought about making a new post, but I once did that for a new release or feature and the threads got merged together, so I figured that’s not how things are supposed to be done here

edit: can someone from the admins please change the title back? Thanks!