Benchee & formatters - easy and extensible (micro) benchmarking

Hi @PragTob!

Thank you for your awesome work on benchee!

I have a question about the standard deviation. When benchmarking a function that serializes a struct into a binary, I see an enormous standard deviation of around 21,000%:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
Number of Available Cores: 6
Available memory: 15.55 GB
Elixir 1.14.2
Erlang 25.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 11 s

Benchmarking raw_attr.encode ...

Name                      ips        average  deviation         median         99th %
raw_attr.encode        5.17 M      193.34 ns ±21393.07%         137 ns         162 ns

Extended statistics: 

Name                    minimum        maximum    sample size                     mode
raw_attr.encode          132 ns    80013984 ns         8.88 M                   136 ns

Memory usage statistics:

Name               Memory usage
raw_attr.encode            32 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name            Reduction count
raw_attr.encode               1

**All measurements for reduction count were the same**

With a maximum of about 80 ms against a median of 137 ns, I assume a few extreme samples are skewing the average, so I should rely on the median rather than the average here. What about adding outlier detection to remove such measurements? Is this something that is planned, or would it be welcome as a contribution?
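To make the idea concrete, here is a minimal sketch of the kind of filtering I have in mind. It is not benchee code: `OutlierSketch` and `reject_outliers/2` are hypothetical names, and I picked the common IQR rule (Tukey fences) only as an example of one possible approach. The sample list is made up, in the same range as my run.

```elixir
defmodule OutlierSketch do
  @moduledoc """
  Hypothetical helper, not part of benchee: keeps only the samples inside
  the Tukey fences [Q1 - k * IQR, Q3 + k * IQR].
  """

  # Percentile by linear interpolation over an already sorted list.
  defp percentile(sorted, p) do
    last_index = length(sorted) - 1
    rank = p * last_index
    lower = trunc(rank)
    upper = min(lower + 1, last_index)
    fraction = rank - lower
    low_value = Enum.at(sorted, lower)
    high_value = Enum.at(sorted, upper)
    low_value + fraction * (high_value - low_value)
  end

  # k = 1.5 is the conventional multiplier; a larger k removes fewer samples.
  def reject_outliers(samples, k \\ 1.5) when samples != [] do
    sorted = Enum.sort(samples)
    q1 = percentile(sorted, 0.25)
    q3 = percentile(sorted, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    Enum.filter(samples, fn x -> x >= lower_fence and x <= upper_fence end)
  end
end

# Made-up run times in ns, with one extreme sample like the maximum above.
samples = [132, 135, 136, 136, 137, 138, 140, 80_013_984]
kept = OutlierSketch.reject_outliers(samples)
IO.inspect(kept, label: "kept")
```

With this rule the 80 ms sample is dropped and the remaining values stay untouched. I realize that silently removing measurements can hide real effects, so I would expect this to be opt-in, if it makes sense at all.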

I am also curious what could cause such a large standard deviation.
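For scale, here is a quick back-of-the-envelope check using only the numbers from the report above. It converts the relative deviation into nanoseconds and estimates how much deviation the single maximum sample would produce on its own, under the simplifying assumption that every other sample sits exactly at the mean. The sample size is the rounded 8.88 M from the output, so the results are approximate.

```elixir
# Values copied from the benchee output above.
mean_ns = 193.34
deviation_percent = 21_393.07
sample_size = 8.88e6          # rounded in the report, so only approximate
max_ns = 80_013_984

# Relative deviation converted to an absolute value in ns.
std_dev_ns = mean_ns * deviation_percent / 100

# Assumption: all other samples are exactly at the mean. Then the variance is
# roughly (max - mean)^2 / n, so the deviation is (max - mean) / sqrt(n).
single_outlier_std_dev_ns = (max_ns - mean_ns) / :math.sqrt(sample_size)

IO.puts("reported std dev:      #{Float.round(std_dev_ns, 1)} ns")
IO.puts("from the maximum only: #{Float.round(single_outlier_std_dev_ns, 1)} ns")
```

This gives about 41 µs for the reported standard deviation and roughly 27 µs from the one 80 ms sample alone, so a handful of very slow runs seems enough to explain the figure. What I do not know is what causes those slow runs in the first place.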