Hi,
The Benchee documentation states:
Of course, hooks are not included in the measurements (…) Sadly there is the notable exception of too_fast_functions (…). As we need to measure their repeated invocations to get halfway good measurements, before_each and after_each hooks are included there. However, to the best of our knowledge this should only ever happen on Windows (because of the bad run time measurement accuracy).
However, if I simply add a call to Logger.debug(...) to my after_each function, the measurements are greatly impacted:
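For context, the benchmark setup looks roughly like this (a simplified sketch; the actual module, function, and hook bodies differ):

```elixir
require Logger

Benchee.run(
  %{
    # "insert_agent" is the real scenario name; MyApp.insert_agent/1
    # stands in for the actual function being benchmarked.
    "insert_agent" => fn input -> MyApp.insert_agent(input) end
  },
  before_each: fn _ -> build_input() end,
  # The only difference between the two runs below is this Logger call:
  after_each: fn _result -> Logger.debug("inserted one agent") end
)
```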
Without the Logger
Name ips average deviation median 99th %
insert_agent 40.26 K 24.84 μs ±36.17% 22.57 μs 49.31 μs
insert_agent 44.80 K 22.32 μs ±26.08% 21.03 μs 41.00 μs
insert_agent 43.39 K 23.05 μs ±31.96% 21.38 μs 42.45 μs
With the Logger
Name ips average deviation median 99th %
insert_agent 17.01 K 58.80 μs ±353.56% 42.42 μs 292.90 μs
insert_agent 17.32 K 57.75 μs ±249.72% 42.66 μs 321.68 μs
insert_agent 17.72 K 56.43 μs ±246.62% 42.46 μs 284.70 μs
(3 repeated executions of each scenario are shown to demonstrate that the effect is repeatable)
Any idea what I might be missing?
I’m running benchee 1.0.1 and the setup is:
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
Number of Available Cores: 4
Available memory: 7.68 GB
Elixir 1.12.2
Erlang 24.1.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: default
Estimated total run time: 7 s
Thanks for any clarification.
Given that this only takes a few μs, it looks like this falls into the documented exception of functions that are too fast to measure accurately without running them repeatedly. You should probably also see a warning in the output that this function is very fast, right?
That’s what I thought too, but no such warning appears, and I’m running on Ubuntu, not Windows. I didn’t mention it before, but I’m running this inside an ExUnit test. Not sure if that has an impact, though.
Running in ExUnit shouldn’t make a difference. Do you have the fast function warning disabled? You can do that when setting up your benchmark.
Either way, I can pretty much guarantee that this happens because that function you’re benchmarking is extremely fast. Can you drop a sleep in there for like 50ms and see what happens?
Do you have the fast function warning disabled?
No, :fast_warning is enabled by default and I didn’t set it.
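For reference, disabling it would look something like this (which I’m not doing):

```elixir
Benchee.run(
  %{"insert_agent" => fn input -> MyApp.insert_agent(input) end},
  # The fast-function warning is part of Benchee's print configuration
  # and defaults to true; I have not touched it.
  print: [fast_warning: false]
)
```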
I can pretty much guarantee that this happens because that function you’re benchmarking is extremely fast
I’m benchmarking a function that:
- Calls DateTime.utc_now() to get a timestamp;
- Creates a record with data from before_each (strings and a list of strings) and the timestamp;
- Inserts the record into a local in-memory Mnesia table using :mnesia.write and :mnesia.transaction;
- Creates a struct with the data from the record and returns it.
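In code, the function is roughly this (a sketch; the module, table, and field names are made up, and it assumes :mnesia.start() has been called and an :agent table exists):

```elixir
defmodule AgentStore do
  # Sketch of the benchmarked function: timestamp, write inside
  # a transaction, return a map built from the record.
  def insert_agent(%{name: name, tags: tags}) do
    now = DateTime.utc_now()

    {:atomic, :ok} =
      :mnesia.transaction(fn ->
        :mnesia.write({:agent, name, tags, now})
      end)

    %{name: name, tags: tags, inserted_at: now}
  end
end
```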
As per the metrics in my first post, it is indeed fast. Do you think I should benchmark multiple calls to that function instead of individual ones?
Can you drop a sleep in there for like 50ms and see what happens?
I had done that even before starting this thread, but with 200 ms instead of 50 ms, and in the after_each:
No logger and no sleep
Name ips average deviation median 99th %
insert_agent 45.78 K 21.84 μs ±27.15% 20.24 μs 42.27 μs
insert_agent 44.65 K 22.40 μs ±33.99% 20.38 μs 48.54 μs
insert_agent 45.27 K 22.09 μs ±31.76% 20.60 μs 40.96 μs
No logger but with Process.sleep(200) in the after_each
insert_agent 4.37 K 228.76 μs ±21.05% 225.18 μs 386.85 μs
I’ll do it in the function being benchmarked and post the results here.
Ok, that’s really clear now. It impacts your measurements not because the computation happening in the hooks is being measured, but because what you’re doing in those hooks creates work that is likely performed asynchronously by the VM during further runs of the function being benchmarked. Logging fits that, since logging is async. If you want to test that for sure, do something like calculating a ton of Fibonacci numbers in a hook (since that doesn’t do anything outside the active process) to see that it shouldn’t affect the benchmark.
Unfortunately, benchmarking on a VM that is made for parallelism like the BEAM comes with caveats like this sometimes.
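Something like this (just a sketch; any CPU-bound, process-local implementation works):

```elixir
defmodule Fib do
  # Deliberately naive, exponential-time Fibonacci: pure CPU work
  # in the calling process, with no messages and no I/O, so nothing
  # can leak asynchronously into later measurements.
  def fib(0), do: 0
  def fib(1), do: 1
  def fib(n) when n > 1, do: fib(n - 1) + fib(n - 2)
end

# Hypothetical hook usage:
# after_each: fn _ -> Enum.each(0..30, &Fib.fib/1) end
```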
I’ll do it (add a sleep) in the function being benchmarked and post the results here.
original function
Name ips average deviation median 99th %
insert_agent 41.26 K 24.24 μs ±45.16% 22.18 μs 47.10 μs
insert_agent 40.71 K 24.57 μs ±33.24% 22.46 μs 48.28 μs
insert_agent 40.64 K 24.61 μs ±31.48% 22.35 μs 49.31 μs
with Process.sleep(50)
Name ips average deviation median 99th %
insert_agent 19.66 50.86 ms ±0.33% 50.83 ms 51.44 ms
insert_agent 19.67 50.83 ms ±0.21% 50.82 ms 51.19 ms
insert_agent 19.67 50.83 ms ±0.17% 50.83 ms 51.18 ms
with Process.sleep(50) + Logger in the after_each
Name ips average deviation median 99th %
insert_agent 19.65 50.88 ms ±0.62% 50.87 ms 52.02 ms
insert_agent 19.64 50.91 ms ±0.80% 50.80 ms 53.10 ms
insert_agent 19.61 50.99 ms ±0.87% 50.89 ms 53.19 ms
The above results show that when the measured function is orders of magnitude slower (ms instead of μs), a simple Logger call in the hook doesn’t make a big difference.
If you want to test that for sure, do something like calculating a ton of Fibonacci numbers in a hook (since that doesn’t do anything outside the active process) to see that it shouldn’t affect the benchmark.
Done:
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
Number of Available Cores: 4
Available memory: 7.68 GB
Elixir 1.12.2
Erlang 24.1.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: default
Estimated total run time: 14 s
Benchmarking no_after_each with input default...
Benchmarking with_cpu_heavy_after_each with input default...
##### With input default #####
Name ips average deviation median 99th %
no_after_each 41.38 K 24.17 μs ±52.55% 21.61 μs 51.16 μs
with_cpu_heavy_after_each 9.78 K 102.23 μs ±0.00% 102.23 μs 102.23 μs
Comparison:
no_after_each 41.38 K
with_cpu_heavy_after_each 9.78 K - 4.23x slower +78.07 μs
It does have some impact, but it’s not a direct one; otherwise the slower scenario would have shown latencies on the order of seconds, not microseconds. The “cpu_heavy” hook is my naive Fibonacci implementation calculating the first 45 numbers, which takes around 13 seconds.
So yeah, I guess we’re all on the same page. Thanks a lot for your time in this thread, Devon, and for the Benchee library itself.
Cheers!