While profiling and optimizing some performance critical code, I noticed that Map.take
was one of the biggest contributors to the overall runtime. So out of curiosity, I did a quick benchmark, to see whether there might be a better way to retrieve multiple values from a map at once (somewhere in the range of 1-1000 values at once).
The results of this very simple benchmark really surprised me, and I would really like to have some more opinions on the matter, because I don’t exactly know what to think of it. My expectation was, that Map.take
would be way faster then calling Map.fetch
repeatedly, but the opposite seems to be the case.
But never having done such microbenchmarks before, maybe I just made a mistake and can’t see it?
Disclaimer: I did run the benchmark on my developer machine only. I repeated it 3 times, all with similar results.
TLDR;
Map.take(map, [key_1, ..., key_100])
seems to be ~2.5 times slower than Enum.map( [key_1, ..., key_100], &Map.fetch(map, &1))
Map.take(map, [key_1, ..., key_1000])
seems to be ~3.5 times slower than Enum.map( [key_1, ..., key_1000], &Map.fetch(map, &1))
Here is the code for the benchmark:
map = Map.new(1..10_000, &{&1, &1})
all_keys = Map.keys(map)
Benchee.run(
%{
"fetch (for)" => fn keys -> for key <- keys, do: Map.fetch(map, key) end,
"fetch (Enum.map)" => fn keys -> Enum.map(keys, &Map.fetch(map, &1)) end,
"fetch! (for)" => fn keys -> for key <- keys, do: Map.fetch!(map, key) end,
"fetch! (Enum.map)" => fn keys -> Enum.map(keys, &Map.fetch!(map, &1)) end,
"get (for)" => fn keys -> for key <- keys, do: Map.get(map, key) end,
"get (Enum.map)" => fn keys -> Enum.map(keys, &Map.get(map, &1)) end,
"take" => fn keys -> Map.take(map, keys) end
},
inputs: %{
"1" => Enum.take_random(all_keys, 1),
"10" => Enum.take_random(all_keys, 10),
"100" => Enum.take_random(all_keys, 100),
"1000" => Enum.take_random(all_keys, 1000),
}
)
And here are the results I got on my machine:
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.14.5
Erlang 25.3.2.5
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1, 10, 100, 1000
Estimated total run time: 3.27 min
Benchmarking fetch (Enum.map) with input 1 ...
Benchmarking fetch (Enum.map) with input 10 ...
Benchmarking fetch (Enum.map) with input 100 ...
Benchmarking fetch (Enum.map) with input 1000 ...
Benchmarking fetch (for) with input 1 ...
Benchmarking fetch (for) with input 10 ...
Benchmarking fetch (for) with input 100 ...
Benchmarking fetch (for) with input 1000 ...
Benchmarking fetch! (Enum.map) with input 1 ...
Benchmarking fetch! (Enum.map) with input 10 ...
Benchmarking fetch! (Enum.map) with input 100 ...
Benchmarking fetch! (Enum.map) with input 1000 ...
Benchmarking fetch! (for) with input 1 ...
Benchmarking fetch! (for) with input 10 ...
Benchmarking fetch! (for) with input 100 ...
Benchmarking fetch! (for) with input 1000 ...
Benchmarking get (Enum.map) with input 1 ...
Benchmarking get (Enum.map) with input 10 ...
Benchmarking get (Enum.map) with input 100 ...
Benchmarking get (Enum.map) with input 1000 ...
Benchmarking get (for) with input 1 ...
Benchmarking get (for) with input 10 ...
Benchmarking get (for) with input 100 ...
Benchmarking get (for) with input 1000 ...
Benchmarking take with input 1 ...
Benchmarking take with input 10 ...
Benchmarking take with input 100 ...
Benchmarking take with input 1000 ...
##### With input 1 #####
Name ips average deviation median 99th %
take 11.95 M 83.65 ns ±28285.23% 83 ns 84 ns
fetch (Enum.map) 9.79 M 102.19 ns ±19532.66% 83 ns 125 ns
get (Enum.map) 9.50 M 105.31 ns ±20042.94% 83 ns 125 ns
fetch! (Enum.map) 9.32 M 107.27 ns ±20770.71% 83 ns 125 ns
get (for) 8.70 M 114.93 ns ±20371.78% 83 ns 125 ns
fetch (for) 8.69 M 115.08 ns ±22813.85% 83 ns 125 ns
fetch! (for) 8.46 M 118.25 ns ±17192.86% 83 ns 125 ns
Comparison:
take 11.95 M
fetch (Enum.map) 9.79 M - 1.22x slower +18.53 ns
get (Enum.map) 9.50 M - 1.26x slower +21.66 ns
fetch! (Enum.map) 9.32 M - 1.28x slower +23.62 ns
get (for) 8.70 M - 1.37x slower +31.27 ns
fetch (for) 8.69 M - 1.38x slower +31.43 ns
fetch! (for) 8.46 M - 1.41x slower +34.60 ns
##### With input 10 #####
Name ips average deviation median 99th %
get (Enum.map) 3.94 M 253.52 ns ±11001.59% 208 ns 375 ns
get (for) 3.66 M 273.00 ns ±9423.42% 209 ns 334 ns
take 3.43 M 291.29 ns ±7903.38% 250 ns 292 ns
fetch! (for) 3.43 M 291.61 ns ±7048.02% 250 ns 375 ns
fetch! (Enum.map) 3.41 M 293.46 ns ±8004.15% 250 ns 375 ns
fetch (for) 3.28 M 305.00 ns ±6959.35% 250 ns 416 ns
fetch (Enum.map) 3.02 M 330.82 ns ±6767.28% 250 ns 375 ns
Comparison:
get (Enum.map) 3.94 M
get (for) 3.66 M - 1.08x slower +19.49 ns
take 3.43 M - 1.15x slower +37.77 ns
fetch! (for) 3.43 M - 1.15x slower +38.09 ns
fetch! (Enum.map) 3.41 M - 1.16x slower +39.94 ns
fetch (for) 3.28 M - 1.20x slower +51.48 ns
fetch (Enum.map) 3.02 M - 1.30x slower +77.30 ns
##### With input 100 #####
Name ips average deviation median 99th %
get (Enum.map) 428.81 K 2.33 μs ±605.78% 2.17 μs 2.67 μs
get (for) 426.99 K 2.34 μs ±886.27% 2.08 μs 3.88 μs
fetch! (for) 401.62 K 2.49 μs ±515.68% 2.33 μs 3.21 μs
fetch (for) 351.80 K 2.84 μs ±676.10% 2.54 μs 5.75 μs
fetch! (Enum.map) 324.53 K 3.08 μs ±663.85% 2.83 μs 3.75 μs
fetch (Enum.map) 307.58 K 3.25 μs ±734.78% 2.88 μs 6.21 μs
take 165.53 K 6.04 μs ±471.91% 5.50 μs 10.96 μs
Comparison:
get (Enum.map) 428.81 K
get (for) 426.99 K - 1.00x slower +0.00998 μs
fetch! (for) 401.62 K - 1.07x slower +0.158 μs
fetch (for) 351.80 K - 1.22x slower +0.51 μs
fetch! (Enum.map) 324.53 K - 1.32x slower +0.75 μs
fetch (Enum.map) 307.58 K - 1.39x slower +0.92 μs
take 165.53 K - 2.59x slower +3.71 μs
##### With input 1000 #####
Name ips average deviation median 99th %
get (for) 37.52 K 26.65 μs ±204.39% 24.79 μs 46.61 μs
fetch! (for) 35.14 K 28.46 μs ±44.71% 27.42 μs 42.33 μs
get (Enum.map) 34.12 K 29.31 μs ±15.19% 28.29 μs 44.33 μs
fetch (for) 31.62 K 31.62 μs ±9.41% 30.83 μs 42.04 μs
fetch! (Enum.map) 29.65 K 33.72 μs ±10.58% 32.88 μs 46.62 μs
fetch (Enum.map) 27.34 K 36.58 μs ±18.96% 35.46 μs 48.79 μs
take 10.71 K 93.41 μs ±13.42% 91.17 μs 112.35 μs
Comparison:
get (for) 37.52 K
fetch! (for) 35.14 K - 1.07x slower +1.81 μs
get (Enum.map) 34.12 K - 1.10x slower +2.66 μs
fetch (for) 31.62 K - 1.19x slower +4.97 μs
fetch! (Enum.map) 29.65 K - 1.27x slower +7.07 μs
fetch (Enum.map) 27.34 K - 1.37x slower +9.93 μs
take 10.71 K - 3.50x slower +66.76 μs