I tweeted this morning about some BEAM internals (https://twitter.com/akoutmos/status/1266034402422853633). Specifically about how maps are represented internally as hash array mapped tries (or HAMT for short) when the size of the map is greater than 32 entries.
Name ips average deviation median 99th %
Small Map - first item 41.25 M 24.24 ns ±2890.98% 22 ns 66 ns
Large Map - invalid item 31.10 M 32.15 ns ±841.48% 30 ns 83 ns
Large Map - first item 23.43 M 42.67 ns ±217.74% 40 ns 113 ns
Large Map - middle item 23.06 M 43.36 ns ±218.75% 40 ns 118 ns
Large Map - last item 19.82 M 50.45 ns ±355.66% 48 ns 121 ns
Small Map - middle item 12.21 M 81.93 ns ±376.39% 78 ns 198 ns
Small Map - invalid item 5.96 M 167.82 ns ±203.11% 162 ns 342 ns
Small Map - last item 4.50 M 222.17 ns ±355.61% 214 ns 459 ns
Comparison:
Small Map - first item 41.25 M
Large Map - invalid item 31.10 M - 1.33x slower +7.91 ns
Large Map - first item 23.43 M - 1.76x slower +18.43 ns
Large Map - middle item 23.06 M - 1.79x slower +19.12 ns
Large Map - last item 19.82 M - 2.08x slower +26.21 ns
Small Map - middle item 12.21 M - 3.38x slower +57.69 ns
Small Map - invalid item 5.96 M - 6.92x slower +143.58 ns
Small Map - last item 4.50 M - 9.17x slower +197.93 ns
Memory usage statistics:
Name Memory usage
Small Map - first item 0 B
Large Map - invalid item 0 B - 1.00x memory usage +0 B
Large Map - first item 0 B - 1.00x memory usage +0 B
Large Map - middle item 0 B - 1.00x memory usage +0 B
Large Map - last item 0 B - 1.00x memory usage +0 B
Small Map - middle item 0 B - 1.00x memory usage +0 B
Small Map - invalid item 0 B - 1.00x memory usage +0 B
Small Map - last item 0 B - 1.00x memory usage +0 B
Just something I figured I would share for those who are interested
Hey @akoutmos, thanks for taking the time to do this. So it is actually true - it is faster to work with bigger maps compared to small ones… who would expect that
You should also try with a “typical” map (which varies based on application), but something like 5–10 keys. Most of my maps would fall somewhere in that range.
Thanks for the input, that’s a great test to run! I’ll create an additional set of synthetic benchmarks using atoms and integers (for completeness although I would suspect atoms and integers to perform roughly the same since atoms are represented internally as integers ?).
Like most synthetic benchmarks this was more of a mental exercise to show the cut over between the two representations of maps https://github.com/erlang/otp/blob/master/erts/emulator/beam/erl_map.h#L72-L76. For real world accuracy, smaller maps (5-10 keys) would have been more realistic…but without recompiling Erlang locally with those C macros changed I don’t see a way of doing it. There aren’t any run-time flags to control that right? Perhaps that will be my next mental exercise test haha.
What I would be interested in is the writing speed, everything I know about Elixir by now says writing/reading from maps is really fast, but does map size have an impact on writing, and stuff like update_in\3?
I was able to sneak in a little bit of benchmarking time during my lunch break and have updated the gist https://gist.github.com/akoutmos/6e965558fa8bb5771e90b1961762a7d0 to have maps with integers and atoms as keys. Super interesting to see how the results kinda flipped when using atoms/integers as keys Below are the results:
Name ips average deviation median 99th %
Binary Really Small Map - first item 39.12 M 25.56 ns ±424.34% 25 ns 42 ns
Binary Small Map - first item 38.46 M 26.00 ns ±215.45% 25 ns 60 ns
Binary Large Map - invalid item 29.38 M 34.04 ns ±51.90% 33 ns 40 ns
Binary Large Map - first item 23.10 M 43.28 ns ±46.48% 43 ns 53 ns
Binary Large Map - middle item 22.58 M 44.29 ns ±272.20% 44 ns 52 ns
Binary Really Small Map - middle item 22.25 M 44.95 ns ±60.56% 44 ns 63 ns
Binary Really Small Map - invalid item 19.79 M 50.52 ns ±2762.06% 49 ns 94 ns
Binary Large Map - last item 19.63 M 50.94 ns ±741.04% 49 ns 104 ns
Binary Really Small Map - last item 12.94 M 77.29 ns ±27.06% 76 ns 95 ns
Binary Small Map - middle item 12.89 M 77.56 ns ±28.47% 76 ns 95 ns
Binary Small Map - invalid item 6.16 M 162.40 ns ±16.29% 161 ns 177 ns
Binary Small Map - last item 4.70 M 212.92 ns ±14.58% 211 ns 229 ns
Name ips average deviation median 99th %
Atom Really Small Map - first item 106.48 M 9.39 ns ±709.95% 8 ns 33 ns
Atom Small Map - first item 100.67 M 9.93 ns ±417.93% 9 ns 22 ns
Atom Really Small Map - middle item 85.82 M 11.65 ns ±2709.93% 10 ns 46 ns
Atom Really Small Map - invalid item 85.01 M 11.76 ns ±1255.81% 11 ns 21 ns
Atom Really Small Map - last item 80.32 M 12.45 ns ±171.09% 11 ns 30 ns
Atom Small Map - middle item 77.24 M 12.95 ns ±341.33% 12 ns 43 ns
Atom Small Map - last item 62.69 M 15.95 ns ±167.93% 15 ns 51 ns
Atom Large Map - invalid item 53.46 M 18.71 ns ±103.92% 18 ns 46 ns
Atom Small Map - invalid item 53.07 M 18.84 ns ±482.30% 18 ns 29 ns
Atom Large Map - last item 42.74 M 23.39 ns ±75.83% 22 ns 41 ns
Atom Large Map - middle item 39.52 M 25.30 ns ±74.97% 24 ns 41 ns
Atom Large Map - first item 39.19 M 25.52 ns ±70.75% 24 ns 52 ns
Name ips average deviation median 99th %
Integer Really Small Map - first item 80.64 M 12.40 ns ±132.30% 12 ns 23 ns
Integer Small Map - first item 78.95 M 12.67 ns ±131.36% 12 ns 24 ns
Integer Really Small Map - middle item 72.38 M 13.82 ns ±299.39% 13 ns 25 ns
Integer Really Small Map - last item 65.44 M 15.28 ns ±94.27% 14 ns 26 ns
Integer Really Small Map - invalid item 64.29 M 15.55 ns ±104.99% 15 ns 24 ns
Integer Small Map - middle item 57.57 M 17.37 ns ±4032.76% 16 ns 26 ns
Integer Small Map - last item 45.08 M 22.18 ns ±497.67% 21 ns 32 ns
Integer Small Map - invalid item 42.96 M 23.28 ns ±954.08% 22 ns 32 ns
Integer Large Map - invalid item 42.77 M 23.38 ns ±114.59% 23 ns 32 ns
Integer Large Map - first item 35.13 M 28.47 ns ±121.23% 28 ns 37 ns
Integer Large Map - middle item 35.03 M 28.54 ns ±61.64% 28 ns 37 ns
Integer Large Map - last item 34.12 M 29.31 ns ±1685.00% 28 ns 63 ns