Is Map.take slower than running Map.fetch repeatedly?

While profiling and optimizing some performance-critical code, I noticed that Map.take was one of the biggest contributors to the overall runtime. So out of curiosity, I did a quick benchmark to see whether there might be a better way to retrieve multiple values from a map at once (somewhere in the range of 1–1000 values at a time).

The results of this very simple benchmark surprised me, and I would like to hear some more opinions on the matter, because I don’t exactly know what to think of it. My expectation was that Map.take would be way faster than calling Map.fetch repeatedly, but the opposite seems to be the case.

But having never done such microbenchmarks before, maybe I just made a mistake and can’t see it?

Disclaimer: I did run the benchmark on my developer machine only. I repeated it 3 times, all with similar results.


TL;DR

Map.take(map, [key_1, ..., key_100]) seems to be ~2.5 times slower than Enum.map([key_1, ..., key_100], &Map.fetch(map, &1))

Map.take(map, [key_1, ..., key_1000]) seems to be ~3.5 times slower than Enum.map([key_1, ..., key_1000], &Map.fetch(map, &1))


Here is the code for the benchmark:

map = Map.new(1..10_000, &{&1, &1})
all_keys = Map.keys(map)

Benchee.run(
  %{
    "fetch (for)" => fn keys -> for key <- keys, do: Map.fetch(map, key) end,
    "fetch (Enum.map)" => fn keys -> Enum.map(keys, &Map.fetch(map, &1)) end,
    "fetch! (for)" => fn keys -> for key <- keys, do: Map.fetch!(map, key) end,
    "fetch! (Enum.map)" => fn keys -> Enum.map(keys, &Map.fetch!(map, &1)) end,
    "get (for)" => fn keys -> for key <- keys, do: Map.get(map, key) end,
    "get (Enum.map)" => fn keys -> Enum.map(keys, &Map.get(map, &1)) end,
    "take" => fn keys -> Map.take(map, keys) end
  },
  inputs: %{
    "1" => Enum.take_random(all_keys, 1),
    "10" => Enum.take_random(all_keys, 10),
    "100" => Enum.take_random(all_keys, 100),
    "1000" => Enum.take_random(all_keys, 1000),
  }
)

And here are the results I got on my machine:

Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.14.5
Erlang 25.3.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1, 10, 100, 1000
Estimated total run time: 3.27 min

Benchmarking fetch (Enum.map) with input 1 ...
Benchmarking fetch (Enum.map) with input 10 ...
Benchmarking fetch (Enum.map) with input 100 ...
Benchmarking fetch (Enum.map) with input 1000 ...
Benchmarking fetch (for) with input 1 ...
Benchmarking fetch (for) with input 10 ...
Benchmarking fetch (for) with input 100 ...
Benchmarking fetch (for) with input 1000 ...
Benchmarking fetch! (Enum.map) with input 1 ...
Benchmarking fetch! (Enum.map) with input 10 ...
Benchmarking fetch! (Enum.map) with input 100 ...
Benchmarking fetch! (Enum.map) with input 1000 ...
Benchmarking fetch! (for) with input 1 ...
Benchmarking fetch! (for) with input 10 ...
Benchmarking fetch! (for) with input 100 ...
Benchmarking fetch! (for) with input 1000 ...
Benchmarking get (Enum.map) with input 1 ...
Benchmarking get (Enum.map) with input 10 ...
Benchmarking get (Enum.map) with input 100 ...
Benchmarking get (Enum.map) with input 1000 ...
Benchmarking get (for) with input 1 ...
Benchmarking get (for) with input 10 ...
Benchmarking get (for) with input 100 ...
Benchmarking get (for) with input 1000 ...
Benchmarking take with input 1 ...
Benchmarking take with input 10 ...
Benchmarking take with input 100 ...
Benchmarking take with input 1000 ...

##### With input 1 #####
Name                        ips        average  deviation         median         99th %
take                    11.95 M       83.65 ns ±28285.23%          83 ns          84 ns
fetch (Enum.map)         9.79 M      102.19 ns ±19532.66%          83 ns         125 ns
get (Enum.map)           9.50 M      105.31 ns ±20042.94%          83 ns         125 ns
fetch! (Enum.map)        9.32 M      107.27 ns ±20770.71%          83 ns         125 ns
get (for)                8.70 M      114.93 ns ±20371.78%          83 ns         125 ns
fetch (for)              8.69 M      115.08 ns ±22813.85%          83 ns         125 ns
fetch! (for)             8.46 M      118.25 ns ±17192.86%          83 ns         125 ns

Comparison: 
take                    11.95 M
fetch (Enum.map)         9.79 M - 1.22x slower +18.53 ns
get (Enum.map)           9.50 M - 1.26x slower +21.66 ns
fetch! (Enum.map)        9.32 M - 1.28x slower +23.62 ns
get (for)                8.70 M - 1.37x slower +31.27 ns
fetch (for)              8.69 M - 1.38x slower +31.43 ns
fetch! (for)             8.46 M - 1.41x slower +34.60 ns

##### With input 10 #####
Name                        ips        average  deviation         median         99th %
get (Enum.map)           3.94 M      253.52 ns ±11001.59%         208 ns         375 ns
get (for)                3.66 M      273.00 ns  ±9423.42%         209 ns         334 ns
take                     3.43 M      291.29 ns  ±7903.38%         250 ns         292 ns
fetch! (for)             3.43 M      291.61 ns  ±7048.02%         250 ns         375 ns
fetch! (Enum.map)        3.41 M      293.46 ns  ±8004.15%         250 ns         375 ns
fetch (for)              3.28 M      305.00 ns  ±6959.35%         250 ns         416 ns
fetch (Enum.map)         3.02 M      330.82 ns  ±6767.28%         250 ns         375 ns

Comparison: 
get (Enum.map)           3.94 M
get (for)                3.66 M - 1.08x slower +19.49 ns
take                     3.43 M - 1.15x slower +37.77 ns
fetch! (for)             3.43 M - 1.15x slower +38.09 ns
fetch! (Enum.map)        3.41 M - 1.16x slower +39.94 ns
fetch (for)              3.28 M - 1.20x slower +51.48 ns
fetch (Enum.map)         3.02 M - 1.30x slower +77.30 ns

##### With input 100 #####
Name                        ips        average  deviation         median         99th %
get (Enum.map)         428.81 K        2.33 μs   ±605.78%        2.17 μs        2.67 μs
get (for)              426.99 K        2.34 μs   ±886.27%        2.08 μs        3.88 μs
fetch! (for)           401.62 K        2.49 μs   ±515.68%        2.33 μs        3.21 μs
fetch (for)            351.80 K        2.84 μs   ±676.10%        2.54 μs        5.75 μs
fetch! (Enum.map)      324.53 K        3.08 μs   ±663.85%        2.83 μs        3.75 μs
fetch (Enum.map)       307.58 K        3.25 μs   ±734.78%        2.88 μs        6.21 μs
take                   165.53 K        6.04 μs   ±471.91%        5.50 μs       10.96 μs

Comparison: 
get (Enum.map)         428.81 K
get (for)              426.99 K - 1.00x slower +0.00998 μs
fetch! (for)           401.62 K - 1.07x slower +0.158 μs
fetch (for)            351.80 K - 1.22x slower +0.51 μs
fetch! (Enum.map)      324.53 K - 1.32x slower +0.75 μs
fetch (Enum.map)       307.58 K - 1.39x slower +0.92 μs
take                   165.53 K - 2.59x slower +3.71 μs

##### With input 1000 #####
Name                        ips        average  deviation         median         99th %
get (for)               37.52 K       26.65 μs   ±204.39%       24.79 μs       46.61 μs
fetch! (for)            35.14 K       28.46 μs    ±44.71%       27.42 μs       42.33 μs
get (Enum.map)          34.12 K       29.31 μs    ±15.19%       28.29 μs       44.33 μs
fetch (for)             31.62 K       31.62 μs     ±9.41%       30.83 μs       42.04 μs
fetch! (Enum.map)       29.65 K       33.72 μs    ±10.58%       32.88 μs       46.62 μs
fetch (Enum.map)        27.34 K       36.58 μs    ±18.96%       35.46 μs       48.79 μs
take                    10.71 K       93.41 μs    ±13.42%       91.17 μs      112.35 μs

Comparison: 
get (for)               37.52 K
fetch! (for)            35.14 K - 1.07x slower +1.81 μs
get (Enum.map)          34.12 K - 1.10x slower +2.66 μs
fetch (for)             31.62 K - 1.19x slower +4.97 μs
fetch! (Enum.map)       29.65 K - 1.27x slower +7.07 μs
fetch (Enum.map)        27.34 K - 1.37x slower +9.93 μs
take                    10.71 K - 3.50x slower +66.76 μs

You should make sure to return the same kind of value from all your benchmark functions. Map.take returns a map, while in the other benchmarks you build a list. for has an :into option that you could use
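For example, the comprehension can collect its pairs directly into a map via :into, making the result comparable to Map.take (a sketch, reusing the setup from the original benchmark; not code from this thread):

```elixir
# Same setup as the original benchmark
map = Map.new(1..10_000, &{&1, &1})
keys = Enum.take_random(Map.keys(map), 100)

# Collect {key, value} pairs into a map, so the result
# has the same shape as Map.take(map, keys)
result = for key <- keys, into: %{}, do: {key, Map.fetch!(map, key)}
```

Note that Map.fetch! raises on a missing key, while Map.take silently skips it, so the two are only equivalent when every key is present in the map.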

(Whether you need a map at all is a separate concern; if a list is enough, that would be a valid reason to use something other than Map.take)

4 Likes

@michallepicki beat me by a few hours, but to reiterate: either return the same kind of value from each function, or discard the results by explicitly returning nil. That should give you more objective data to work with.

1 Like

Here’s what I got comparing apples to apples:
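The post above doesn’t include its benchmark code, so here is a sketch of what apples-to-apples variants matching the names in the output below might look like (my reconstruction, reusing the map and keys from the original benchmark; every variant builds the same map):

```elixir
# Setup assumed to match the original benchmark
map = Map.new(1..10_000, &{&1, &1})
keys = Enum.take_random(Map.keys(map), 1000)

# "Map.take"
take = Map.take(map, keys)

# "Map.new with fetch!"
new_fetch = Map.new(keys, fn key -> {key, Map.fetch!(map, key)} end)

# "Enum.reduce with fetch"
reduce_fetch =
  Enum.reduce(keys, %{}, fn key, acc ->
    {:ok, value} = Map.fetch(map, key)
    Map.put(acc, key, value)
  end)

# "Map.new with get and pipe"
new_get_pipe = keys |> Enum.map(&{&1, Map.get(map, &1)}) |> Map.new()
```

All four return an identical map when every key is present, which is what makes the comparison fair.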

Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 64 GB
Elixir 1.15.7
Erlang 26.0.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1000
Estimated total run time: 28 s

Benchmarking Enum.reduce with fetch with input 1000 ...
Benchmarking Map.new with fetch! with input 1000 ...
Benchmarking Map.new with get and pipe with input 1000 ...
Benchmarking Map.take with input 1000 ...

##### With input 1000 #####
Name                                ips        average  deviation         median         99th %
Map.take                         8.07 K      123.88 μs    ±15.30%      120.55 μs      219.84 μs
Map.new with fetch!              7.15 K      139.92 μs    ±21.31%      131.21 μs      280.76 μs
Enum.reduce with fetch           7.15 K      139.93 μs    ±21.37%      132.06 μs      279.94 μs
Map.new with get and pipe        6.66 K      150.06 μs    ±15.69%      145.22 μs      271.62 μs

Comparison: 
Map.take                         8.07 K
Map.new with fetch!              7.15 K - 1.13x slower +16.04 μs
Enum.reduce with fetch           7.15 K - 1.13x slower +16.05 μs
Map.new with get and pipe        6.66 K - 1.21x slower +26.18 μs
3 Likes

Thank you! And of course, that makes sense! I knew I must be missing some detail :sweat_smile:

However, in the code I was profiling, the values of the map are what I need. So what I am actually doing is something like this:

map
|> Map.take(keys)
|> Map.values()

Thus I conclude that doing something like this will probably be faster in my case:

for key <- keys, do: Map.get(map, key)

Do you know which keys you will need beforehand?

I used your code, except where it returned a list instead of a map. :slightly_smiling_face:

Not sure if I understand your question correctly.

I do not know the keys explicitly, nor do I know how many keys I need to look up, so I think pattern matching is not an option, if that is what you are getting at.

Very much simplified, the code looks like this:

def get_items_by_filters(%{} = items_by_id, index, query) do
  matching_ids = find_matching_ids(index, query)

  items_by_id
  |> Map.take(matching_ids)
  |> Map.values()
end
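The core transformation could be sketched like this (the data and ids are made up, and find_matching_ids/2 from the post is replaced by a literal list):

```elixir
# Hypothetical stand-ins for items_by_id and find_matching_ids/2
items_by_id = Map.new(1..1_000, fn id -> {id, %{id: id}} end)
matching_ids = [3, 42, 999, 5_000]  # 5_000 is not in the map

# Builds the list of values directly, skipping the intermediate map.
# The is_map_key/2 guard mirrors Map.take/2 silently ignoring missing
# ids; without it, Map.fetch!/2 would raise on id 5_000.
values =
  for id <- matching_ids, is_map_key(items_by_id, id) do
    Map.fetch!(items_by_id, id)
  end
```

One behavioral difference: this preserves the order of matching_ids, whereas the order of Map.values/1 on the result of Map.take/2 is unspecified for larger maps.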
1 Like

Yep I was. :smiley:

Why though? Your results and another poster’s both show that Map.take is the fastest.

Hm, maybe I’m still missing some important point, but my reasoning goes like this:

Map.take is fastest if what I need is a map. However, what I actually need are the values of the map, so I have to additionally call Map.values (or something similar) afterwards. No matter how fast that is, it adds time on top of what Map.take needs. Hence, calling Map.get or Map.fetch! repeatedly seems to make more sense in my case, because it directly gives me a list of the values (I don’t need the intermediate map that Map.take would give me).

To confirm my assumption, I ran another mini benchmark that always returns a list of the values, which more closely resembles the case I am profiling in the actual code:

map = Map.new(1..10_000, &{&1, &1})
all_keys = Map.keys(map)

Benchee.run(
  %{
    "Map.fetch!" => fn keys -> for key <- keys, do: Map.fetch!(map, key) end,
    "Map.get" => fn keys -> for key <- keys, do: Map.get(map, key) end,
    "Map.take |> Map.values" => fn keys -> map |> Map.take(keys) |> Map.values() end
  },
  inputs: %{
    "1000 keys" => Enum.take_random(all_keys, 1000),
  }
)

The results look like this:

Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.14.5
Erlang 25.3.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1000 keys
Estimated total run time: 21 s

Benchmarking Map.fetch! with input 1000 keys ...
Benchmarking Map.get with input 1000 keys ...
Benchmarking Map.take |> Map.values with input 1000 keys ...

##### With input 1000 keys #####
Name                             ips        average  deviation         median         99th %
Map.get                      38.49 K       25.98 μs    ±82.32%       24.67 μs       43.58 μs
Map.fetch!                   35.30 K       28.33 μs    ±36.74%       27.33 μs       41.38 μs
Map.take |> Map.values        9.76 K      102.45 μs     ±5.90%      100.25 μs      120.56 μs

Comparison: 
Map.get                      38.49 K
Map.fetch!                   35.30 K - 1.09x slower +2.35 μs
Map.take |> Map.values        9.76 K - 3.94x slower +76.47 μs
2 Likes

Quite interesting, and yes, for 1000 values I’d think the same.