Benchmarking dot in map.field

The benchmark

Mix.install([:benchee])

defmodule X do
  # Plain dot access: each map.key compiles into a case expression
  # that type-checks the receiver on every access.
  def dot(map) do
    [
      map.x + map.y,
      map.y + map.x,
      map.x + map.y,
      map.y + map.x,
      map.x + map.y,
      map.y + map.x
    ]
  end

  # Same body, but the %{} = map annotation in the head proves to the
  # compiler that map is always a map.
  def annotated(%{} = map) do
    [
      map.x + map.y,
      map.y + map.x,
      map.x + map.y,
      map.y + map.x,
      map.x + map.y,
      map.y + map.x
    ]
  end

  # Extract both keys once, in the function head.
  def match(%{x: x, y: y}) do
    [
      x + y,
      y + x,
      x + y,
      y + x,
      x + y,
      y + x
    ]
  end

  # Key access through the :erlang.map_get/2 BIF.
  defmacrop dot(map, key) do
    quote do
      :erlang.map_get(unquote(key), unquote(map))
    end
  end

  def safe_dot(map) do
    [
      dot(map, :x) + dot(map, :y),
      dot(map, :y) + dot(map, :x),
      dot(map, :x) + dot(map, :y),
      dot(map, :y) + dot(map, :x),
      dot(map, :x) + dot(map, :y),
      dot(map, :y) + dot(map, :x)
    ]
  end

  # Pattern-match the key out on every single access.
  defmacrop match(map, field) do
    quote do
      %{unquote(field) => value} = unquote(map)
      value
    end
  end

  def match_every_time(map) do
    [
      match(map, :x) + match(map, :y),
      match(map, :y) + match(map, :x),
      match(map, :x) + match(map, :y),
      match(map, :y) + match(map, :x),
      match(map, :x) + match(map, :y),
      match(map, :y) + match(map, :x)
    ]
  end
end

Benchee.run(%{
  "dot" => fn -> X.dot(%{x: 1, y: 1}) end,
  "annotated" => fn -> X.annotated(%{x: 1, y: 1}) end,
  "erlang.map_get" => fn -> X.safe_dot(%{x: 1, y: 1}) end,
  "match" => fn -> X.match(%{x: 1, y: 1}) end,
  "match_every_time" => fn -> X.match_every_time(%{x: 1, y: 1}) end,
}, warmup: 1, time: 2)

And the result is:

Name                       ips        average  deviation         median         99th %
match                   7.46 M      133.99 ns ±22658.59%          90 ns         184 ns
match_every_time        7.33 M      136.44 ns ±23751.72%          93 ns         181 ns
annotated               7.22 M      138.59 ns ±23522.99%          92 ns         196 ns
erlang.map_get          6.67 M      149.82 ns ±21303.83%         105 ns         212 ns
dot                     4.24 M      235.62 ns ±16830.96%         137 ns         266 ns

Comparison:
match                   7.46 M
match_every_time        7.33 M - 1.02x slower +2.45 ns
annotated               7.22 M - 1.03x slower +4.60 ns
erlang.map_get          6.67 M - 1.12x slower +15.83 ns
dot                     4.24 M - 1.76x slower +101.63 ns

Explanation

First of all, a construct like map.field gets compiled by the Elixir compiler into (roughly) the code below. This is a consequence of Elixir allowing function calls without parentheses: at compile time it is unknown whether the left side of the dot is a map or a module, so both cases have to be handled at runtime. Note that map.field() gets compiled to the same expression.

case map do
  module when is_atom(module) -> apply(module, :field, [])
  %{field: value} -> value
  other -> raise BadMapError, term: other
end
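
The atom branch is what keeps parentheses-less remote calls working. A minimal demonstration (the Demo module and its field are invented for illustration):

defmodule Demo do
  def field, do: :from_module
end

mod = Demo
# atom on the left: dispatched as the remote call Demo.field()
# (recent Elixir versions emit a deprecation warning for this form)
mod.field #=> :from_module

map = %{field: :from_map}
# map on the left: a plain key fetch
map.field #=> :from_map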
  • match generates the fastest Core Erlang and BEAM for this problem. Rule of thumb: pattern matching is always the fastest way to access data in collections, in both Erlang and Elixir.
  • annotated generates almost the same Core Erlang as the dot version, but the compiler sees that the map variable in the function body will always be a map (otherwise it wouldn’t pass the match in the arguments), so it reduces the case generated by dot to a plain pattern match. It therefore generates the same BEAM as the match version (you can verify this yourself; see the snippet after this list).
  • erlang.map_get is a rarely used way to access a key of a map; it is the third way to do so, added in OTP 21. It is actually a separate BIF, and it is slower because it performs a map type check on every call.
  • dot is the slowest one, because it performs a type check and a jump on every key access.
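
One way to check the generated code is to dump the BEAM disassembly of the module and compare the function bodies. A sketch using the undocumented :erts_debug.df/1 helper (assuming the module above is compiled and loaded, e.g. in iex; it writes a file named Elixir.X.dis to the current directory):

:erts_debug.df(X)
# now open Elixir.X.dis and compare the code generated
# for match/1, annotated/1, safe_dot/1 and dot/1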

Conclusions

  1. Annotate your maps to get better performance out of ., or at least try to express as much code as possible through pattern matching.
  2. Rewrite nested access like map.submap.subsubmap.field into pattern matching (see the sketch after this list), or at least use Pathex.
  3. Use the Tria optimizing compiler, which performs the analysis needed to simplify the case generated by ., thus extracting the best performance from the . expression.
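
For point 2, a minimal sketch of such a rewrite (the names are invented, and the match assumes every key is present):

# nested dot access: a type check and a jump at every step
city = company.owner.address.city

# equivalent single pattern match
%{owner: %{address: %{city: city}}} = company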

You are extracting the values only once here.


Indeed, the 2x performance penalty is just too much; it doesn’t make any sense.

No, that’s not the reason. Extracting a field is a pure operation, so the compiler performs an optimization in every case (except dot) so that the data is extracted only once. However, some of the variants perform additional checks on every call, and that is what this benchmark shows.
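
A conceptual sketch of that optimization (an illustration of what the compiler is allowed to do, not literal compiler output):

# source: the same pure extraction appears twice
(%{x: a} = map; a) + (%{x: b} = map; b)

# the compiler may extract the value once and reuse it
%{x: a} = map
a + a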

I’ve updated the benchmark to reflect in-place matching.

cc @D4no0

Interesting, thank you.

The compiler could optimize that as well, though. Are you certain it doesn’t?

I wonder why the head match was consistently slower in my first benchmark, then.

inputs = %{
  "large" => Map.new(1..33, fn n -> {:"key_#{n}", []} end),
  "single" => %{key_1: []}
}

Benchee.run(
  %{
    "body_match" => fn map ->
      %{key_1: value} = map
      value
    end,
    "case" => fn map ->
      case map do
        %{key_1: value} -> value
      end
    end,
    "dot" => fn map ->
      map.key_1
    end,
    "head_match" => fn
      %{key_1: value} ->
        value
    end
  },
  inputs: inputs,
  pre_check: true,
  time: 1
)

If you define functions outside of a module body, they are interpreted as erl_eval structures, and that is not suitable for benchmarking.
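
Concretely, a hedged sketch of the fix (the module name is invented, and it reuses the inputs map from the snippet above): move the bodies into a compiled module and keep only a thin call inside the anonymous function.

defmodule CompiledBench do
  # compiled code, not interpreted by erl_eval
  def body_match(map) do
    %{key_1: value} = map
    value
  end
end

Benchee.run(
  %{"body_match" => fn map -> CompiledBench.body_match(map) end},
  inputs: inputs,
  time: 1
)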

OK, but in my second benchmark they were defined with def in a module.

@hst337 Super interesting, thank you for this.

Does Tria’s implementation of . sacrifice any features of Elixir’s? I.e., could Elixir not also make this optimization? Perhaps you’ve talked about this before in the forums and I missed it (I generally follow those threads silently, as they are a bit out of my wheelhouse).

Overall this doesn’t dissuade me from using . when I feel it reads better, but I will now think twice in situations where I favour . in a readability coin toss.

I think this is an anti-pattern: you should use whatever makes the code readable and maintainable and not delve into these low-stakes optimizations, as they are negligible in the big picture. Hardware becomes more powerful and cheaper by the minute; your time does not.

If the optimization is not already implemented in Elixir, that means there is some reason for it, or it will be implemented in the future.

I agree 100%. I just meant I will use this knowledge to favour pattern matching in situations where I could go either way. I don’t generally favour one over the other; it’s always situational. Every once in a while I find myself not caring which I use, so why not use performance as a tie-breaker, no matter how small it may be.

I figured, but I’m still interested to know what it is!


Nah, it is not going to be implemented in the future, because it is just too much work for the Elixir team. I’ve implemented it, but my implementation has insignificant limitations, like slightly slower runtime recompilation. My ideas won’t become part of the language any time soon. Elixir’s core team is interested in static typing, while Erlang’s core team is interested in their own customers, who are in the 0.1% of users who actually use hot reloading in production.

There was no reason to introduce this kind of syntax in the first place. Calls without parentheses could have been implemented without this performance drawback: constructs where the left argument of . is not a compile-time atom, and there are no parentheses or arguments, could always be treated as :erlang.map_get.
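
That is, under the hypothetical semantics described above (not how Elixir actually behaves):

map.field         # could compile straight to :erlang.map_get(:field, map)
Enum.count(list)  # compile-time atom on the left: stays a remote call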

So this is a language design issue and, to be honest, not a very important one. I wrote a compiler which is able to overcome it, and a simple annotation restores the performance back to 100%. Other languages have UBs, untrackable exceptions, and fork bombs in 7 characters, so the creators of Elixir did a good job here.

No, Moore’s law stopped working a while ago, and the maximum number of cores in one CPU and the maximum number of CPUs on one motherboard have their limits. We either optimize our code or hope for the best in terms of quantum computing.


My opinion on this is the following: if you don’t like it, don’t use it, simple as that.

Good for you. The last time there was a discussion on this topic, it was clear that you are potentially breaking internal features of the VM with your opinionated optimizations.

Yeah, that’s what they were also saying about chips not being able to go smaller than 10 nm about 10 years ago, and here we are at 4 nm, and this is just the beginning. We will find another way to advance, because this is what humans do.

It is also clear that you are from the academic world. There is a big distinction between making software that makes money and writing “optimized” software that saves a few hundred bucks on hosting but costs a lot more to maintain and develop. I came to Elixir because I care about delivering results while having a battle-tested, fault-tolerant VM and a great language, not because I care that the dot syntax has a lag of 100 ns.

I am not, and I’ve been in opposition to academia for my whole life.

I’d like to discuss it, since I am unaware of any features my compiler is breaking. One of my goals from the beginning was 100% compatibility with the existing behaviour of the runtime and compiler, and with how people expect the language to behave. Even hard problems like runtime recompilation are within my aims at the moment, and they’re supported, though not completely, since it is a work in progress. If I am missing something, we can discuss it here, or in the compiler’s thread.

While it could be without this lag. And this is a problem. And it can be fixed. And it is fixed in my compiler. Performance optimization in Elixir is one of the things I’ve decided to work on. This is not a holy war, and I am not attacking anybody with my posts; I am just sharing my experience with problems and their solutions.

If you want to argue for the sake of it, without any particular reason, then you can, but please do not involve me in this.


Thanks for the benchmark and for your engagement in making Elixir faster @hst337. Keep it up :raised_hands:

I bumped into the issue in question a while ago, when flame graph analysis showed that get_in & friends were taking a lot of time. Replacing them with macro-generated pattern matching reduced CPU usage by ~10% in our use case. A few similar hot-path optimisations gave another 10%.

In some cases these optimisations give a real speed-up, and having them automated would be invaluable.
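
For illustration, the kind of replacement described above (a sketch: the key names are invented, and the match version assumes plain maps with all keys present):

# generic runtime traversal through the Access protocol
city = get_in(data, [:user, :address, :city])

# hand-written (or macro-generated) pattern-matching equivalent
%{user: %{address: %{city: city}}} = data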


This is exactly what my other project, Pathex, does.


This is what I kind of understood from the posts on this forum, but once again, I never went into much detail about the specifics, since this is not interesting for me. I think there was something about runtime modules?

It is great that you want to go the 100% compatible way, and it would be amazing if we got at least part of the performance boost that these privileged languages enjoy nowadays.

This is not for the sake of it; the idea is that you shouldn’t be stopped from using a feature because it is currently inefficient.

This is, as you said, a compiler implementation problem, and once it is fixed, everybody will be happy. My thinking is that a developer using a high-level language should not care about internals like these, both because they are prone to change (and he will certainly not keep track of those changes, and will then mislead others on what to use and when) and because it fills the head with unnecessary information instead of an understanding of how to write maintainable and readable code.
