Guards vs. generated functions: which is faster/more performant?

Hey there

Regarding performance, is there a difference between guards and generated functions using pattern matching? Or does the compiler generate the same byte code for both?

And if there is a difference, which one is more performant?

Using Guards

defmodule UsingGuards do
  def do_work(arg) when arg in big_list do
    # ...
  end
end

Generated Functions

defmodule UsingFunctionGeneration do
  for arg in big_list do
    def do_work(unquote(arg)) do
      # ...
    end
  end
end
1 Like

You can use GitHub - michalmuskala/decompile to take a look how things will look at lower levels.

5 Likes

I see, thanks for this. What I found out: It does not compile to the same byte code. :slight_smile:

The difference between these two approaches is the difference between a big case statement and a Enum.member? call. It would be cool to see a test with benchee | Hex

My gut feeling is the generated function is faster.

in/2 doesn’t compile to an Enum.member?/2 call when used in a guard:

https://hexdocs.pm/elixir/Kernel.html#in/2-guards

7 Likes

Nice, should have looked at the documentation :smile:

As mentioned above, it would be easy enough to set up a quick benchmark! I’d be interested in the result.

My hypothesis is that performance will be, for all intents and purposes, equivalent. I don’t believe multiple function clause pattern matching does anything fancy that allows it to skip a significant number of clauses here, so both cases involve a linear search.

The first version, using guards, is certainly more idiomatic.

Will do.

So I wrote the following simple Elixir script that uses benchee. I did not give it much thought to be honest. Let me know what you think.

TLDR; Performance for both approaches indeed seem to be equivalent.

Script

defmodule Test do
  @numbers Range.to_list(1..1000)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for number <- @numbers do
    def do_work_generated(unquote(number)), do: unquote(number)
  end

  def do_work_guards(number) when number in @numbers, do: number
end

Mix.install([:benchee, :benchee_markdown])
list_of_random_numbers = fn -> Enum.map(1..500_000, fn _ -> :rand.uniform(1000) end) end

inputs =
  list_of_random_numbers
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {numbers, idx} -> {"Batch #{idx + 1}", numbers} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end
  },
  memory_time: 60,
  inputs: inputs,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK.md"},
    Benchee.Formatters.Console
  ]
)

Output

Configuration

Benchmark suite executing with the following configuration:

:time 5 s
:parallel 1
:warmup 2 s

Statistics

Input: Batch 1

Run Time

Name IPS Average Devitation Median 99th %
Guards 177.07 5.65 ms ±4.15% 5.62 ms 5.87 ms
Generated Functions 175.18 5.71 ms ±4.30% 5.67 ms 6.30 ms

Run Time Comparison

Name IPS Slower
Guards 177.07  
Generated Functions 175.18 1.01x

Memory Usage

Name Average Factor
Guards 7.63 MB  
Generated Functions 7.63 MB 1.0x

Input: Batch 2

Run Time

Name IPS Average Devitation Median 99th %
Guards 174.66 5.73 ms ±4.56% 5.69 ms 6.14 ms
Generated Functions 174.12 5.74 ms ±4.46% 5.71 ms 6.19 ms

Run Time Comparison

Name IPS Slower
Guards 174.66  
Generated Functions 174.12 1.0x

Memory Usage

Name Average Factor
Guards 7.63 MB  
Generated Functions 7.63 MB 1.0x

Input: Batch 3

Run Time

Name IPS Average Devitation Median 99th %
Guards 174.93 5.72 ms ±4.77% 5.68 ms 6.23 ms
Generated Functions 174.21 5.74 ms ±4.07% 5.71 ms 5.96 ms

Run Time Comparison

Name IPS Slower
Guards 174.93  
Generated Functions 174.21 1.0x

Memory Usage

Name Average Factor
Guards 7.63 MB  
Generated Functions 7.63 MB 1.0x

Input: Batch 4

Run Time

Name IPS Average Devitation Median 99th %
Guards 174.18 5.74 ms ±4.48% 5.71 ms 6.16 ms
Generated Functions 172.05 5.81 ms ±4.21% 5.79 ms 6.14 ms

Run Time Comparison

Name IPS Slower
Guards 174.18  
Generated Functions 172.05 1.01x

Memory Usage

Name Average Factor
Guards 7.63 MB  
Generated Functions 7.63 MB 1.0x

Input: Batch 5

Run Time

Name IPS Average Devitation Median 99th %
Generated Functions 174.62 5.73 ms ±4.37% 5.69 ms 6.19 ms
Guards 174.34 5.74 ms ±4.52% 5.71 ms 6.10 ms

Run Time Comparison

Name IPS Slower
Generated Functions 174.62  
Guards 174.34 1.0x

Memory Usage

Name Average Factor
Generated Functions 7.63 MB  
Guards 7.63 MB 1.0x
8 Likes

Looks like the performance, for both options, is pretty much the same.

I extended this into a couple more cases that got unexpected results:

defmodule Test do
  @numbers Enum.to_list(1..1000)

  @numbers_as_map Enum.to_list(1..1000) |> Map.new(fn el -> {el, true} end)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for number <- @numbers do
    def do_work_generated(unquote(number)), do: unquote(number)
  end

  def do_work_guards(number) when number in @numbers, do: number

  def do_work_map(number) do
    if Map.has_key?(@numbers_as_map, number) do
      number
    else
      raise "oh no"
    end
  end

  def do_work_in(number) do
    if number in @numbers do
      number
    else
      raise "oh no"
    end
  end

  def do_work_nothing(number) do
    number
  end
end

Mix.install([:benchee, :benchee_markdown])
list_of_random_numbers = fn -> Enum.map(1..500_000, fn _ -> :rand.uniform(1000) end) end

inputs =
  list_of_random_numbers
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {numbers, idx} -> {"Batch #{idx + 1}", numbers} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end,
    "Map" => fn input -> Test.work(input, &Test.do_work_map/1) end,
    "In" => fn input -> Test.work(input, &Test.do_work_in/1) end,
    "Nothing" => fn input -> Test.work(input, &Test.do_work_nothing/1) end,
  },
  memory_time: 60,
  inputs: inputs,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK.md"},
    Benchee.Formatters.Console
  ]
)

The important bits:

  • do_work_map uses Map.has_key? to check for membership
  • do_work_in uses in in an if (instead of a guard) to check for membership
  • do work_nothing doesn’t bother checking at all

My absolute numbers aren’t relevant (my machine is pretty slow, and I’m on 1.13/OTP24), but the differences are peculiar:

  • do_work_map is roughly 2x slower than either the guard or the generated cases. That’s surprising, given that map lookup should take far fewer comparisons than simple linear search.
  • even weirder: do_work_nothing doesn’t always WIN somehow. It doesn’t ever lose by much, but surely whatever logic is dispatching the function heads or guards should take some time :thinking:
  • do_work_in is MASSIVELY slower, by a factor of 30x.

I’ve tried a few obvious things that seemed like they could be causing slowdown - for instance, not having a raise in the body of do_work_map to see if the optimizer is giving up because an exception might be raised - but none of them have yielded a change.

Yeah it looks like the compiler is able to optimize better if facts are known at compile time.

But here’s something interesting: I have adapted the code a bit because eventually, I’m gonna be working with strings. And if we add binary pattern matching to the mix, performance seems to be different.

Side note: Reading the documentation of Kernel.in/2 I read the following:

However, this construct will be inefficient for large lists. In such cases, it is best to stop using guards and use a more appropriate data structure, such as MapSet.

I therefore added a test using MapSets (which, afaict, is equivalent to @al2o3cr’s test using is_map_key).

TLDR;

In this case, guards seem to make the race, being twice as fast as generated functions and 2.5x faster than MapSets

Test

defmodule Test do
  @chars Range.to_list(1000..1)
  @chars_map_set MapSet.new(1000..1)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for char <- @chars do
    def do_work_generated(<<unquote(char)::utf8>>), do: unquote(char)

    def do_work_generated(<<unquote(char)::utf8, rest::binary>>) do
      do_work_generated(rest)
    end
  end

  def do_work_guards(<<char::utf8>>) when char in @chars, do: char

  def do_work_guards(<<char::utf8, rest::binary>>) when char in @chars do
    do_work_guards(rest)
  end

  def do_work_mapset(<<char::utf8>>) do
    if MapSet.member?(@chars_map_set, char) do
      char
    else
      :error
    end
  end

  def do_work_mapset(<<char::utf8, rest::binary>>) do
    if MapSet.member?(@chars_map_set, char) do
      do_work_mapset(rest)
    else
      :error
    end
  end
end

Mix.install([:benchee, :benchee_markdown])

list_of_random_strings = fn ->
  Enum.map(1..500_000, fn _ ->
    for _ <- 1..10,
        into: "",
        do: <<Enum.random(Range.to_list(?0..?9) ++ Range.to_list(?A..?z))>>
  end)
end

inputs =
  list_of_random_strings
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {strings, idx} -> {"Batch #{idx + 1}", strings} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end,
    "MapSet" => fn input -> Test.work(input, &Test.do_work_mapset/1) end
  },
  memory_time: 60,
  inputs: inputs,
  parallel: 10,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK_STRINGS.md"},
    Benchee.Formatters.Console
  ]
)

Output

Statistics

Statistics

Input: Batch 1

Run Time

Name IPS Average Devitation Median 99th %
Guards 31.07 32.19 ms ±8.67% 32.09 ms 46.84 ms
Generated Functions 15.54 64.36 ms ±5.44% 63.96 ms 90.67 ms
MapSet 10.99 91.02 ms ±5.33% 90.73 ms 123.52 ms

Run Time Comparison

Name IPS Slower
Guards 31.07  
Generated Functions 15.54 2.0x
MapSet 10.99 2.83x

Memory Usage

Name Average Factor
Guards 26.70 MB  
Generated Functions 26.70 MB 1.0x
MapSet 26.70 MB 1.0x

Input: Batch 2

Run Time

Name IPS Average Devitation Median 99th %
Guards 31.57 31.68 ms ±8.91% 31.97 ms 45.93 ms
Generated Functions 15.64 63.94 ms ±5.55% 64.00 ms 91.14 ms
MapSet 11.10 90.06 ms ±4.90% 90.00 ms 119.82 ms

Run Time Comparison

Name IPS Slower
Guards 31.57  
Generated Functions 15.64 2.02x
MapSet 11.10 2.84x

Memory Usage

Name Average Factor
Guards 26.70 MB  
Generated Functions 26.70 MB 1.0x
MapSet 26.70 MB 1.0x

Input: Batch 3

Run Time

Name IPS Average Devitation Median 99th %
Guards 31.43 31.81 ms ±8.78% 31.89 ms 46.68 ms
Generated Functions 15.40 64.95 ms ±5.93% 64.84 ms 93.51 ms
MapSet 11.04 90.54 ms ±4.83% 90.49 ms 118.91 ms

Run Time Comparison

Name IPS Slower
Guards 31.43  
Generated Functions 15.40 2.04x
MapSet 11.04 2.85x

Memory Usage

Name Average Factor
Guards 26.70 MB  
Generated Functions 26.70 MB 1.0x
MapSet 26.70 MB 1.0x

Input: Batch 4

Run Time

Name IPS Average Devitation Median 99th %
Guards 31.08 32.18 ms ±8.48% 32.34 ms 46.62 ms
Generated Functions 15.25 65.58 ms ±6.42% 65.17 ms 97.31 ms
MapSet 12.31 81.22 ms ±7.78% 79.51 ms 108.75 ms

Run Time Comparison

Name IPS Slower
Guards 31.08  
Generated Functions 15.25 2.04x
MapSet 12.31 2.52x

Memory Usage

Name Average Factor
Guards 26.70 MB  
Generated Functions 26.70 MB 1.0x
MapSet 26.70 MB 1.0x

Input: Batch 5

Run Time

Name IPS Average Devitation Median 99th %
Guards 29.61 33.78 ms ±8.29% 33.12 ms 50.84 ms
Generated Functions 15.13 66.08 ms ±5.99% 65.88 ms 97.77 ms
MapSet 11.20 89.32 ms ±53.75% 82.30 ms 440.40 ms

Run Time Comparison

Name IPS Slower
Guards 29.61  
Generated Functions 15.13 1.96x
MapSet 11.20 2.64x

Memory Usage

Name Average Factor
Guards 26.70 MB  
Generated Functions 26.70 MB 1.0x
MapSet 26.70 MB 1.0x
1 Like

This exact test case has nothing to do for “guards vs generated functions”

when x in 1..100 gets compiled into when x > 1 and x < 100, so it is just faster.

So, to answer the original question “Performance Question: guards vs. generated functions: which is faster?”: none is faster, both are a different tools used for different tasks. The task you’ve brought up in the original post is made up and is actually solved not by x in y guard or generated clauses, but just by x >= left and x <= right

2 Likes