Guards vs. generated functions: which is faster/more performant?

mruoss · November 22, 2023, 11:11am

Hey there

Regarding performance, is there a difference between guards and generated functions using pattern matching? Or does the compiler generate the same byte code for both?

And if there is a difference, which one is more performant?

Using Guards

defmodule UsingGuards do
  def do_work(arg) when arg in big_list do
    # ...
  end
end

Generated Functions

defmodule UsingFunctionGeneration do
  for arg in big_list do
    def do_work(unquote(arg)) do
      # ...
    end
  end
end

LostKobrakai · November 22, 2023, 11:24am

You can use GitHub - michalmuskala/decompile to take a look how things will look at lower levels.

mruoss · November 22, 2023, 11:45am

I see, thanks for this. What I found out: It does not compile to the same byte code.

olafura · November 22, 2023, 12:41pm

The difference between these two approaches is the difference between a big case statement and a Enum.member? call. It would be cool to see a test with benchee | Hex

My gut feeling is the generated function is faster.

LostKobrakai · November 22, 2023, 12:51pm

in/2 doesn’t compile to an Enum.member?/2 call when used in a guard:

https://hexdocs.pm/elixir/Kernel.html#in/2-guards

olafura · November 22, 2023, 12:57pm

Nice, should have looked at the documentation

zachallaun · November 22, 2023, 4:24pm

As mentioned above, it would be easy enough to set up a quick benchmark! I’d be interested in the result.

My hypothesis is that performance will be, for all intents and purposes, equivalent. I don’t believe multiple function clause pattern matching does anything fancy that allows it to skip a significant number of clauses here, so both cases involve a linear search.

The first version, using guards, is certainly more idiomatic.

mruoss · November 22, 2023, 4:37pm

Will do.

mruoss · November 22, 2023, 10:53pm

So I wrote the following simple Elixir script that uses benchee. I did not give it much thought to be honest. Let me know what you think.

TLDR; Performance for both approaches indeed seem to be equivalent.

Script

defmodule Test do
  @numbers Range.to_list(1..1000)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for number <- @numbers do
    def do_work_generated(unquote(number)), do: unquote(number)
  end

  def do_work_guards(number) when number in @numbers, do: number
end

Mix.install([:benchee, :benchee_markdown])
list_of_random_numbers = fn -> Enum.map(1..500_000, fn _ -> :rand.uniform(1000) end) end

inputs =
  list_of_random_numbers
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {numbers, idx} -> {"Batch #{idx + 1}", numbers} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end
  },
  memory_time: 60,
  inputs: inputs,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK.md"},
    Benchee.Formatters.Console
  ]
)

Output

Configuration

Benchmark suite executing with the following configuration:

:time	5 s
:parallel	1
:warmup	2 s

Statistics

Input: Batch 1

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	177.07	5.65 ms	±4.15%	5.62 ms	5.87 ms
Generated Functions	175.18	5.71 ms	±4.30%	5.67 ms	6.30 ms

Run Time Comparison

Name	IPS	Slower
Guards	177.07
Generated Functions	175.18	1.01x

Memory Usage

Name	Average	Factor
Guards	7.63 MB
Generated Functions	7.63 MB	1.0x

Input: Batch 2

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	174.66	5.73 ms	±4.56%	5.69 ms	6.14 ms
Generated Functions	174.12	5.74 ms	±4.46%	5.71 ms	6.19 ms

Run Time Comparison

Name	IPS	Slower
Guards	174.66
Generated Functions	174.12	1.0x

Memory Usage

Name	Average	Factor
Guards	7.63 MB
Generated Functions	7.63 MB	1.0x

Input: Batch 3

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	174.93	5.72 ms	±4.77%	5.68 ms	6.23 ms
Generated Functions	174.21	5.74 ms	±4.07%	5.71 ms	5.96 ms

Run Time Comparison

Name	IPS	Slower
Guards	174.93
Generated Functions	174.21	1.0x

Memory Usage

Name	Average	Factor
Guards	7.63 MB
Generated Functions	7.63 MB	1.0x

Input: Batch 4

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	174.18	5.74 ms	±4.48%	5.71 ms	6.16 ms
Generated Functions	172.05	5.81 ms	±4.21%	5.79 ms	6.14 ms

Run Time Comparison

Name	IPS	Slower
Guards	174.18
Generated Functions	172.05	1.01x

Memory Usage

Name	Average	Factor
Guards	7.63 MB
Generated Functions	7.63 MB	1.0x

Input: Batch 5

Run Time

Name	IPS	Average	Devitation	Median	99th %
Generated Functions	174.62	5.73 ms	±4.37%	5.69 ms	6.19 ms
Guards	174.34	5.74 ms	±4.52%	5.71 ms	6.10 ms

Run Time Comparison

Name	IPS	Slower
Generated Functions	174.62
Guards	174.34	1.0x

Memory Usage

Name	Average	Factor
Generated Functions	7.63 MB
Guards	7.63 MB	1.0x

arcyfelix · November 22, 2023, 11:32pm

Looks like the performance, for both options, is pretty much the same.

al2o3cr · November 23, 2023, 2:48am

I extended this into a couple more cases that got unexpected results:

defmodule Test do
  @numbers Enum.to_list(1..1000)

  @numbers_as_map Enum.to_list(1..1000) |> Map.new(fn el -> {el, true} end)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for number <- @numbers do
    def do_work_generated(unquote(number)), do: unquote(number)
  end

  def do_work_guards(number) when number in @numbers, do: number

  def do_work_map(number) do
    if Map.has_key?(@numbers_as_map, number) do
      number
    else
      raise "oh no"
    end
  end

  def do_work_in(number) do
    if number in @numbers do
      number
    else
      raise "oh no"
    end
  end

  def do_work_nothing(number) do
    number
  end
end

Mix.install([:benchee, :benchee_markdown])
list_of_random_numbers = fn -> Enum.map(1..500_000, fn _ -> :rand.uniform(1000) end) end

inputs =
  list_of_random_numbers
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {numbers, idx} -> {"Batch #{idx + 1}", numbers} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end,
    "Map" => fn input -> Test.work(input, &Test.do_work_map/1) end,
    "In" => fn input -> Test.work(input, &Test.do_work_in/1) end,
    "Nothing" => fn input -> Test.work(input, &Test.do_work_nothing/1) end,
  },
  memory_time: 60,
  inputs: inputs,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK.md"},
    Benchee.Formatters.Console
  ]
)

The important bits:

do_work_map uses Map.has_key? to check for membership
do_work_in uses in in an if (instead of a guard) to check for membership
do work_nothing doesn’t bother checking at all

My absolute numbers aren’t relevant (my machine is pretty slow, and I’m on 1.13/OTP24), but the differences are peculiar:

do_work_map is roughly 2x slower than either the guard or the generated cases. That’s surprising, given that map lookup should take far fewer comparisons than simple linear search.
even weirder: do_work_nothing doesn’t always WIN somehow. It doesn’t ever lose by much, but surely whatever logic is dispatching the function heads or guards should take some time
do_work_in is MASSIVELY slower, by a factor of 30x.

I’ve tried a few obvious things that seemed like they could be causing slowdown - for instance, not having a raise in the body of do_work_map to see if the optimizer is giving up because an exception might be raised - but none of them have yielded a change.

mruoss · November 23, 2023, 8:03am

Yeah it looks like the compiler is able to optimize better if facts are known at compile time.

But here’s something interesting: I have adapted the code a bit because eventually, I’m gonna be working with strings. And if we add binary pattern matching to the mix, performance seems to be different.

Side note: Reading the documentation of Kernel.in/2 I read the following:

However, this construct will be inefficient for large lists. In such cases, it is best to stop using guards and use a more appropriate data structure, such as MapSet.

I therefore added a test using MapSets (which, afaict, is equivalent to @al2o3cr’s test using is_map_key).

TLDR;

In this case, guards seem to make the race, being twice as fast as generated functions and 2.5x faster than MapSets

Test

defmodule Test do
  @chars Range.to_list(1000..1)
  @chars_map_set MapSet.new(1000..1)

  def work(input, fun) do
    Enum.map(input, fun)
  end

  for char <- @chars do
    def do_work_generated(<<unquote(char)::utf8>>), do: unquote(char)

    def do_work_generated(<<unquote(char)::utf8, rest::binary>>) do
      do_work_generated(rest)
    end
  end

  def do_work_guards(<<char::utf8>>) when char in @chars, do: char

  def do_work_guards(<<char::utf8, rest::binary>>) when char in @chars do
    do_work_guards(rest)
  end

  def do_work_mapset(<<char::utf8>>) do
    if MapSet.member?(@chars_map_set, char) do
      char
    else
      :error
    end
  end

  def do_work_mapset(<<char::utf8, rest::binary>>) do
    if MapSet.member?(@chars_map_set, char) do
      do_work_mapset(rest)
    else
      :error
    end
  end
end

Mix.install([:benchee, :benchee_markdown])

list_of_random_strings = fn ->
  Enum.map(1..500_000, fn _ ->
    for _ <- 1..10,
        into: "",
        do: <<Enum.random(Range.to_list(?0..?9) ++ Range.to_list(?A..?z))>>
  end)
end

inputs =
  list_of_random_strings
  |> Stream.repeatedly()
  |> Stream.take(5)
  |> Stream.with_index()
  |> Stream.map(fn {strings, idx} -> {"Batch #{idx + 1}", strings} end)

Benchee.run(
  %{
    "Generated Functions" => fn input -> Test.work(input, &Test.do_work_generated/1) end,
    "Guards" => fn input -> Test.work(input, &Test.do_work_guards/1) end,
    "MapSet" => fn input -> Test.work(input, &Test.do_work_mapset/1) end
  },
  memory_time: 60,
  inputs: inputs,
  parallel: 10,
  formatters: [
    {Benchee.Formatters.Markdown, file: "BENCHMARK_STRINGS.md"},
    Benchee.Formatters.Console
  ]
)

Output

Statistics

Input: Batch 1

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	31.07	32.19 ms	±8.67%	32.09 ms	46.84 ms
Generated Functions	15.54	64.36 ms	±5.44%	63.96 ms	90.67 ms
MapSet	10.99	91.02 ms	±5.33%	90.73 ms	123.52 ms

Run Time Comparison

Name	IPS	Slower
Guards	31.07
Generated Functions	15.54	2.0x
MapSet	10.99	2.83x

Memory Usage

Name	Average	Factor
Guards	26.70 MB
Generated Functions	26.70 MB	1.0x
MapSet	26.70 MB	1.0x

Input: Batch 2

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	31.57	31.68 ms	±8.91%	31.97 ms	45.93 ms
Generated Functions	15.64	63.94 ms	±5.55%	64.00 ms	91.14 ms
MapSet	11.10	90.06 ms	±4.90%	90.00 ms	119.82 ms

Run Time Comparison

Name	IPS	Slower
Guards	31.57
Generated Functions	15.64	2.02x
MapSet	11.10	2.84x

Memory Usage

Name	Average	Factor
Guards	26.70 MB
Generated Functions	26.70 MB	1.0x
MapSet	26.70 MB	1.0x

Input: Batch 3

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	31.43	31.81 ms	±8.78%	31.89 ms	46.68 ms
Generated Functions	15.40	64.95 ms	±5.93%	64.84 ms	93.51 ms
MapSet	11.04	90.54 ms	±4.83%	90.49 ms	118.91 ms

Run Time Comparison

Name	IPS	Slower
Guards	31.43
Generated Functions	15.40	2.04x
MapSet	11.04	2.85x

Memory Usage

Name	Average	Factor
Guards	26.70 MB
Generated Functions	26.70 MB	1.0x
MapSet	26.70 MB	1.0x

Input: Batch 4

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	31.08	32.18 ms	±8.48%	32.34 ms	46.62 ms
Generated Functions	15.25	65.58 ms	±6.42%	65.17 ms	97.31 ms
MapSet	12.31	81.22 ms	±7.78%	79.51 ms	108.75 ms

Run Time Comparison

Name	IPS	Slower
Guards	31.08
Generated Functions	15.25	2.04x
MapSet	12.31	2.52x

Memory Usage

Name	Average	Factor
Guards	26.70 MB
Generated Functions	26.70 MB	1.0x
MapSet	26.70 MB	1.0x

Input: Batch 5

Run Time

Name	IPS	Average	Devitation	Median	99th %
Guards	29.61	33.78 ms	±8.29%	33.12 ms	50.84 ms
Generated Functions	15.13	66.08 ms	±5.99%	65.88 ms	97.77 ms
MapSet	11.20	89.32 ms	±53.75%	82.30 ms	440.40 ms

Run Time Comparison

Name	IPS	Slower
Guards	29.61
Generated Functions	15.13	1.96x
MapSet	11.20	2.64x

Memory Usage

Name	Average	Factor
Guards	26.70 MB
Generated Functions	26.70 MB	1.0x
MapSet	26.70 MB	1.0x

hst337 · November 23, 2023, 8:57pm

This exact test case has nothing to do for “guards vs generated functions”

when x in 1..100 gets compiled into when x > 1 and x < 100, so it is just faster.

So, to answer the original question “Performance Question: guards vs. generated functions: which is faster?”: none is faster, both are a different tools used for different tasks. The task you’ve brought up in the original post is made up and is actually solved not by x in y guard or generated clauses, but just by x >= left and x <= right