Compiling files with many function heads is very slow (OTP 26 issue?)

An apple grown with less water than another apple is still an apple, right? Especially when they taste, look, and weigh the same.

If you use a case statement, you need the guard only once. If you use function heads, you need it multiple times.
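A minimal sketch of that difference (a hypothetical example, not code from the benchmark):

```elixir
defmodule GuardPlacement do
  # Function heads: the guard is repeated on every clause.
  def heads(1 = x) when is_integer(x), do: :one
  def heads(2 = x) when is_integer(x), do: :two

  # Case: the guard appears once on the enclosing head,
  # and the clauses themselves stay unguarded.
  def with_case(x) when is_integer(x) do
    case x do
      1 -> :one
      2 -> :two
    end
  end
end
```

With two clauses it hardly matters; with 10,000 macro-generated clauses, the heads variant carries 10,000 copies of the guard while the case variant still carries one.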

De-optimizing the case-example does indeed make it ‘the same’ but I fail to see the purpose of watering an option down in benchmarks. It feels like adding weight to a car because “it’s faster due to being lighter”. But maybe I am missing the point entirely.

Even if at the end they boil down to the same thing (the compiler is smart enough)

If they boil down to the same thing, I would choose the one that is fastest to compile.

It might be me, but I fail to see why it is unfair (and would like to be corrected).

Thanks for contributing an improved anon-variant. Gonna try later today.

In this case, they are different because in one you send 30k fewer AST nodes to the compiler, so there should be no surprise that one is much faster than the other, and it should not be generalized as case being faster than def. For example, you could also have generated 10k defp clauses without the guards and a single def that has a guard and calls them, and the result would be the same as case, but using heads. So the result fully boils down to how you write the code and not the constructs involved; it is not about case in particular being immune to guards.

Another way to put this is that the titles of your benchmarks should not be case vs heads, it is more like 1 guard vs 10000 guards. :slight_smile:
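The defp variant described above could look like this (a hypothetical sketch, not code posted in the thread):

```elixir
defmodule DelegatedGuards do
  # A single guarded public entry point...
  def foo(x) when is_integer(x), do: do_foo(x)

  # ...delegating to many unguarded private heads
  # (imagine thousands of these generated by a macro).
  defp do_foo(1), do: :one
  defp do_foo(2), do: :two
  defp do_foo(_), do: :other
end
```

The guard is written once, exactly as in the case version, yet the dispatch still happens via function heads.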

EDIT: furthermore, the only case where you can do the “guards optimization” above (unifying all guards into a single one) is when the guards themselves are redundant. This is because you are generating this code:

def foo(x = 1) when is_integer(x)
def foo(x = 2) when is_integer(x)

where you can extract the guards out, but arguably you should get rid of them altogether. They are only adding unnecessary compiler work.

Instead, when you are generating multiple clauses, it is often because the guards themselves are different, such as this:

def foo(x) when x === 1
def foo(x) when x === 2

And then you can’t eliminate them, neither in def nor in case. What could happen is that you can partially extract some guards, such as this:

def foo(x, y) when x === 1 and is_integer(y)
def foo(x, y) when x === 2 and is_integer(y)

Where you could move is_integer(y) out and have the remaining in a case (or a defp as established above). But I can’t see where you would be able to extract all guards out, as in your example.

4 Likes

Thanks for the clarification. Indeed the constructs themselves are not to blame, though my wording suggested they were. I was trying to point to the names of the written scenarios rather than the constructs themselves, and failed to do so correctly.

3 Likes

That’s a neat example. I did a little test with this versus using guards, especially for Unicode parsing. With SQL, not only does the input have to be valid, but only specific character sets are allowed. The case function compiles fast and also has good runtime performance, roughly 2.5x better:

➜  sql git:(main) ✗ mix sql.bench
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.20.0-dev
Erlang 28.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 2 s
parallel: 1
inputs: 1..100_000
Estimated total run time: 22 s

Measured function call overhead as: 0 ns
Benchmarking case with input 1..100_000 ...
Benchmarking guard with input 1..100_000 ...
Calculating statistics...
Formatting results...

##### With input 1..100_000 #####
Name           ips        average  deviation         median         99th %
case        15.07 M       66.35 ns ±42238.21%          42 ns          42 ns
guard        6.14 M      162.76 ns ±15740.33%         125 ns         167 ns

Comparison:
case        15.07 M
guard        6.14 M - 2.45x slower +96.41 ns

Memory usage statistics:

Name    Memory usage
case             40 B
guard            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name Reduction count
case                2
guard               2 - 1.00x reduction count +0

I think it would be worth considering documenting these tricks, akin to how we have anti-patterns, as it feels like there are lots of ways to shoot yourself in the foot, and these simple, neat examples could really shine.

3 Likes

Lol, spoke too soon. If you’re doing more complex things, this approach quickly becomes extremely slow and requires a lot of var!.

1 Like

Do you have an example of the code? Also wondering why you would need a lot of var!

I was trying to refactor this, to avoid the guard: sql/lib/lexer.ex at 911a0795d87666cbf77068d3a92b9fe3bdcf0a3e · elixir-dbvisor/sql · GitHub — so far it’s causing infinite compilations.

The only way I found to make it work was creating a specific function, but then all the optimizations were lost :frowning:. It was too good to be true.

The example you quoted from José does include guards. Just as many as the Heads-variant.

The ‘trick’ demonstrated by my Case-variant is the absence of 9,999 × 2 guards, as only a single function head has the 2 guards and all 9,999 case clauses can go without. As José mentioned, the same effect can be reached by having 1 public function with guards delegating to 10,000 unguarded defps.

That being said: so far I could not reproduce any speed improvement with José’s Anon-variant in combination with macros. Maybe the example is too simple for it to have enough impact to overcome an initial penalty…?

To be doubly clear: yes, there is pattern matching on a specific integer AND guards checking if the args are integers; the guards are useless and were only added to test performance with guards.

defmodule AnonMacro do

  defmacro fast do
    quote bind_quoted: [] do
      for i <- 1..10000 do
        def foo(x = unquote(i), y = unquote(i)) when is_integer(x) and is_integer(y), do: unquote(i)
      end
    end
  end
end

defmodule Anon.A do
  import AnonMacro

  fast()
end


defmodule Anon.B do
  import AnonMacro

  fast()
end

defmodule Anon.C do
  import AnonMacro

  fast()
end

The result

test3.ex compiled in 9551.563 ms (guard heavy, missing optimization technique)
  -> Modules: [Heads.A, Heads.B, Heads.C]

test1.ex compiled in 10683.215 ms (guard heavy, wrapped in anon fun)
  -> Modules: [AnonMacro, Anon.A, Anon.B, Anon.C]

The Schultzer-case
I still wonder why you would need more var! :slight_smile: Can you give us an example of a rewritten block?

1 Like

I’ve done a lot of work in metaprogramming and also decided to rewrite def so that it uses case inside. Maybe you can take a look at my code for inspiration:

usage:

Let me know if you need some context on it. In general, I have some DSL, and to not add hidden public functions I decided to move everything that matches this pattern to Enumex.Generator.* helper modules.

The number of clauses depends on the number of enums defined in each module, the number of their possible values, and the type of function. For example, if you want to generate all variants of sort (for id, index, and struct), then you have many more clauses. Imagine someone wanting to write an enum with all HTTP status codes. This is why I quickly got into thousands of clauses and needed to rewrite it into a case.

Hope it helps as a good example for further discussion. :heart:

2 Likes

This is the cleanest I could figure out, but there were no gains, only slower compilation and runtime.

  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), rest::binary>> -> ident_mid(rest, [unquote(c)|var!(data)], var!(line), var!(column), var!(context), var!(length)+1, var!(acc))
        end
      )
    end ++ quote do
      rest -> ident_end(rest, var!(data), var!(line), var!(column), var!(context), var!(length), var!(acc))
    end
  defp ident_mid(rest, data, line, column, context, length, acc) do
    case rest do
      unquote(case_ast)
    end
  end

Thanks, I took a quick look; it does not look as complex as what I’m doing. Getting good speed while working on binaries is hard, especially when you want to parse Unicode. But it’s just really weird how the compiler can figure out how to create a jump table from:

defmodule Test do
  require Unicode.Set
  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), _::binary>> -> true
        end
      )
    end ++ quote do
      _ -> false
    end

  def case_fn(binary) do
    case binary do
      unquote(case_ast)
    end
  end

  def guard_fn(<<b::utf8, _::binary>>) when Unicode.Set.match?(b, "[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]"), do: true
  def guard_fn(_binary), do: false
end

But as soon as you throw in a function call, it just capitulates.

1 Like

I was able to use José’s and @BartOtten’s examples to remove most of the guards I had in my lexer: sql/lib/lexer.ex at main · elixir-dbvisor/sql · GitHub

Guards are another issue for compilation and runtime performance. I highly recommend this approach if you’re doing performance work; guards are a big no-no in that regard.

If I see it correctly, you mean the Case-variant, no?

@josevalim thanks for the hd trick for quoted cases :slight_smile: I had [unquote_splicing(clause_ast)] in the back of my head, but that did not work in this case (so I gave up and wrote pure AST). Using hd is tight. Barely an inconvenience.
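For anyone puzzled by the hd trick: quoting a block of -> clauses produces a *list* of clause nodes rather than a single expression, so hd/1 pulls out the one clause for splicing into a case. A minimal illustration (hypothetical, just to show the shape of the AST):

```elixir
# Quoting a `->` clause yields a one-element list of
# {:->, meta, [args, body]} nodes.
clauses =
  quote do
    x -> x + 1
  end

# It really is a list with a single clause node in it...
[{:->, _meta, _args}] = clauses

# ...so hd/1 extracts the clause itself, ready to be collected
# into a bigger list and unquoted inside a case.
clause = hd(clauses)
```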

1 Like

Yes, case without guards

I generally disagree with this conclusion but I am glad to be proven wrong if there are benchmarks proving so.

The issue in @BartOtten’s example was not the guards, simply the amount of code generated. If you reduce the amount of code you generate, be it in the form of guards, function calls, or data structures, it will likely speed up the compiler and runtime. So there is a chance you are reducing the number of guards, which leads to fewer lines of code, and that’s what is actually improving things; it is not about the guards themselves.

Benchmarks are always tricky and it is very easy to draw wrong conclusions.

1 Like

This is true. I only tried to demonstrate that (macro-)generating many function heads with the same guards can be optimized by wrapping unguarded case clauses in a single guarded function head. It’s simply less code/AST, less to check.

The reason why I called the case-variant “immune to adding guards” is that the number of guards does not multiply in that example, whereas in the head-example it does multiply by the number of heads. It has nothing to do with the guard construct itself.

‘Bloated’ AST function heads
In this example the struct is repeated over and over.

def foo(%Struct{foo: "bar"}), do: :bartender
def foo(%Struct{foo: "zip"}), do: :zippy
def foo(%Struct{foo: "zelo"}), do: :zarika
(and then 1000 more)

Versus

Reduced AST case variant
There is less code as some constructs (def, struct) are not repeated a 1000 times.

def foo(%Struct{foo: my_var}) do
  case my_var do
   "bar" -> :bartender
   "zip" -> :zippy
   "zelo" -> :zarika
    (and then 1000 more)
  end
end

Reduced AST function heads
This also avoids repetition and has reduced AST.

def foo(%Struct{foo: my_var}), do: do_foo(my_var)

defp do_foo("bar"), do: :bartender
defp do_foo("zip"), do: :zippy
defp do_foo("zelo"), do: :zarika
 (and then 1000 more)

This also goes for all other things where you can eliminate AST by simply writing/generating code differently.

5 Likes

I already posted a benchmark showcasing that guards are roughly 2.5x slower than pattern matching (Compiling files with many function heads is very slow(OTP 26 issue?) - #44 by Schultzer) and here is the code: Compiling files with many function heads is very slow(OTP 26 issue?) - #51 by Schultzer

I did not see code with no guards (you mentioned you made changes, but there are no reproducible examples afaik), and the version you posted on the Erlang Forums simply added a body to the clause. No guards involved. And in that case the difference in performance was related to the amount of generated code. We can continue the discussion here: Why does the efficiency guide not mention how expensive exported functions can be for compilation times - #22 by Schultzer - Questions / Help - Erlang Programming Language Forum - Erlang Forums

In any case, @BartOtten nailed the root cause in his last comment above.

The code I posted is benchmarking a function using a case statement against a function using a guard. And pattern matching outperforms guards.

The issue in the ErlangForum is about how to generate this case statement efficiently for the compiler when the return is a function call.

Whether that is because the guard function is using utf8 is beside the point, as the way the function is written makes it 2.5x slower:

defmodule Test do
  require Unicode.Set
  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), _::binary>> -> true
        end
      )
    end ++ quote do
      _ -> false
    end

  def case_fn(binary) do
    case binary do
      unquote(case_ast)
    end
  end

  def guard_fn(<<b::utf8, _::binary>>) when Unicode.Set.match?(b, "[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]"), do: true
  def guard_fn(_binary), do: false
end
➜  sql git:(main) ✗ mix sql.bench
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.20.0-dev
Erlang 28.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 2 s
parallel: 1
inputs: 1..100_000
Estimated total run time: 22 s

Measured function call overhead as: 0 ns
Benchmarking case with input 1..100_000 ...
Benchmarking guard with input 1..100_000 ...
Calculating statistics...
Formatting results...

##### With input 1..100_000 #####
Name           ips        average  deviation         median         99th %
case        15.07 M       66.35 ns ±42238.21%          42 ns          42 ns
guard        6.14 M      162.76 ns ±15740.33%         125 ns         167 ns

Comparison:
case        15.07 M
guard        6.14 M - 2.45x slower +96.41 ns

Memory usage statistics:

Name    Memory usage
case             40 B
guard            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name Reduction count
case                2
guard               2 - 1.00x reduction count +0


This thread was still missing the “single guarded delegating function” versus “single guarded function with unguarded case clauses” benchmark results. So, for completeness:

OPTIMIZED MULTI HEAD: 3519.502 ms

defmodule Heads.A do
  def foo(x,y) when is_integer(x) and is_integer(y), do: do_foo(x,y)

  for i <- 1..10000 do
    def do_foo(x = unquote(i), _y = unquote(i)), do: unquote(i)
  end
end

Versus

OPTIMIZED CASE: 3527.108 ms

defmodule Case.A do
  case_ast =
    for i <- 1..10000 do
      hd(
        quote do
          {x = unquote(i), _y = unquote(i)}  -> unquote(i)
        end
      )
    end

  def foo(x, y) when is_integer(x) and is_integer(y) do
    case {x, y} do
      unquote(case_ast)
    end
  end
end

The compilation time difference is insignificant and flips between the two methods across runs. So you might think both are equal. Well, no…

Using HEAD, you get 10,000 reports that “variable x is unused” and should probably be underscored, while the unused variable is not detected when using CASE. If you care about clean code, OPTIMIZED MULTI HEAD is the better option, as the compiler actually points out the issue.
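A hypothetical way to avoid those 10,000 warnings in the HEAD variant is to simply not bind the unused variable in the generated pattern, since the literal integer is enough to match on (module name and shape are my own sketch, not from the benchmark above):

```elixir
defmodule HeadsNoWarn do
  # Single guarded entry point, as in the OPTIMIZED MULTI HEAD variant.
  def foo(x, y) when is_integer(x) and is_integer(y), do: do_foo(x, y)

  for i <- 1..10_000 do
    # No `x =` binding: matching on the literal alone produces
    # no "variable x is unused" warnings.
    defp do_foo(unquote(i), unquote(i)), do: unquote(i)
  end
end
```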

[1] Disclaimer: in this case, don’t fall for premature optimization; always benchmark your own use case if you really want to know, unless José tells you otherwise.



2 Likes