Help in optimizing Elixir benchmark for completely-unscientific-benchmarks

I am adding Elixir code to the completely-unscientific-benchmarks repo. See #79

The benchmark implements a Treap data structure. My implementation produces correct results as far as I can tell, however, it is extremely slow. I’ve done a little bit of profiling on it but nothing seems to stick out.

Any ideas? Code in question: https://github.com/dbishai/completely-unscientific-benchmarks/commit/f3f0ed4ea67f1d6834895cb9edf578ea59c70d1a

From a first skim, the use of cond is not quite idiomatic, you should try to rewrite those functions to use pattern matching.

It benefits from optimisations on the BEAM code level.

Though this might not necessarily be what makes your code slow. I have not yet run it to get more insight.

Also, that you run code in the source that produces output during compile time feels strange. I’m not sure if this might influence optimisations.

  1. Use one module and private methods, otherwise compilator cannot automatically inline them and with remote calls it needs to always check if there is no newer module in memory.
  2. Use pattern matching, not cond. It is easier for compiler to optimise.
  3. Try to make code TCO optimisable (ex. split_binary/2).
  4. IIRC records are in general faster than structures.

OK, I made some of those modifications: https://github.com/dbishai/completely-unscientific-benchmarks/commit/de621de621b0e694c92ad105efe91821b627f648

It doesn’t seem to have made much of a difference. It’s hard to believe it could be running this slowly without there being a major problem somewhere…however this was written by referencing the other codebases and the outputs are correct so I’m not sure.

Where do they display results?

On the README

defmodule TreapNode do
  defstruct x: nil, y: :rand.uniform(), left: nil, right: nil
end

The :rand.uniform() is evaluated at compile time, not runtime, so all your nodes have the same y which yields a degenerate data structure. Just assign y when creating the node instead.

Cond and pattern matches basically compile to the same code. You could gain some improvement from using local functions when possible, it’s a bit cheaper and allows the compiler to track types across functions (right now the code spends a lot of time making sure the inputs to merge/2 are maps, that they have the right keys, etc.) You would probably gain from using tuples (or records) rather than structs too.

(reposted because I replied to the wrong post earlier!)

1 Like

That’s not true. A top level case and function heads do, but cond cannot be converted to pattern matches, as it allows for any expressions to be used, while pattern matches can only harness guard save functions.

The BEAM doesn’t have specific instructions for pattern matching, guard functions, or cond. For instance a pattern match:

  def which1(nil, _), do: :left
  def which1(_, nil), do: :right

compiles to:

{function, which1, 2, 8}.
  {label,7}.
    {line,[{location,"matchit.ex",3}]}.
    {func_info,{atom,'Elixir.Matchit'},{atom,which1},2}.
  {label,8}.
    {test,is_eq_exact,{f,9},[{x,0},{atom,nil}]}.
    {move,{atom,left},{x,0}}.
    return.
  {label,9}.
    {test,is_eq_exact,{f,7},[{x,1},{atom,nil}]}.
    {move,{atom,right},{x,0}}.
    return.

which is the same code a cond generates, minus the potential error (case clause vs function clause). There are differences in language semantics, but for equivalent code one isn’t faster than the other. I’m mostly replying to the comment suggesting there are BEAM-level optimizations for pattern matching.

I mentioned your example. Yes function heads are compiled to a the equivalent of a single function directly using the params with a case statement. cond on the other hand branches on arbitrary expressions evaluating to true(thy) and not by patterns.

1 Like

Yep, that is exactly what it was. Thank you so much! Also big thanks to everyone else that contributed suggestions :slight_smile:

Here’s a cond using a mix of guard and non-guard functions:

    cond do
      f1(x) and f2(y) ->
        1

      is_map(x) ->
        2

      true ->
        3
    end

You can rewrite it to use case and pattern matching:

    case f1(x) and f2(y) do
      val when val != false ->
        1

      _ ->
        case x do
          %{} ->
            2

          _ ->
            3
        end
    end

The two snippets generate the exact same bytecode. A pattern match on %{} just becomes a call to is_map.

In this case it is possible, however in case of:

cond do
  x.a and x.b ->
    1
  is_map(x) ->
    2

  true ->
    3
end

It is not possible to optimise that to pattern match as x can be an atom, and then x.a and x.b will be remote calls which aren’t allowed in the guards.