Structs: pattern matching on nil values

I think the behavior of pattern matching on null-valued pairs within structs is surprising:

iex(6)> defmodule A do defstruct [:a, :b] end
iex(7)> match?(%A{a: a}, %A{})               
true
iex(8)> match?(%{a: a}, %{})  
false

I would argue this is not very useful. By matching on the struct itself (e.g. on %A{}) we already know the keys that are going to be present. Hence pattern-matching on members provides zero additional value.

IMO, it would be more useful if the above struct match returned false. Indeed, it would provide a way to test a semantical presence of a value rather than a mechanical one. This can be done via guards but the syntax is more verbose.

1 Like

Or just straight out by testing the nil case first: match?(%A{a: nil}, %A{})

Or !!%A{}.a ā€¦ looks a bit like swearing.

Wouldnā€™t work in a match context though? ^.^;

Oh, I thought the goal was to get a false.

Thanks for the suggestions. Indeed, !! wonā€™t work within a function signature unfortunately.

It provides tons of value if you include a pattern on those members, or if you include those members in a guard.

Letā€™s filter all As that are empty

as |> Enum.filter(&match?(%A{a: []}, &1))

Ultimately the whole point of match? is that it works exactly like:

case value do
  %A{a: a} -> true
  _ -> false
end

If you want some other characteristic then you want some other function.

We are more than happy to try to help provide the most idiomatic way to achieve whatever your goal is in this situation. Just understand that suggestions that involve breaking changes to core functions are unlikely to gain a lot of traction.

2 Likes

Ben, thanks a lot for your detailed answer. I was probably not clear enough about it, but I am not interested in match? per se but rather in the design decision behind pattern matching implementation for structs. Given that I understand how structs are implemented I see that pattern matching works in a most straightforward way.

That is entirely dictated by the BEAM VM itself.

As for structs, they act very much like ā€˜recordsā€™ from Erlang.

And to do something like having match?(%A{a: a}, %A{}) return false then it would have to have a guard, and guards introduce a non-trivial hit in processing time (doing <<_:binary>> is faster than doing s when is_binary(s) for one example), which is dreadfully important at times.

Not necessarily. One could forbid keeping nil values in structs but keep the set of possible keys somewhere else (which, in turn, might be useful for defining the order of keys for instance).

Interesting, I would not have guessed. Did you learn it by trying to optimize some code or by learning BEAM internals?

How would you do that?

A struct is just a map, so worst-case someone could just Map.put(someStruct, :theKey, nil)ā€¦

I came from the erlang world with a lot of network parsing, Iā€™ve been working with the BEAM off and on for ~15 years I think now (early 2000ā€™s). ^.^

1 Like

This is not true. At least not on todayā€™s BEAM. Matching on <<_:binary>> is going to be faster, if youā€™re doing a binary pattern match already, to keep the ā€œsingle match contextā€ optimization. But if youā€™re only checking if something is a binary, itā€™s going to be slower. This is primarily because it needs to allocate the match context (which creates garbage on the heap that will lead to sooner garbage collection), while checking is_binary does not involve any memory movements. This memory access (and GC jitter) can be seen in the increased deviation of the benchmark.

Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.6.0-dev
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
parallel: 1
inputs: binary, not binary
Estimated total run time: 28 s

##### With input binary #####
Name            ips        average  deviation         median         99th %
guard      436.27 K        2.29 Ī¼s    Ā±34.28%        2.10 Ī¼s        4.20 Ī¼s
match      236.41 K        4.23 Ī¼s  Ā±1391.73%           4 Ī¼s          11 Ī¼s

Comparison:
guard      436.27 K
match      236.41 K - 1.85x slower

##### With input not binary #####
Name            ips        average  deviation         median         99th %
guard      448.69 K        2.23 Ī¼s    Ā±35.38%           2 Ī¼s        4.40 Ī¼s
match      431.08 K        2.32 Ī¼s    Ā±43.81%        2.20 Ī¼s        4.60 Ī¼s

Comparison:
guard      448.69 K
match      431.08 K - 1.04x slower

The benchmark code:

defmodule Bench do
  def match(<<_::binary>>), do: :binary
  def match(_), do: :not_binary

  def guard(binary) when is_binary(binary), do: :binary
  def guard(_), do: :not_binary

  def loop(0, _fun) do
    :ok
  end
  def loop(n, fun) do
    fun.()
    loop(n - 1, fun)
  end
end

jobs = %{
  "match" => fn input -> Bench.loop(100, fn -> Bench.match(input) end) end,
  "guard" => fn input -> Bench.loop(100, fn -> Bench.guard(input) end) end
}

inputs = %{
  "binary" => "foo",
  "not binary" => 123
}

Benchee.run(jobs, inputs: inputs)
7 Likes

Ooo really? Interestingā€¦ Back in the day just by swapping from guards to matchā€™s of precisely that style on erlang more than doubled the speed of things I made that were head-heavy. At least back then I remember Guards being run ā€˜afterā€™ the match tree had been run, so if you had any kind of sizeable head-count for functions/cases then lifting more into the match context would be quite valuable. Fascinating though, thanks for the benchmark, though I wonder if that holds true with large head counts still?

Ahh, I think I was thinking about lists (as lists are the default ā€˜stringā€™ type in erlang):

##### With input binary #####
Name                ips        average  deviation         median         99th %
guard_top       34.09 K       29.34 Ī¼s    Ā±74.54%          28 Ī¼s          45 Ī¼s
guard_bot       30.61 K       32.67 Ī¼s    Ā±74.42%          31 Ī¼s          48 Ī¼s
match_bot       22.81 K       43.85 Ī¼s    Ā±35.95%          41 Ī¼s          78 Ī¼s
match_top       22.38 K       44.69 Ī¼s    Ā±38.02%          43 Ī¼s          67 Ī¼s

Comparison: 
guard_top       34.09 K
guard_bot       30.61 K - 1.11x slower
match_bot       22.81 K - 1.49x slower
match_top       22.38 K - 1.52x slower

##### With input list #####
Name                ips        average  deviation         median         99th %
match_bot       34.39 K       29.08 Ī¼s    Ā±65.78%          28 Ī¼s          41 Ī¼s
guard_top       34.24 K       29.21 Ī¼s    Ā±54.19%          28 Ī¼s          45 Ī¼s
match_top       31.49 K       31.75 Ī¼s    Ā±61.61%          31 Ī¼s          46 Ī¼s
guard_bot       31.25 K       32.00 Ī¼s    Ā±57.29%          31 Ī¼s          49 Ī¼s

Comparison: 
match_bot       34.39 K
guard_top       34.24 K - 1.00x slower
match_top       31.49 K - 1.09x slower
guard_bot       31.25 K - 1.10x slower

##### With input not binary #####
Name                ips        average  deviation         median         99th %
guard_bot       31.18 K       32.07 Ī¼s    Ā±63.11%          31 Ī¼s          44 Ī¼s
guard_top       31.15 K       32.10 Ī¼s    Ā±58.90%          30 Ī¼s          50 Ī¼s
match_bot       29.16 K       34.30 Ī¼s    Ā±38.15%          33 Ī¼s          49 Ī¼s
match_top       27.41 K       36.48 Ī¼s    Ā±36.63%          35 Ī¼s          52 Ī¼s

Comparison: 
guard_bot       31.18 K
guard_top       31.15 K - 1.00x slower
match_bot       29.16 K - 1.07x slower
match_top       27.41 K - 1.14x slower

Substantial performance difference on binaries though (a good 50% slower for match contexts)! And of course the list test is faster than the binary test regardless of how it is done (the BEAM is built for lists instead of binaries when processing them out). ^.^

EDIT: The code is about the same is @michalmuskalaā€™s except the count is increased to 1000 to prevent fast function warnings and I added some tuple cases to each set (unmatched in all cases) with the string/list test being at the top or bottom of the tuple matchers as per the test name.