I think the behavior of pattern matching on nil-valued pairs within structs is surprising:
iex(6)> defmodule A do defstruct [:a, :b] end
iex(7)> match?(%A{a: a}, %A{})
true
iex(8)> match?(%{a: a}, %{})
false
I would argue this is not very useful. By matching on the struct itself (e.g. on %A{}), we already know which keys are going to be present. Hence pattern matching on members provides zero additional value.
IMO, it would be more useful if the above struct match returned false. Indeed, it would provide a way to test the semantic presence of a value rather than a mechanical one. This can be done via guards, but the syntax is more verbose.
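For illustration, the more verbose guard-based check alluded to here might look like the following sketch (reusing the A struct from the example above):

```elixir
defmodule A do
  defstruct [:a, :b]
end

# A plain struct match succeeds even though :a defaults to nil...
match?(%A{a: _a}, %A{})

# ...while adding a guard tests the semantic presence of a value:
match?(%A{a: a} when not is_nil(a), %A{})
match?(%A{a: a} when not is_nil(a), %A{a: 1})
```

The first two expressions return true and false respectively; the third returns true, since :a actually holds a non-nil value.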
It provides tons of value if you include a pattern on those members, or if you include those members in a guard.
Let's filter all As that are empty:
as |> Enum.filter(&match?(%A{a: []}, &1))
Ultimately the whole point of match? is that it works exactly like:
case value do
  %A{a: a} -> true
  _ -> false
end
If you want some other characteristic then you want some other function.
We are more than happy to try to help provide the most idiomatic way to achieve whatever your goal is in this situation. Just understand that suggestions that involve breaking changes to core functions are unlikely to gain a lot of traction.
Ben, thanks a lot for your detailed answer. I was probably not clear enough, but I am not interested in match? per se, but rather in the design decision behind the pattern-matching implementation for structs. Given that I understand how structs are implemented, I see that pattern matching works in the most straightforward way.
As for structs, they act very much like "records" from Erlang.
And to make match?(%A{a: a}, %A{}) return false, it would have to use a guard, and guards introduce a non-trivial hit in processing time (matching <<_::binary>> is faster than s when is_binary(s), for one example), which is dreadfully important at times.
Not necessarily. One could forbid keeping nil values in structs but keep the set of possible keys somewhere else (which, in turn, might be useful for defining the order of keys for instance).
Interesting, I would not have guessed. Did you learn it by trying to optimize some code or by learning BEAM internals?
A struct is just a map, so worst-case someone could just Map.put(some_struct, :the_key, nil)…
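A quick sketch of that escape hatch (Config is a hypothetical struct name, not from the thread): even if struct construction forbade nils, plain Map functions could reintroduce one without changing __struct__:

```elixir
defmodule Config do
  defstruct [:url]
end

c = %Config{url: "https://example.com"}

# Structs are maps, so Map.put bypasses any construction-time checks.
c2 = Map.put(c, :url, nil)

# The result is still a %Config{} as far as matching is concerned...
match?(%Config{}, c2)

# ...but the nil is back, so a semantic-presence guard fails again.
match?(%Config{url: u} when not is_nil(u), c2)
```

The first match? returns true, the second false, which is why forbidding nil at construction alone cannot enforce the invariant.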
I came from the Erlang world with a lot of network parsing; I've been working with the BEAM off and on for ~15 years now, I think (early 2000s). ^.^
This is not true. At least not on today's BEAM. Matching on <<_::binary>> is going to be faster if you're already doing a binary pattern match, to keep the "single match context" optimization. But if you're only checking whether something is a binary, it's going to be slower. This is primarily because it needs to allocate the match context (which creates garbage on the heap and leads to earlier garbage collection), while checking is_binary does not involve any memory movement. This memory access (and GC jitter) can be seen in the increased deviation of the benchmark.
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.6.0-dev
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
parallel: 1
inputs: binary, not binary
Estimated total run time: 28 s
##### With input binary #####
Name ips average deviation median 99th %
guard 436.27 K 2.29 μs ±34.28% 2.10 μs 4.20 μs
match 236.41 K 4.23 μs ±1391.73% 4 μs 11 μs
Comparison:
guard 436.27 K
match 236.41 K - 1.85x slower
##### With input not binary #####
Name ips average deviation median 99th %
guard 448.69 K 2.23 μs ±35.38% 2 μs 4.40 μs
match 431.08 K 2.32 μs ±43.81% 2.20 μs 4.60 μs
Comparison:
guard 448.69 K
match 431.08 K - 1.04x slower
The benchmark code:
defmodule Bench do
  def match(<<_::binary>>), do: :binary
  def match(_), do: :not_binary

  def guard(binary) when is_binary(binary), do: :binary
  def guard(_), do: :not_binary

  def loop(0, _fun) do
    :ok
  end

  def loop(n, fun) do
    fun.()
    loop(n - 1, fun)
  end
end

jobs = %{
  "match" => fn input -> Bench.loop(100, fn -> Bench.match(input) end) end,
  "guard" => fn input -> Bench.loop(100, fn -> Bench.guard(input) end) end
}

inputs = %{
  "binary" => "foo",
  "not binary" => 123
}

Benchee.run(jobs, inputs: inputs)
Ooo really? Interesting… Back in the day, just swapping from guards to matches of precisely that style in Erlang more than doubled the speed of things I made that were head-heavy. At least back then, I remember guards being run "after" the match tree had been run, so if you had any sizeable head count for functions/cases, lifting more into the match context would be quite valuable. Fascinating though, thanks for the benchmark; I do wonder whether that still holds with large head counts?
Ahh, I think I was thinking about lists (as lists are the default "string" type in Erlang):
##### With input binary #####
Name ips average deviation median 99th %
guard_top 34.09 K 29.34 μs ±74.54% 28 μs 45 μs
guard_bot 30.61 K 32.67 μs ±74.42% 31 μs 48 μs
match_bot 22.81 K 43.85 μs ±35.95% 41 μs 78 μs
match_top 22.38 K 44.69 μs ±38.02% 43 μs 67 μs
Comparison:
guard_top 34.09 K
guard_bot 30.61 K - 1.11x slower
match_bot 22.81 K - 1.49x slower
match_top 22.38 K - 1.52x slower
##### With input list #####
Name ips average deviation median 99th %
match_bot 34.39 K 29.08 μs ±65.78% 28 μs 41 μs
guard_top 34.24 K 29.21 μs ±54.19% 28 μs 45 μs
match_top 31.49 K 31.75 μs ±61.61% 31 μs 46 μs
guard_bot 31.25 K 32.00 μs ±57.29% 31 μs 49 μs
Comparison:
match_bot 34.39 K
guard_top 34.24 K - 1.00x slower
match_top 31.49 K - 1.09x slower
guard_bot 31.25 K - 1.10x slower
##### With input not binary #####
Name ips average deviation median 99th %
guard_bot 31.18 K 32.07 μs ±63.11% 31 μs 44 μs
guard_top 31.15 K 32.10 μs ±58.90% 30 μs 50 μs
match_bot 29.16 K 34.30 μs ±38.15% 33 μs 49 μs
match_top 27.41 K 36.48 μs ±36.63% 35 μs 52 μs
Comparison:
guard_bot 31.18 K
guard_top 31.15 K - 1.00x slower
match_bot 29.16 K - 1.07x slower
match_top 27.41 K - 1.14x slower
Substantial performance difference on binaries though (a good 50% slower for match contexts)! And of course the list test is faster than the binary test regardless of how it is done (the BEAM is built for lists instead of binaries when processing them out). ^.^
EDIT: The code is about the same as @michalmuskala's, except the count is increased to 1000 to prevent fast-function warnings, and I added some tuple cases to each set (unmatched in all cases), with the string/list test at the top or bottom of the tuple matchers as per the test name.
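The modified heads themselves are not shown in the thread; a hypothetical reconstruction of the "_top"/"_bot" naming, assuming a single never-matching tuple clause, could look like this:

```elixir
defmodule BenchHeads do
  # "top": the binary clause sits above the (never-matching) tuple clause.
  def match_top(<<_::binary>>), do: :binary
  def match_top({:unused, _}), do: :tuple
  def match_top(_), do: :other

  def guard_top(s) when is_binary(s), do: :binary
  def guard_top({:unused, _}), do: :tuple
  def guard_top(_), do: :other

  # "bot": the binary clause sits below the tuple clause, so the VM must
  # try (and fail) the tuple head first for binary inputs.
  def match_bot({:unused, _}), do: :tuple
  def match_bot(<<_::binary>>), do: :binary
  def match_bot(_), do: :other

  def guard_bot({:unused, _}), do: :tuple
  def guard_bot(s) when is_binary(s), do: :binary
  def guard_bot(_), do: :other
end
```

All four variants classify inputs identically; only clause ordering differs, which is the variable the top/bot benchmark rows measure.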