Registry.select/2 -- why does the "body" have extra curly braces?

fireproofsocks · July 13, 2023, 7:58pm

I’ve been digging more into the Registry module and I’m working on the Registry.select/2 function in particular. To re-state the docs, the 2nd argument is a list in 3 parts, in the shape of [{match_pattern, guards, body}]

The body is something like a template, so it makes sense that it is bound tightly with the match_pattern. So the only variables you have available are the ones that were assigned via a match. This is pretty similar to some parts of regular expressions.

For example, following the “SELECT ALL” example from the docs:

Registry.select(Registry.SelectAllTest, [{
  {:"$1", :"$2", :"$3"}, 
  [], 
  [{{:"$1", :"$2", :"$3"}}]
}])

the last bit [{{:"$1", :"$2", :"$3"}}] is the body, and the result is that it formats each result as a tuple, e.g.

iex> {:ok, _} = Registry.start_link(keys: :unique, name: Registry.SelectAllTest)
{:ok, #PID<0.127.0>}
iex> {:ok, _pid} = Agent.start_link(fn -> 0 end, name: {:via, Registry, {Registry.SelectAllTest, "a"}})
{:ok, #PID<0.130.0>}
iex> {:ok, _pid} = Agent.start_link(fn -> 0 end, name: {:via, Registry, {Registry.SelectAllTest, "b"}})
{:ok, #PID<0.132.0>}
iex> {:ok, _pid} = Agent.start_link(fn -> 0 end, name: {:via, Registry, {Registry.SelectAllTest, "c"}})
{:ok, #PID<0.134.0>}
iex> Registry.select(Registry.SelectAllTest, [{{:"$1", :"$2", :"$3"}, [], [{{:"$1", :"$2", :"$3"}}]}])
[
  {"b", #PID<0.132.0>, nil},
  {"c", #PID<0.134.0>, nil},
  {"a", #PID<0.130.0>, nil}
]

However, one thing confuses me: why do the tuples have an extra pair of curly braces? I would expect the body to look like this instead: [{:"$1", :"$2", :"$3"}] – just a tuple. But this results in an ArgumentError:

iex> Registry.select(Registry.SelectAllTest, [{{:"$1", :"$2", :"$3"}, [], [{:"$1", :"$2", :"$3"}]}])
** (ArgumentError) errors were found at the given arguments:

  * 2nd argument: not a valid match specification

The extra set of braces are not required if we want to format the results as a map, for example:

iex> Registry.select(Registry.SelectAllTest, [{{:"$1", :"$2", :"$3"}, [], [%{key: :"$1", pid: :"$2", val: :"$3"}]}])
[
  %{key: "b", pid: #PID<0.132.0>, val: nil},
  %{key: "c", pid: #PID<0.134.0>, val: nil},
  %{key: "a", pid: #PID<0.130.0>, val: nil}
]

Can someone explain that? Seems like a weird gotcha. Thanks!

LostKobrakai · July 13, 2023, 8:09pm

I’m not sure I can explain it, but there’s rather extensive documentation around match specs:
https://www.erlang.org/doc/apps/erts/match_spec.html

D4no0 · July 13, 2023, 8:13pm

Goddamn you need to be a programming language designer to understand those specs .

christhekeele · July 14, 2023, 3:58am

The TL;DR is that in this context, tuples need to be “escaped” by being double-wrapped to mean “literal tuples” to avoid conflicts with “function call” syntax.

The full explanation dives into the nuances of the “match specification” grammar that this Registry feature relies on, that I’ve studied a lot.

Match Specification Grammar

The Registry.select/2 function accepts an erlang match specification, which is effectively a special-purpose AST for representing erlang code that can be executed very efficiently against things like ETS tables and tracing messages. This erlang-ish AST is composed of just tuples, lists, and atoms.

In your match specifications, you can invoke the functions allowed in erlang guards, in both your “guards” and “bodies” of the spec. The “function calls in guards and bodies” part of the grammar is called a “MatchCondition” if used as a guard, or a “ConditionExpression” if nested in a guard or used in a body, where a literal tuple is used to represent a call to a function, with optional arguments:

MatchCondition ::= { GuardFunction } | { GuardFunction, ConditionExpression, … }

ConditionExpression ::= ExprMatchVariable | { GuardFunction } | { GuardFunction, ConditionExpression, … } | TermConstruct

This means you can make a call to Kernel.is_integer/1 in your match specifications by writing {:is_integer, 2.1}.

There is a similar syntax to make “fake” function calls in bodies, that drive special behaviour when using a match specification for tracing, that also uses tuples-as-calls:

ActionCall ::= {ActionFunction} | {ActionFunction, ActionTerm, …}

This syntax conflicts with tuple literals, so literal tuples must be “escaped” where function calls are allowed (in “guards” and “bodies”) by double-wrapping them:

TermConstruct = {{}} | {{ ConditionExpression, … }} | | [ConditionExpression, …] | #{} | #{term() => ConditionExpression, …} | NonCompositeTerm | Constant

For example, if for some reason in your match spec body you wanted to call :erlang.element(2, literal_tuple) (which is just Elixir’s Kernel.elem(literal_tuple, 1)), you’d have to wrap the literal tuple but not the function call:

{:element, 2, {{:foo, :bar}}}

Notes

You’ll notice that lists and map literals are not semantically meaningful in this part of the AST, so do not need to be escaped:

TermConstruct = {{}} | {{ ConditionExpression, … }}| [] | [ConditionExpression, …] | #{} | #{term() => ConditionExpression, …} | NonCompositeTerm | Constant

Elixir AST Grammar Analog

Elixir’s AST represents local calls similarly, with literal three-tuples and argument lists:

quote(do: foo(:bar, :baz))
#=> {:foo, [], [:bar, :baz]}

Tuple literals are actually represented as local calls, by getting parsed as a “call” to a tuple literal constructor :{} that is handled later by expanding its arguments and turning it into a tuple literal in the Erlang Abstract Format:

quote(do: {:foo, :bar, :baz})
#=> {:{}, [], [:foo, :bar, :baz]}

So in a way, tuple literals in Elixir would also overlap with the Elixir AST’s three-tuple “local call” syntax, but the tokenizer “escapes” them for us before we get to the AST by wrapping them as a local call, to deal with them specially later, during AST expansion. Erlang matchspecs don’t really get the same tokenizing pass to save us from the ambiguity, though, since they are already in-memory AST data structures.

Matcha

I spent a lot of time with this grammar document building Matcha, an elixir-to-ms compiler.

Matcha lets you write your match specs as pure Elixir; double-wraps tuples for you in the correct contexts; and does other busy-work like handling macro expansion, turning Elixir guards into erlang guards, emitting correct Elixir compiler warnings for unused variables in your matchspec code, and all those nice things to make writing match specifications feel like writing first-class Elixir.

This could be done with Matcha via:

require Matcha
spec = Matcha.spec do
  {key, pid, value} -> {key, pid, value}
end
#=> #Matcha.Spec<[{{:"$1", :"$2", :"$3"}, [], [{{:"$1", :"$2", :"$3"}}]}]>
Registry.select(Registry.SelectAllTest, spec.source)

Because Registry :ets table entries are always three-tuples ({key, pid, value}), this is equivalent to a “SELECT ALL” operation.

An even more intuitive approach to matching these entries would be to just to match on the expected three-tuple structure, but return the entire match. Matchspecs have a special shorthand :"$_" syntax for this, which Matcha is aware of:

spec = Matcha.spec do
  entry = {key, pid, value} -> entry
end
# warning: variable "key" is unused (if the variable is not meant to be used, prefix it with an underscore)
# warning: variable "pid" is unused (if the variable is not meant to be used, prefix it with an underscore)
# warning: variable "value" is unused (if the variable is not meant to be used, prefix it with an underscore)
#=> #Matcha.Spec<[{{:"$1", :"$2", :"$3"}, [], [:"$_"]}]>

These are nice Elixir warnings about our code we could resolve with underscored variable names, like:

spec = Matcha.spec do
  entry = {_key, _pid, _value} -> entry
end
#=> #Matcha.Spec<[{{:"$1", :"$2", :"$3"}, [], [:"$_"]}]>

We could improve this even further by just using Kernel.tuple_size/1:

spec = Matcha.spec do
  entry when tuple_size(entry) == 3 -> entry
end
#=> #Matcha.Spec<[{:"$1", [{:==, {:tuple_size, :"$1"}, 3}], [:"$1"]}]>

However, support for tuple_size/1 and a few other guards in match specs is only available in OTP 26 and up; I requested OTP support for them in the match spec engine, for Matcha’s benefit, and it only landed just recently.

I talk a bit about why matchspecs are the way they are, why :ets exists, why only some functions are allowed in guards, and how to tackle reading that giant blob of grammar, in this presentation. I explicitly call out Registry as the only place they are used in Elixir APIs today

fireproofsocks · July 17, 2023, 1:14pm

I saw that presentation at the conference – it was illuminating! Thank you for the link!