🍵 Matcha - first-class match specifications for Elixir

DEVLOG.md 2023-10-04

I thought I’d share a concept I’m finally playing with for Matcha: first-class filters!

For the sake of these snippets, assume we have:

krillin = %{name: "Krillin", age: 28, power: 1_770}
goku = %{name: "Son Goku", age: 27, power: 3_000_000}
saiyans = [krillin, goku]

:tea: Matcha Filters

This is actually one thing that prompted me to begin investigating building Matcha (:stopwatch: :eyes: over four years ago?!), something in-between a first-class match pattern and a full match specification.

Matcha Specs

For context, a Matcha.Spec is similar to a deferred case statement you can pass around as a variable, and match against when you want instead of immediately. It has native support in :ets via the :ets.select_* APIs, and Matcha makes it easy to use them against arbitrary in-memory data as well (without the nice performance you get in :ets applications, a little slower than an equivalent Enum.map and more limited).

As I dig into in my matchspec talk, a match spec is essentially a data structure that looks like this:

match_spec = [
  {pattern, guards, body},
  {pattern, guards, body},
  # ...
]

This mirrors an equivalent case statement, but without an immediate match target:

case target do
  pattern when guards -> body
  pattern when guards -> body
  # ...
end

You can execute a deferred match specification against an in-memory target like so:

Matcha.Spec.call(match_spec, target)

Most of the cool stuff with match specs comes from the fact that you can hand this match specification to :ets.select_* (with match_spec.source) and it will test every object in a table against your specification, and for any successful match, return a transformed result, much more efficiently than loading the entire table into a process’s memory and doing this all yourself with Enum.filter + Enum.map or for comprehensions.

Matcha Patterns

Matcha also has support for a Matcha.Pattern construct. This acts like just a stand-alone pattern part of a match spec, and you can think of it as a deferred pattern match/destructuring. That is, if you have code like:

%{name: name, age: 27} = target

Then you will get a MatchError if target is not a map with the provided keys, and if target’s age does not match 27 exactly; otherwise it captures the name of only 27-year-olds in a variable called name. Matcha lets you build deferred matches like so:

match_pattern = Matcha.pattern(%{name: name, age: 27})

Matcha.Pattern.match?(match_pattern, krillin)
#=> false
Matcha.Pattern.match?(match_pattern, goku)
#=> true
Matcha.Pattern.matched_variables(match_pattern, goku)
#=> %{name: "Son Goku"}

Matcha.Pattern.matches(match_pattern, saiyans)
# => [%{name: "Son Goku", age: 27, power: 3000000}]
Matcha.Pattern.variable_matches(match_pattern, saiyans)
# => [%{name: "Son Goku"}]

There are, of course, better ways to do this in your Elixir programs with in-memory data. But :ets also lets you leverage match patterns against an entire table at once, returning only objects that match the pattern, via the :ets.match_* APIs (providing them the match_pattern.source). Matcha supports this :ets usecase with Matcha.Patterns.

Matcha Filters

Running with the above example, what if we wanted to only match people on inexact criteria? Say, people who had more than some exact quality?

Match patterns alone aren’t expressive enough to do this. Match specifications are, but they use a special syntax to do it, that we can’t really convert Elixir code into—one of the key goals of Matcha.

What we really want is to support a first-class deferred pattern when guards construct, and that’s exactly what “filters” are intended to be:

match_filter = Matcha.filter(%{name: name, power: power} when power > 9_000)

Matcha.Filter.match?(match_filter, krillin)
#=> false
Matcha.Filter.match?(match_filter, goku)
#=> true
Matcha.Filter.matched_variables(match_pattern, goku)
#=> %{name: "Son Goku", power: 3000000}

Matcha.Filter.matches(match_pattern, saiyans)
# => [%{name: "Son Goku", age: 27, power: 3000000}]
Matcha.Filter.variable_matches(match_pattern, saiyans)
# => [%{name: "Son Goku", power: 3000000}]

The overarching goal is to:

  • Provide an API that allows using all features of :ets match specs, including :"$$" and :"$_", from syntactically valid Elixir code
  • Provide a new mode of querying :ets tables tersely when re-mapping matched objects is not a requirement, akin to a hypothetical set of :ets.filter_* functions
  • Further my Macro Crimes :tm: in my two personal projects where I convert Elixir code into both Elixir functions, and compatible :ets queries, where having first-class :ets-compatible functions heads is a boon

Higher-level Query Support

The main reasons why I haven’t much popularized the Matcha.Pattern APIs and the higher-level Matcha.Table APIs are because:

  • The Matcha.Filter support for guards is sufficiently more powerful I may deprecate Matcha.Pattern or downplay it but still support it for :ets.match_* equivalents
  • The Matcha.Table APIs may get more powerful variants that know how to navigate either patterns or filters agnostically, and I don’t want to commit to them just yet

As an example, in tandem with my Matcha.Filter experiments, I have the following code snippet working mostly as expected:

over_nine_thousand = Matcha.Table.query( 
  {_id, %{name: name, power: power}} when power > 9_000
) 
alias Matcha.Table.Query

for saiyan <- Query.where(table, over_nine_thousand) do
  IO.puts("Scanning #{saiyan.name}, #{saiyan.age} years old...")
end
for %{name: threat} <- Query.select(table, over_nine_thousand), threat == "Son Goku" do
  IO.puts("#{threat}'s power level is OVER 9_000!")
end

All still a rough work in progress, but interested in early feedback!

- Much :tea:, Chris

4 Likes

This is really cool!

The first thing that comes first to mind is composition possibilities using techniques similar to Ecto’s query DSL.

Playing with the APIs a bit:

power_filter = Matcha.filter(%{power: power} when power > 9000)
age_filter = Matcha.filter(%{age: age} when age > 20)

and_query =
  Matcha.filter(
    {_id, ^power_filter and ^age_filter}
  )

or_query =
  Matcha.filter(
    {_id, ^power_filter or ^age_filter}
  )

goku_query =
  Matcha.filter(
    {_id, ^and_query and %{name: "Son Goku"}}
  )

Matcha.select(table, ^and_query) |> Enum.to_list()
#=> [{id, goku}]

Matcha.select(table, ^or_query) |> Enum.to_list()
#=> [{id, krillin}, {id, goku}]

Matcha.select(table, ^goku_query) |> Enum.to_list()
#=> [{id, goku}]

# or operate on an enumerable directly
Matcha.select(saiyans, ^power_filter) |> Enum.to_list()
#=> [goku]

Takeaways/questions from above:

  • Can the API be simplified, with more living in Matcha for discoverability and composability? Is Matcha.Table.Query needed at all?
  • Can a Matcha.select be intelligent enough to operate on enumerables as well as tables? (It first argument is an atom, assume table name?)

Far-out-there-idea: Could all of this be used to implement a far larger subset of the Ecto query DSL for ETS tables so that it can be used as an Ecto adapter with far fewer caveats than currently exist for Etso? (Perhaps @evadne may be able to shed more light on this if she has time.)

Edit to add: this strikes me as more doable than previous suggestions of merging match specs because you sidestep the introduction of new/named bindings. You’re only matching/filtering on the structure of data, and named bindings are only available within a single filter clause.

1 Like

I don’t know, but I would be happy to review any attempt at this. Etso was built to my own requirements and has been deployed in production 24/7/365 for several years as-is.

The main problem solved within Etso was to map Ecto’s Query, which is currently an undocumented structure, to ETS operations

2 Likes

@zachallaun thanks for your thoughts! Some notes:


This syntax is pretty intriguing, and doable with a bit of work to get Matcha macros to recognize Matcha structs. It’s the semantics that are difficult: your example works mainly by virtue of how uniquely map pattern matching is handled. In any other context it’d be trivial to combine non-sensical filters:

filter(power when power > 9000) and filter(age when age < 30)

Ultimately, pattern matching syntax is not readily composable. To do it justice you need to know more about the structure you are querying upfront, have a mechanism for navigating the composition of data bindings, and a query planner to coalesce predicates. That’s exactly what SQL is for, so I think Etso is probably always going to be a more-correct solution for this sort of thing. I intend to keep exploring though as filters come together!


Good questions, and related ones. It mostly depends on what this featureset ends up looking like.

Right now, there’s an intentional divide: Matcha.{Pattern,Filter,Spec} APIs let you apply them to in-memory enumerables, and Matcha.{Table,Mnesia} APIs mediate querying those respective data stores. They are separate today because match specs can be compiled differently for different contexts.

Today, Matcha effectively re-writes matchspecs intended for in-memory data so that they can be executed via :erlang.match_spec_test/:ets.match_spec_run but still behave more like the results of an :ets.select call, since the two modes of invocation have different semantics. Ironically this prevents the same MS from being used in either context. However, I can see a higher-level API JIT-converting an unadulterated MS into the right form for the target it is being called against.

I don’t love the current solution and have been meaning to look into a refactor for a while, this sort of functionality definitely motivates me to focus on reworking it to get a cleaner API!


I’m not sure there’s much overlap. Fundamentally Matcha converts Elixir AST into MS AST; and Etso converts Ecto’s SQL-esque AST into MS AST and executes it for you. SQL, and Ecto’s AST, is more expressive and composable than what pure Elixir syntax can describe in match heads.

I am starting to emit some metadata about the compiled MSs that might help with certain kinds of composition, currently by exposing variable binding names post-compilation, but that’s more likely to lead to a Matcha-powered :qlc-type thing with it’s own semantics rather than something that can sensibly insert itself into SQL semantics.

2 Likes

DEVLOG.md 2023-10-20

This update discusses new syntactic support for nested matches I want to experiment with for Matcha!


I started developing Matcha.filter/1 because I needed it for something I wanted to build for SpawnFest. Sadly, while developing the unannounced library I wanted to release first and use in the competition, I’ve ran into a limitation of matchspec’s expressivity I’ve long aspired to overcome. I’ve suspected for a while that it’s solvable, but solving it properly in Matcha will take too much time away from the development of the depending library I’d planned on using in the competition—it’s a hard blocker. I may submit something else, though!

My consolation prize is that my remorse has fueled me to think on the problem of nested matches more, and recent changes to the Matcha compiler to support filters should give me what I need to implement it. Let’s dive into nested matches in matchspecs!


The power of patterns

Here at :tea: :tm: Matcha Incorporated :copyright:, we’re big fans of the match operator, =, which performs a pattern match.

When you’re first learning a BEAM VM language, it’s easy to think of it as just the variable binding operator. After all, these are the semantics of = most of us are familiar with coming from other languages, and the pattern variable on the left side will always match the right side, and bind the entirety of the right side to variable:

variable = {:some, %{complicated: [:data, "structure"]}}
variable
#=> {:some, %{complicated: [:data, "structure"]}}

Over time, we learn to appreciate that pattern matching can also perform destructuring as well as variable binding:

{:some, data} = variable
data
#=> %{complicated: [:data, "structure"]}

It provides a natural syntax for multiple assignment, even from deeply nested values:

{:some, %{complicated: [type, specifics]}} = variable
{type, specifics}
#=> {:data, "structure"}

But what’s really wild is that you can nest matches inside matches, to both bind variables at a shallower level of nesting, and match on data at a deeper level of nesting:

{:some, data = %{complicated: [:data, specifics]}} = variable
{data, specifics}
#=> {%{complicated: [:data, "structure"]}, "structure"}

This is often used in combination with guard expressions, so we can extract a set of data when its specifics satisfy a guard:

case variable do
  {:some, data = %{complicated: [:data, specifics]}} when is_binary(specifics)
    -> data
end
#=> %{complicated: [:data, "structure"]}

Matches inside matches inside matchspecs

Obviously, if you can do this in Elixir, I want to support it in Matcha. Extracting general data from an object in an :ets table where its specifics satisfy a certain guard is a common use-case. However, if you’ve ever spent any time studying the matchspec grammar, first off: I’m sorry.

Secondly, you might have realized that there is no direct analog of nested matching in them! Specifically, the anatomy of a MatchHeadPart does not allow you to both describe binding a term to a variable, and performing a destructuring operation on that term (that may permit a binding with a deeper nested term).

Put another way, you cannot both bind to and destructure a term in a matchspec!

To clarify, for some:

Matcha.spec do
  pattern -> ...
end

This pattern is representable:

{:some, data} -> ...

And this pattern is representable:

{:some, %{complicated: [:data, specifics]}} -> ...

But both at once are not:

{:some, data = %{complicated: [:data, specifics]}} -> ...

Today, Matcha reflects this reality:

Matcha.spec do
  {:some, data = %{complicated: [:data, specifics]}} when is_binary(specifics) -> data
end
#!> ** (Matcha.Rewrite.Error) found problems rewriting code into a match spec: when binding variables
#!> 
#!>  ({:some, data = %{complicated: [:data, specifics]}} when is_binary(specifics) -> data)
#!>    error: cannot match `data` to `%{complicated: [:data, specifics]}`: cannot use the match operator in match spec heads, except to re-assign variables to each other
#!>     (matcha 0.1.10) lib/matcha/rewrite.ex:472: Matcha.Rewrite.raise_match_in_match_error!/3
#!>     (elixir 1.15.6) lib/macro.ex:667: Macro.do_traverse/4
#!>     (matcha 0.1.10) lib/matcha/rewrite.ex:445: Matcha.Rewrite.do_rewrite_bindings/2
#!>     (matcha 0.1.10) lib/matcha/rewrite.ex:323: Matcha.Rewrite.rewrite_clause/2
#!>     (elixir 1.15.6) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
#!>     (matcha 0.1.10) lib/matcha/rewrite.ex:210: Matcha.Rewrite.spec/2
#!>     (matcha 0.1.10) lib/matcha/rewrite.ex:199: Matcha.Rewrite.build_spec/3
#!>     (matcha 0.1.10) expanding macro: Matcha.spec/1

Look to the Guards

Is there a way forward from this? Nope, not really.

At least, not until recently.

  • Since before I started working on Matcha, matchspec guards supported most, but not all, BIFs that are allowed in guards.
  • OTP 25 introduced support for using the BIFs :erlang.is_map_key/2 and :erlang.map_get/2 in matchspecs.
  • At my behest, in OTP 26 @jhogberg graciously added support for all the other missing guard BIFs to matchspecs, mostly critically including :erlang.tuple_size/1, and added tests to ensure that all guard-safe BIFs are also allowed in matchspec guards going forwards!

Fake it 'til you match it

With the added support for the guard :erlang.tuple_size/1 in matchspecs, we finally have all the tools we need to support both binding and destructuring on the same nested term. All it would take is re-implementing destructuring of composite terms into guard checks in the Matcha compiler!

Take, for example, the matchspec:

Matcha.spec do
  {:some, data = %{complicated: [:data, specifics]}}
    when is_binary(specifics) 
      -> data
end

We can’t literally describe the nested match data = %{complicated: [:data, specifics]} in today’s match specification grammar.

But what we can do is… grisly, but semantically equivalent:

Matcha.spec do
  {:some, data}
    when :erlang.is_map(data) and :erlang.is_map_key(:complicated, data)
     and :erlang.is_list(:erlang.map_get(:complicated, data))
     and :erlang.length(:erlang.map_get(:complicated, data)) == 2
     and :erlang.hd(:erlang.map_get(:complicated, data)) == :data
     and is_binary(:erlang.hd(:erlang.tail(:erlang.map_get(:complicated, data)))) 
      -> data
end

Rather than ask you to type all that out, it should be possible to develop a destructuring-to-guards transpiler in the Matcha compiler and do it for you! We finally have enough guards available in matchspecs that I think we can convert any arbitrary destructuring of composite terms in an Elixir match pattern into a mess of matchspec-supported :erlang guards:

On paper, this is very cool. Off paper, this is very cool but requires a whole heck of a lot of work to the compiler. So I’ll be playing with this premise more over the next few months, when I find time!

6 Likes

A fairly stable version of this is now on the latest branch of Matcha! Tests are passing, only a few had to be changed. I will continue to dogfood it and the new Matcha.Filter feature in my own projects before releasing as a proper version.

As discussed, I’ve written a custom destructuring engine that can translate Elixir’s = pattern matching to a sequence of matchspec guards. This is non-trivial, so I wouldn’t be surprised if there are some bugs and poor error messages around edge cases.


Nesting

Remember that

This means that before, you could bind to and extract some high-level data from a nested data structure:

spec = Matcha.spec do
  {:some, data} -> data
end

And you could also do some deep pattern matching on nested datastructures:

spec = Matcha.spec do
  {:some, %{complicated: [:data, {:specifics, specifics}]}}
    when is_binary(specifics) 
      -> specifics
end

However, you could not do both at once with standard Elixir = syntax, since matchspecs don’t have real support for destructuring.

Destructuring

For example, perhaps you want to extract some high-level value by guarding on a nested one:

spec = Matcha.spec do
  {:some, data = %{complicated: [:data, {:specifics, specifics}]}}
    when is_binary(specifics) 
      -> data
end

This limitation has been hopefully thoroughly overcome! Where it used to raise an exception, the spec above works as expected: we can extract the higher-level subset of data when the specifics are binaries:

table = :ets.new(:table, [:bag])
:ets.insert(table, [
  {:some, %{complicated: [:data, {:specifics, :NOT_BINARY}]}},
  {:some, %{complicated: [:data, {:specifics, "BINARY"}]}},
])
Matcha.Table.ETS.Select.all(table, spec)
#=> [%{complicated: [:data, {:specifics, "BINARY"}]}]

Assignment

This works with anywhere we could normally bind variables, and “inner” bound variables are available to be used in your matchspec bodies, event though erlang matchspecs technically do not support them:

spec =
  Matcha.spec :table do
    {:some, map = %{complicated: list = [:data, tuple = {:specifics, specifics}]}}
    when is_binary(specifics) ->
      %{map: map, list: list, tuple: tuple, specifics: specifics}
  end

We’ll find this spec discovers the "BINARY" record we’re looking for, as well as letting us reference any nested destructuring assignment to bindings we performed.

Matcha.Table.ETS.Select.all(table, spec)
[
  %{
    map: %{complicated: [:data, {:specifics, "BINARY"}]},
    list: [:data, {:specifics, "BINARY"}],
    tuple: {:specifics, "BINARY"},
    specifics: "BINARY"
  }
]

And if we don’t use all of them—we still get our classic Elixir warnings about unused variables, as we should:

spec =
  Matcha.spec :table do
    {:some, map = %{complicated: list = [:data, tuple = {:specifics, specifics}]}}
    when is_binary(specifics) ->
      map
  end

#=> warning: variable "list" is unused (if the variable is not meant to be used, prefix it with an underscore)
#=> warning: variable "tuple" is unused (if the variable is not meant to be used, prefix it with an underscore)

Let me know if you get a chance to play around with these new capabilities, or discover any hiccups! This crosses a pretty major awkward bridge between Just Writing Elixir :tm: code, and actual desired matchspec behaviour; making even the most complicated matchspecs easy to build with Elixir’s pleasant syntax.

1 Like

The matcha ETS “adapter” can be also used to query mnesia tables? it also uses match specification (:

EDIT: I just take a more deeper look into the docs and found the ETS/DETS/Mnesia uses the same logic :sweat_smile:

1 Like

Yep!

LMK if you try it out or the docs need work, I haven’t really spent a lot of time with mnesia!

It does. And, if you just want to use Matcha as an elixir-to-ms compiler, you don’t need to use things like its Mnesia APIs—just call .source on any Matcha.Spec to get a raw ms you can pass straight to :mnesia and friends:

spec = Matcha.spec(:table) do
  { x, y } = z when x > 10 -> z
end

spec.source
#=> [{{:"$1", :"$2"}, [{:>, :"$1", 10}], [:"$_"]}]

:mnesia.select(table, spec.source, :read)

Yeah, i was just thinking to match (pun intended) matcha with the memento library! it will give an experience ala ecto (:

Zoey de Souza Pessanha
Desenvolvedora de Software
https://github.com/zoedsoupe

1 Like

Interesting thought!

It looks like memento uses a custom format for its select/3, so you can’t give it a matchspec directly. It seems to be doing something similar to but incompatible with a Matcha.filter().

However, it seems like match/3 might support a Matcha.pattern(...).source if it’s not also a DSL on top of them, worth trying!

And of course, select_raw/3 happily takes a full Matcha.spec(...).source, so you’re good to go there! The example they give:

match_head = {Movie, :"$1", :"$2", :"$3", :"$4"}
result = [:"$_"]
guards = [
  {:andalso,
    {:>, :"$3", 2010},
    {:orelse,
      {:==, :"$4", "Quentin Tarantino"},
      {:==, :"$4", "Steven Spielberg"},
    }
  }
]

Memento.Query.select_raw(Movie, [{match_head, guards, result}], coerce: true)
# => [%Movie{...}, ...]

should be able to be written something like

require Matcha

spec = Matcha.spec(:table) do
  {Movie, _id, _title, year, director} = record
    when year > 2010 and director in [ "Quentin Tarantino", "Steven Spielberg"]
      -> record 
end

Memento.Query.select_raw(Movie, spec.source, coerce: true)

The resulting spec.source is indeed identical to the example given.

1 Like

That’s a great fit! thanks for the upfront tests, I liked the way matcha can be used with memento! I surely will give it a try in my next project!

1 Like