Calling all Matchspecs!

christhekeele · December 2, 2021, 3:14pm

Hey all!

I’ve been working on an elixir-to-matchspec compiler. Think ex2ms with support for a few more expressions, pattern support, tracing support, helpful errors, and some other niceties.

Right now it’s passing ex2ms's test suite, and I’ve added many tests of my own, but… I know I’ve seen edge cases where bad matchspecs were generated during development that I forgot to jot down. Also, I’d love to see it build other matchspecs used in the field that people have crafted by hand!

So, it’d be incredibly helpful if anyone cared to share theirs to bolster my test suite! I’m interested in:

matchspecs passed to
- :ets.select...
- :recon_trace.calls/2
- Registry.select/2
- :dbg.tp...
- :erlang.trace_pattern...
matchpatterns passed to
- :ets.match...
- Registry.match/4
especially anything with
- non-trivial destructuring in the match heads
- nested tuple literals in match bodies

All contributions greatly appreciated!

mguilmineau · December 7, 2021, 2:19am

A few samples from our code - hope it helps.

:ets.select( ets_name( customer ), [ { { { customer, :_ }, { true, :"$1", :_, :"$2" } },
[ { :andalso,
{ :"==", {:map_get, :a, {:map_get, :job, :"$2"}}, a},
{ :"==", {:map_get, :b, {:map_get, :job, :"$2"}}, b}
} ],
[ [:"$1", :"$2"] ] } ] )

:ets.select_delete( ets_name( customer ), ( for task_id <- task_ids,
do: { { { customer, :_ }, :"$1" }, [ { :"==", {:map_get, :id, :"$1"}, task_id } ], [:true] } ) )

:ets.select( ets_name( customer ),
[ { { { customer, :_ }, { :_, :"$1", :_, :"$2" } },
[ { :andalso,
{ :">=", :"$1", Util.to_dets( date ) },
{ :"<", :"$1", Util.to_dets( Util.tomorrow ) }
} ],
[ :"$2" ] } ] )

:ets.select( ets_name( customer ),
[ { { { customer, :_ }, :"$1", :"$2", :_, :_ },
[ { :or,
{ :"==", {:map_get, :reattempt, :"$1"}, @auto },
{ :"==", {:map_get, :reattempt, :"$1"}, @manual }
} ],
[ :"$1" ] },
{ { { customer, :_ }, :"$1", @paused, :_, :_ },
[ ],
[ :"$1" ] },
{ { { customer, :_ }, :"$1", @deleted, :_, :_ },
[ ],
[ :"$1" ] },
( if for_seeding,
do: { false, [], [ true ] },
else: { { { customer, :_ }, :"$1", :_, :_, true },
[ ],
[ :"$1" ] } )
] )

christhekeele · December 7, 2021, 6:06am

This is invaluable @mguilmineau! Thank you so much!

It did in fact surface the bug I couldn’t recall the cause of! (In your first example, I have to re-write the map_gets to map destructuring. Turns out things broke when you both destructured a map to bind a variable from one of its values, 2 maps deep, and assigned the map itself to a variable at the same time.)

One fun thing is that since all specs pass through Elixir’s compiler, at compile time, before being converted to the underlying syntax, it issues a friendly familiar warning on the first clause of your 4th example: if you convert the :"$2" reference into, say, a variable called arg2 in the match head, Elixir complains:

warning: variable "arg2" is unused (if the variable is not meant to be used, prefix it with an underscore)

This is, of course, because it isn’t used in the body of the matchspec. Changing it to _ or _arg2 fixes it.

So, two of your examples are working well when transliterated, and generate (semantically) identical matchspecs:

Example 1

{customer, job_a, job_b} = {:customer, :job_a, :job_b}

spec :table do
  {{^customer, _}, {true, var1, _, var2 = %{a: %{job: match_a}, b: %{job: match_b}}}}
  when match_a == job_a and match_b == job_b ->
    [var1, var2]
  end

Generated spec

[
  {{{:customer, :_}, {true, :"$1", :_, :"$2"}},
    [
      {:andalso, {:==, {:map_get, :a, {:map_get, :job, :"$2"}}, :job_a},
      {:==, {:map_get, :b, {:map_get, :job, :"$2"}}, :job_b}}
    ], [[:"$1", :"$2"]]}
]
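
To make the semantics of that generated spec concrete, here is a minimal sketch that runs it through :ets.test_ms/2 without needing a table. The sample records are hypothetical, shaped only to fit the match head. Note that the nested {:map_get, :a, {:map_get, :job, :"$2"}} reads the :a key inside the :job map of :"$2".

```elixir
# The generated spec from Example 1, copied verbatim.
spec = [
  {{{:customer, :_}, {true, :"$1", :_, :"$2"}},
   [
     {:andalso, {:==, {:map_get, :a, {:map_get, :job, :"$2"}}, :job_a},
      {:==, {:map_get, :b, {:map_get, :job, :"$2"}}, :job_b}}
   ], [[:"$1", :"$2"]]}
]

# Hypothetical records, invented for illustration.
job = %{job: %{a: :job_a, b: :job_b}}
matching = {{:customer, 1}, {true, :first, :ignored, job}}
wrong_job = {{:customer, 2}, {true, :first, :ignored, %{job: %{a: :other, b: :job_b}}}}
no_job_key = {{:customer, 3}, {true, :first, :ignored, %{}}}

# test_ms/2 returns {:ok, result}, where result is false when nothing matches.
hit = :ets.test_ms(matching, spec)
miss = :ets.test_ms(wrong_job, spec)
# A map_get on a missing key fails the guard rather than raising.
failed_guard = :ets.test_ms(no_job_key, spec)

IO.inspect({hit, miss, failed_guard})
```

The first record yields the [var1, var2] list from the body; the other two yield {:ok, false}.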

Example 3

{customer, dets_date, dets_tomorrow} = {:customer, :dets_date, :dets_tomorrow}

spec :table do
  {{customer, _}, {_, var1, _, var2}} when var1 >= dets_date and var1 < dets_tomorrow ->
    var2
end

Generated spec

[
  {{{:customer, :_}, {:_, :"$1", :_, :"$2"}},
    [
      {:andalso, {:>=, :"$1", {:const, :dets_date}}, {:<, :"$1", {:const, :dets_tomorrow}}}
    ], [:"$2"]}
]

Examples 2 and 4, however, are using for and if inside of the spec when building clauses. This is something I can totally support, but haven’t decided on the syntax yet. The simple spec do; x -> y; end syntax mimicking anonymous function definitions obviously doesn’t support dynamic clause generation.

My two ideas are to either expose a lower-level single-clause compiler and a merging syntax, ex:

multipliers = [1, 2]

Spec.from_clauses(
  for multiplier <- multipliers do
    Spec.clause({x, y} when x * multiplier == y) do
      x + y
    end
  end
)

Or, to try to allow top-level, or even nested, control structures within the spec macro:

multipliers = [1, 2]

spec do
  for multiplier <- multipliers do
    {x, y} when x * multiplier == y -> x + y
  end
end

The former feels verbose, but very straightforward. The latter seems convenient, but also violates basic syntax rules of Elixir. Allowing arbitrary ->s in existing control structures that don’t normally support them, like for and if, would be confusing. I’m also not sure how it would need to interact with control structures that already use ->, like case and cond.

What do you think?

mguilmineau · December 7, 2021, 6:47am

Nice. I did clean up and simplify my examples, hence the rogue :"$2". Good that you caught this!
If you were not supporting the if within the query spec, we could easily branch that :ets.select into two functions each with their own query spec.
On the other hand, the for is valuable to support as it noticeably speeds up queries.
I do not have a strong preference on your question about syntax. Hopefully actual contributors to the language will chime in.
One thought that may lead you to consider one version vs another: the for here is used as an OR clause to match multiple task ids, but it could also be used to match multiple possible attributes (as in [ { :"==", {:map_get, field, :"$1"}, … ), i.e. we’d be matching a key instead of a value. I imagine part of the appeal in writing the matchspecs in elixir is having key and value differentiation standing out more clearly.
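
As a rough sketch of the for-as-OR pattern with hand-written specs: a comprehension emits one clause per id, and :ets.select_delete/2 removes every row that any clause matches. The table name, the :acme customer, and the row shape are assumptions made up for this example.

```elixir
# Hypothetical table: keys are {customer, id}, values are maps carrying an :id.
table = :ets.new(:tasks_demo, [:set])
rows = for id <- 1..5, do: {{:acme, id}, %{id: id}}
:ets.insert(table, rows)

task_ids = [2, 4]

# One clause per task id; the clauses of a match spec are tried in order,
# so together they behave like an OR over the ids.
spec =
  for task_id <- task_ids do
    {{{:acme, :_}, :"$1"}, [{:==, {:map_get, :id, :"$1"}, task_id}], [true]}
  end

# select_delete/2 deletes the rows whose clause body returns true
# and reports how many were removed.
deleted = :ets.select_delete(table, spec)

remaining =
  table
  |> :ets.select([{{{:acme, :"$1"}, :_}, [], [:"$1"]}])
  |> Enum.sort()

IO.inspect({deleted, remaining})
```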
A final thought: none of these pattern matches are terribly efficient since they require a full scan of the data sets, as opposed to lookup and match. We make up for it by optimizing our :ets cache set structures from flatter :dets stores (denormalized in multiple ways and updated only as needed). From my perspective, this is where the magic and difficulty lies, as opposed to writing the matchspecs. If the intermediate :ets caching and retrieval was done automatically based on our :dets storage structure and based on the way we ultimately query the data, that would be a lot of time and code saved indeed.

christhekeele · December 14, 2021, 11:16am

I feel as if this is 50/50, personally.

Match specs are already an informal, loosely-documented sort-of-erlang-AST. They are difficult to debug and error prone.
- The approach I am taking resolves this, as all specs are validated against erlang’s builtin ms test functions at compile time.
Traversing an even wider conceptual gap from sort-of-erlang-AST to Elixir just adds to the cognitive load in writing, debugging, and maintaining them.
- The approach I’m taking passes all code through the Elixir compiler first, to throw all the familiar errors and warnings, before converting Elixir code into specs.
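
For reference, the built-in test functions can also be called by hand. The sketch below uses :ets.test_ms/2 on a valid spec and on a deliberately broken one (the :no_such_guard function is invented so that it fails); it only illustrates the kind of check described above and is not a claim about how the compiler invokes it.

```elixir
# A valid spec: return the first element when the second is greater than 10.
valid = [{{:"$1", :"$2"}, [{:>, :"$2", 10}], [:"$1"]}]

# An invalid spec: :no_such_guard is not a match spec guard function.
invalid = [{{:"$1", :"$2"}, [{:no_such_guard, :"$2"}], [:"$1"]}]

# {:ok, result} on a match, {:ok, false} when the spec is fine but nothing matches.
matched = :ets.test_ms({:a, 42}, valid)
unmatched = :ets.test_ms({:a, 1}, valid)

# A spec that does not compile comes back as {:error, errors}
# instead of raising.
rejected = :ets.test_ms({:a, 42}, invalid)

IO.inspect({matched, unmatched, rejected})
```

For tracing specs, :erlang.match_spec_test/3 plays the same role with the :trace dialect.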

This is my real goal: not to solve these kinds of problems, but to make match specs more accessible, and therefore increase their adoption in general, so that more advanced tooling can be easily built on top of them without needing to understand the underlying syntax. What I’m working on was originally a proposal to the language itself, though I feel like it belongs outside it now.

Of course, targeting Elixir AST as the high-level format should help with this a lot: library authors can just leverage Elixir’s powerful macro system to translate things (ex: mnesia schemas, ecto schemas, ets queries) from Elixir code, to Elixir code. Then my library can handle all the fussy details of whether or not it’s a viable match spec without requiring further knowledge.

Not quite sure I understand here—are you saying that you have a known set of ids on hand you want to retrieve verbatim, and doing a single :ets.select/2 call with a match spec is not as efficient as a series of :ets.lookup/2 calls or a single :ets.match/2 call with a match pattern?

mguilmineau · December 17, 2021, 3:16am

I support your effort and I hope I wasn’t giving a different impression. Match specs are not intuitive, too different from elixir syntax and little discussed on the web. The pre-compilation validation is a welcome addition. The detailed match specs we currently have look unnecessarily intimidating. There is value in this effort.

My comment on full scan: we tend to duplicate values in :ets storage with keys designed to match the queries we run frequently, so as to use lookup or match instead of more complicated and slower select match specs. In other words, while we do use match specs they tend to be a temporary stop gap but eventually get simpler or are removed entirely, primarily for performance reasons.
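
A small sketch of that trade-off, with table names, keys, and row shapes invented for illustration: the same question is answered once by a select that has to inspect every row of a set table, and once by a lookup against a denormalized copy keyed by the query itself.

```elixir
task1 = %{id: 1, status: :paused}
task2 = %{id: 2, status: :active}

# Primary table keyed by {customer, id}.
tasks = :ets.new(:tasks_demo, [:set])
:ets.insert(tasks, [{{:acme, 1}, task1}, {{:acme, 2}, task2}])

# The key is only partially bound, so on a :set table this select
# has to walk the whole table and evaluate the guard per row.
scan_spec = [
  {{{:acme, :_}, :"$1"}, [{:==, {:map_get, :status, :"$1"}, :paused}], [:"$1"]}
]

by_scan = :ets.select(tasks, scan_spec)

# Denormalized copy keyed by {customer, status}: the query becomes a key lookup.
# A :bag table allows several rows under the same key.
by_status = :ets.new(:tasks_by_status_demo, [:bag])
:ets.insert(by_status, [{{:acme, :paused}, task1}, {{:acme, :active}, task2}])

by_lookup = for {_key, task} <- :ets.lookup(by_status, {:acme, :paused}), do: task

IO.inspect({by_scan, by_lookup})
```

Both approaches return the same task here; the cost is keeping the second table in sync on every write.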

christhekeele · November 21, 2023, 7:11pm

@mguilmineau Been a couple of years, but happy to report that recent work in the Matcha compiler allows full conversion of Elixir destructuring into ms guards, so the usage test suite based on your helpful examples is getting substantially more Elixir-ish.

mguilmineau · May 30, 2024, 5:13am

Just saw your note and clicked on the helpful examples link.

One thing that comes to mind is that :andalso (and :orelse) accepts any number of conditions in sequence, for example I have this:

		:ets.select_count( ets_name( customer ), [
				{ { :_, :_, :_, false, false, :"$6", :_, :_, :_, :"$10", :_, :_, :_, :_, :_ },
					[ { :andalso,
							{ :is_binary, :"$10" },
							{ :"=/=", :"$6", @deleted           },
							{ :"=/=", :"$6", @failed_limit      },
							{ :"=/=", :"$6", @paused            },
							{ :"=/=", :"$10", @invalid          },
							{ :"=/=", :"$10", @locked           },
							{ :"=/=", :"$10", @misconfigured    }
						} ],
					[true] }
			] ) > 0
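
A minimal sketch of the flat form next to the pairwise-nested form that a chain of Elixir and operators would naturally produce, checked with :ets.test_ms/2. The rows and the :deleted / :paused atoms are stand-ins for the module attributes above.

```elixir
# Flat: three conditions in a single :andalso tuple.
flat = [
  {{:"$1", :"$2"},
   [{:andalso, {:is_binary, :"$2"}, {:"=/=", :"$1", :deleted}, {:"=/=", :"$1", :paused}}],
   [true]}
]

# Nested: the same conditions as two-argument :andalso calls.
nested = [
  {{:"$1", :"$2"},
   [
     {:andalso, {:is_binary, :"$2"},
      {:andalso, {:"=/=", :"$1", :deleted}, {:"=/=", :"$1", :paused}}}
   ], [true]}
]

# Hypothetical rows: {status, value}.
rows = [{:active, "ok"}, {:deleted, "ok"}, {:paused, "ok"}, {:active, nil}]

flat_results = Enum.map(rows, fn row -> :ets.test_ms(row, flat) end)
nested_results = Enum.map(rows, fn row -> :ets.test_ms(row, nested) end)

IO.inspect(flat_results)
IO.inspect(flat_results == nested_results)
```

Only the first row passes all three conditions, and both forms agree on every row.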

A second thing I’ve learned since then is that >= is handled differently than < if you have nil values, so I must insert an :is_integer when comparing with >=

		ran_today = [
			{ :andalso,
				{ :is_integer, :"$11" }, # because next clause: nil > 1 == true
				{ :">=", :"$11", today_dets } # true if :"$11" is nil
			} ]
		runs_today_or_overdue = [
			{ :"<", :"$12", tonight_dets }
		]

These weren’t obvious to me - not sure if either of these is useful for you & your library, but I’m sharing.

Finally, found that the fastest way to generate an Elixir Map from an :ets output is to double up the brackets, as such:

		:ets.select( ets_name( customer ), [
			{ { { customer, :"$1" }, :"$2" },
				[ ],
				[ { { :"$1", {:map_get, :frequency, :"$2"} } } ] }
		] )
		|> Map.new
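
A self-contained sketch of the doubled-brace form, using a throwaway table with made-up rows. Inside a match spec body, {{a, b}} constructs the tuple {a, b} (a single-brace tuple would be read as a function call), so the select returns key/value pairs that Map.new/1 accepts directly.

```elixir
# Hypothetical table: keys are {customer, id}, values are maps with a :frequency.
table = :ets.new(:frequency_demo, [:set])

:ets.insert(table, [
  {{:acme, 1}, %{frequency: :daily}},
  {{:acme, 2}, %{frequency: :weekly}},
  {{:other, 3}, %{frequency: :hourly}}
])

# The body builds a {id, frequency} tuple per matching row.
pairs =
  :ets.select(table, [
    {{{:acme, :"$1"}, :"$2"}, [], [{{:"$1", {:map_get, :frequency, :"$2"}}}]}
  ])

frequencies = Map.new(pairs)

IO.inspect(frequencies)
```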

and I even have this optimized way of collecting counts:

	def counts_by_category( customer ), do:
		:ets.select( ets_name( customer ),
			[ { { { customer, :_ }, :"$1" },
				[ ],
				[ {:map_get, "category", {:map_get, :input, :"$1"}} ] }
			] )
		|> Enum.frequencies
		|> Enum.map( fn
			{:EXIT, v} -> {nil, v}
			{k, v}       -> {k,   v}
		end )
		|> Map.new

Cheers -
Mathieu

christhekeele · May 30, 2024, 5:28am

Excellent, I love me some edge cases!

I didn’t know that! I’ll have to play around with this form to see if it is a potential optimization.

I think this is a case of holistic term ordering, a standard caveat of comparison operators on the BEAM. Sadly there’s not much my compiler can do to “correct” this, but I hope that by passing all MS code thru the Elixir compiler first it will benefit from the new typing warnings coming to comparisons. I need to play around with how those intersect…
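
A short sketch of the term-ordering effect, using :ets.test_ms/2 and a made-up integer in place of today_dets. In Erlang's term order every number sorts before every atom, and nil is an atom, so a bare >= comparison against an integer succeeds for nil while < does not.

```elixir
# Hypothetical stand-in for the today_dets value.
today = 20_240_530

# Bare comparison: nil is an atom, and atoms sort after all numbers.
unguarded = [{{:_, :"$1"}, [{:>=, :"$1", today}], [true]}]

# Same comparison, but only reached when the value is an integer.
guarded = [
  {{:_, :"$1"}, [{:andalso, {:is_integer, :"$1"}, {:>=, :"$1", today}}], [true]}
]

# The < direction needs no guard, since nil is never less than a number.
less_than = [{{:_, :"$1"}, [{:<, :"$1", today}], [true]}]

nil_row = {:job, nil}

results = %{
  unguarded: :ets.test_ms(nil_row, unguarded),
  guarded: :ets.test_ms(nil_row, guarded),
  less_than: :ets.test_ms(nil_row, less_than)
}

IO.inspect(results)
```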

That’s slick! I’m pretty sure my compiler supports emitting map literals straight out of an MS but definitely should make sure that’s in the test cases.