Grammar - A library for writing parsers / transformers as (almost) regular functions

Hi !

Grammar is a very simple Elixir library that helps build parsers/transformers for LL(1) structured data, via two macros and a protocol.

The main idea is to declare reduction rules in an Elixir-ish way, a bit like regular functions.

e.g.

defmodule MyStuff do
  use Grammar

  rule start("hello", :finish, :bang) do
    # `params` is bound by the `rule` macro to the list of results
    # produced for each element of the clause, in order
    [_, finish, bang] = params
    "I see #{finish} #{bang || ""}"
  end

  rule finish("world") do
    "the world"
  end

  rule? bang("!"), do: "¡"
  rule? bang("?"), do: "¿"
end

iex> MyStuff.parse("hello world !")
{:ok, "I see the world ¡"}

It started as a “How would I?” experiment and ended up as something that actually works :slightly_smiling_face:

I hope you’ll find it useful, or just fun :smiling_face:

Nice. Would you be up for writing up a comparison with NimbleParsec?

Hi !
Sorry I didn’t see your post !
Frankly speaking, I didn’t know about NimbleParsec or any other tool of this kind when I started thinking about the topic :slight_smile:

From the quick look I took at NimbleParsec and xpeg, it seems to me that those tools follow a different path: they are based on a more formal description of the grammar, with actions interleaved at various points during parsing, whereas I chose a more “declarative” way, where an action runs only once a rule is fully and successfully parsed and its results collected.
I also took a different path by delegating token extraction to a dedicated protocol, which allows for a high level of customization. For example, one can choose to extract a whole block of data as a single token, because that data block has a variable length.
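
To make the contrast a bit more concrete, here is a rough sketch of a tiny “hello world” parser written with NimbleParsec combinators (written from memory of its documentation and not tested, so take the exact return shape with a grain of salt):

defmodule HelloWithNimble do
  import NimbleParsec

  # The grammar is described as a pipeline of combinators; transformations
  # (here a single reduce) are attached to chosen points of that pipeline.
  defparsec :greeting,
            string("hello")
            |> ignore(string(" "))
            |> concat(string("world"))
            |> reduce({Enum, :join, [" "]})
end

# HelloWithNimble.greeting("hello world")
# should return something like {:ok, ["hello world"], "", %{}, {1, 0}, 11}

With Grammar, by contrast, the whole body of a rule runs in one go, once all its sub-rules have been reduced.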

I’m also pretty sure it allows for sub-byte token extraction, and hence parsing of binary protocols.

My feeling is that those tools are more advanced, polished, and efficient than mine, which makes sense given the time invested in each project.

Good to know: I’m currently rewriting the internals of my humble tool for better space efficiency compared to the current naive approach.

Hi !

Grammar v0.3.0 is out !

This new version adds a regular API for creating parsers at runtime, alongside the DSL.

e.g.

Grammar.new()
|> Grammar.add_clause(:begin, [:hello, :world], fn [hello, world] -> "#{hello} #{world} !" end)
|> Grammar.add_clause(:hello, ["hello"], fn ["hello"] -> "bonjour" end)
|> Grammar.add_clause(:world, ["world"], fn ["world"] -> "monde" end)
|> Grammar.prepare!()
|> Grammar.start(:begin)
|> Grammar.loop(Grammar.Tokenizer.new("hello world"))

The DSL expansion has been rewritten to use that new API, drastically reducing the amount of generated code.

Also, the parsing stack is now bounded by the depth of the grammar rather than by the size of the parsed data.

Some livebooks are provided in the source code :slight_smile:

Happy hacking,

Enjoy !

Hi !

Aaaaaaand Grammar v0.4.0 is in the wild !

This version enables parsing at the bit level. Thanks to the design decision of delegating token extraction to a protocol, this feature came almost for free (I must confess I was pretty happy about that :slight_smile: ).
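
To give an idea of what “bit level” means here, below is a tiny, library-agnostic illustration using plain Elixir bitstring matching; the module and function names are mine, not part of Grammar’s API:

defmodule BitHeader do
  # Reads a 3-bit opcode followed by a 5-bit length from the head of a
  # bitstring, returning both values and the (possibly non-byte-aligned) rest.
  def read(<<opcode::3, length::5, rest::bitstring>>) do
    {opcode, length, rest}
  end
end

# BitHeader.read(<<0b10100011, "payload">>)
# #=> {5, 3, "payload"}

A custom token extractor can consume input a few bits at a time in the same spirit, instead of whole characters.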

As usual a livebook illustrates this “new” feature.

Happy hacking,

Nicolas -