XPeg - Powerful Elixir PEG parser generator library

Hereby I’d like to announce a new library I’ve been working on over the last few months: XPeg.

XPeg is a pure Elixir pattern matching library. It provides macros to compile PEG grammars to an Elixir function which will parse a string and capture selected parts of the input. PEGs are not unlike regular expressions, but offer more power and flexibility, and have less ambiguities. More about PEGs on wikipedia.

Some use cases where XPeg is useful are configuration or data file parsers, robust protocol implementations, input validation, lexing of programming languages or domain specific languages.

XPeg allows mixing of the grammar definition with Elixir functions that act on the matched data, allowing for powerful and concise AST generating parsers.

Example

Below is a simple grammar definition that parses a comma separated list of key/value pairs into a list of tuples:

p = Xpeg.peg :dict do
  :dict <- :pair * star("," * :pair) * !1
  :pair <- :word * "=" * :number * fn [a,b | cs] -> [{b,a} | cs] end
  :word <- cap(+{'a'..'z'})
  :number <- cap(+{'0'..'9'}) * fn [v | cs] -> [String.to_integer(v) | cs] end
end

This grammar consists of the following rules:

  • The top level rule :dict matches one :pair, followed by zero-or-more instances of a , followed by a :pair
  • The :pair rule matches a :word followed by an = and a :number
  • The :word rule matches one-or-more characters from the set {'a'..'z'}
  • The :number rule matches one-or-more characters from the set {'0'..'9'}

Some rules are followed by elixir functions that convert or transform the captured data at parse time, resulting in the required AST syntax.

The grammar can be matched against the subject string using the Xpeg.match() function:

Xpeg.match(p, "grass=4,horse=1,star=2")

resulting in the following output:

[{"star", 2}, {"horse", 1}, {"grass", 4}]

Below are some links to more elaborate examples from the GitHub repository:

11 Likes

This looks really nice! I’m mostly just asking for conversation, but can you compare this with NimbleParsec? When would one prefer one over the other? Performance?

Good luck!

Honestly, I’m not at all acquainted with NimbleParsec, or any other available parsers for Elixir - I would have to dive into the alternatives to see how they would compare. One of the strengths of XPeg would be the concise way to build ASTs directly from the grammar, but other parsers might as well offer the same functionality. Performance is currently not great, as there is a number of possible optimizations that I have not yet implemented.

I initially wrote XPeg as an exercise to acquainted with Elixir metaprogramming, the implementation itself is basically a Elixir port of a (pretty much mature) Nim parser I made a a few years ago: NPeg, which is itself based on Lua’s LPeg.

Apologies for bumping my own thread, but I’d like to mention that Xpeg has learned some nice new tricks over the last few weeks. Most important changes:

  • Performance has been improved drastically; my typical benchmark is parsing JSON into Elixir maps and lists, which now runs at about half as fast as the highly optimized and blazing fast Poison parser.
  • Xpeg can now draw cool railroad graphs for the grammar that it is compiling, which is nice and helpful for understanding and debugging your grammars. for example:

This grammar fragment:

  Obj_pair <- S * String * S * ":" * Value                                                       
  Object <- "{" * (Obj_pair * star("," * Obj_pair) | S) * "}" 

Will be dumped like this railroad diagram at compile time:

                               ╭───────────»──────────╮                                            
Object o──'{'─»─┬─[Obj_pair]─»─┴─┬─","─»─[Obj_pair]─┬─┴──┬─»─"}"──o                                
                │                ╰─────────«────────╯    │                                         
                ╰─[S]────────────────────────────────────╯  

For more info, check the README on the Xpeg github repo

8 Likes