Elixir PEG Parser Generator

I’m looking for some pointers to make my simple code more idiomatic.

Please take a look



Ooh! This is super nice!

I might actually want to use this library, because the way its building blocks are combined looks really cool.

I think your code is already quite idiomatic:

  • I love how short all individual functions are; your layers of abstraction/concerns are very well separated.
  • Very good that the macros are so short; attempting to do too much work inside the direct function body of a macro is a common mistake (that I myself fell prey to too on more than one occasion :grin:), and macros are a lot harder to debug than functions.

A few small comments I can make:

  1. There’s a ‘TODO’ left halfway through your module; you do seem to have used the <> to match on text later on in the module, so I don’t think that one is still required.
  2. I wonder what happens if a user attempts to use your module improperly: maybe you can add some extra checks to the ‘before_compile’ call that verify that @root is set and that at least one rule is defined. Also, a debugging call that prints all rules that were defined in a module would be really nice (and could probably be used inside tests/doctests as well).
  3. More of a design decision: is it possible to combine two modules that both contain rules in some way? That would be a great way to let libraries that parse some common syntax be re-used.
  4. Also, many parsers, instead of erroring when they do not parse all input, return the parsed part and the unparsed rest of the input (which you can match against "" if you really always want all input to be parsed).
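
To illustrate the last suggestion, here is a minimal sketch of a rule that returns the unparsed rest instead of raising. The module and function names (PartialParse, digits/1, digits_all/1) are made up for this example and are not part of ElixirParslet.

```elixir
defmodule PartialParse do
  # Hypothetical rule: consume leading ASCII digits and hand back the rest.
  def digits(input) when is_binary(input) do
    case Regex.run(~r/\A[0-9]+/, input) do
      [match] ->
        size = byte_size(match)
        rest = binary_part(input, size, byte_size(input) - size)
        {:ok, match, rest}

      nil ->
        {:error, input}
    end
  end

  # Strict variant: succeeds only when the rest is the empty string.
  def digits_all(input) do
    case digits(input) do
      {:ok, match, ""} -> {:ok, match}
      {:ok, _match, rest} -> {:error, {:unparsed, rest}}
      {:error, _} = error -> error
    end
  end
end
```

Callers that want a full parse can match on `{:ok, value, ""}` themselves, or use a strict wrapper like `digits_all/1` above.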

A new version is here: https://github.com/NigelThorne/ElixirParslet


You write:

Primarily I am working on this to learn Elixir.

Ok, that’s a noble motivation… But then:

In that endeavor I want to make a version of this parser that uses GenServer and Actors to create a Parser Application from your code, so you can dispatch documents to it to parse.

Why do you need GenServers and Actors for a task that is extremely sequential? Why an application? Why not a simple module?

I foresee each rule building an actor, so the document gets passed around and gradually consumed.

I don’t think this is how you should be using actors… It seems like the worst abstraction you could pick. And it will destroy your performance because of all the copying of data between processes (yes, actors are cheap in elixir, but communicating between them isn’t!)

Parsing a file in parallel is problematic. You should deal with parallelism in a coarse way. For example, spawning a process per document. Not sending parts of documents to the same process.
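
A rough sketch of what "a process per document" could look like, using `Task.async_stream/3` from the standard library. `DocParser.parse/1` is a hypothetical stand-in for a real parser (here it only splits on whitespace); each document is still parsed sequentially inside its own task.

```elixir
defmodule DocParser do
  # Hypothetical stand-in for a real parser: splits a document into words.
  def parse(document) when is_binary(document) do
    String.split(document)
  end

  # Coarse parallelism: one task (process) per document.
  # Results come back in the same order as the input list.
  def parse_all(documents) do
    documents
    |> Task.async_stream(&parse/1)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Only whole documents and whole results cross process boundaries here, so the copying cost is paid once per document rather than once per rule.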

My tip for people who are learning elixir is the following: don’t write your own genservers unless you really know what you’re doing. They’re more dangerous than macros. Macros can make your source unreadable but after you compile your code they disappear! A bad genserver can kill your application’s performance at runtime…

By all means, learn elixir the language by writing a PEG parser generator BUT don’t learn OTP (genservers, processes, etc.) by writing a PEG parser generator.


Thanks for the feedback. That sounds like great advice. If my goal was to write the best PEG parser generator I can, I would definitely heed your advice.

As someone who wants to see how servers and actors work, I’m not bothered with performance, so I think I’ll give it a go anyway. What I’m interested in is how to set up the actors: when to put in supervisors, when to have an actor acting like a facade or factory… I don’t think in ‘actors’ yet. I’d love to get some advice on how I should be thinking of breaking the problem down (given it’s the wrong thing to do and I’m going to do it anyway).

I’m thinking that by creating a new ‘parser’ you are starting a parsermachine actor. You can tell this actor that there are new rules, and it generates new actors for each rule.
When running the parser, and rules call into each other, I’m not sure if they should pass messages directly to the other rule actor or back to the parent parsermachine, which distributes to the right rule actor.

Should the parent actor be the supervisor of the other actors? It seems to be a parent child relationship… or is that better left to another actor, so there are single responsibilities per actor?

I’m not even sure how to draw what I’m talking about.
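
One way to picture the layout described above, purely as a learning sketch and not a recommended parser design: a ParserMachine GenServer that routes requests, with a separate DynamicSupervisor owning one process per rule. All names here are hypothetical, each "rule" is just a literal string held in an Agent, and stale pids after a rule restart are not handled.

```elixir
defmodule ParserMachine do
  use GenServer

  # Client API (hypothetical names)
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def add_rule(machine, name, literal), do: GenServer.call(machine, {:add_rule, name, literal})
  def parse(machine, name, input), do: GenServer.call(machine, {:parse, name, input})

  @impl true
  def init(:ok) do
    # Supervision is kept in its own process; the machine only routes.
    # In a real application both would sit in a supervision tree.
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, %{sup: sup, rules: %{}}}
  end

  @impl true
  def handle_call({:add_rule, name, literal}, _from, state) do
    # One Agent process per rule, started under the supervisor.
    {:ok, pid} = DynamicSupervisor.start_child(state.sup, {Agent, fn -> literal end})
    {:reply, :ok, %{state | rules: Map.put(state.rules, name, pid)}}
  end

  def handle_call({:parse, name, input}, _from, state) do
    reply =
      case Map.fetch(state.rules, name) do
        {:ok, pid} -> Agent.get(pid, fn literal -> match_literal(literal, input) end)
        :error -> {:error, :unknown_rule}
      end

    {:reply, reply, state}
  end

  # Matches a literal prefix and returns the unparsed rest.
  defp match_literal(literal, input) do
    if String.starts_with?(input, literal) do
      size = byte_size(literal)
      {:ok, literal, binary_part(input, size, byte_size(input) - size)}
    else
      {:error, input}
    end
  end
end
```

In this sketch the parent distributes every request, which answers the routing question one way; letting rule processes message each other directly would be the alternative, at the cost of each rule needing to know the others' pids.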



I’ve added whitespace and unicode parsing to the example JSON parser… so it’s actually usable now. The concept of “Transformers” from Parslet was really really simple to do in Elixir. I practically needed no code. I’d love some feedback.


A simple single-pass XML-like (elements, no attributes) parser would be awesome to see as it requires context sensitive parsing, so it would be nice to see how that is handled. :slight_smile:

My try at it with ex_spirit was:

As you know, you can’t do this with a pure PEG parser. I think that might be outside the scope of the current project.

Though depending on how it’s made it is possible, just depends on whether it internally carries state along or not (pretty simple overall). :slight_smile:
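
To make "carries state along" concrete, here is a small hand-written sketch (neither ex_spirit nor ElixirParslet code; TinyXml and its functions are hypothetical). The name of the opening tag is passed down as state and the closing tag is checked against it, which is the context-sensitive part.

```elixir
defmodule TinyXml do
  # Parses `<name>...</name>` with nested elements and no attributes.
  # Returns {:ok, {name, children}, rest} or {:error, input}.
  def element(input) when is_binary(input) do
    with [open, name] <- Regex.run(~r/\A<([a-z]+)>/, input),
         {:ok, children, rest} <- content(drop(input, open), name, []) do
      {:ok, {name, children}, rest}
    else
      _ -> {:error, input}
    end
  end

  # `name` is the carried state: the tag that must close this element.
  defp content(input, name, acc) do
    close = "</" <> name <> ">"

    cond do
      String.starts_with?(input, close) ->
        {:ok, Enum.reverse(acc), drop(input, close)}

      input == "" or String.starts_with?(input, "</") ->
        # Input ended, or a closing tag for a different element was found.
        {:error, {:expected, close}}

      String.starts_with?(input, "<") ->
        case element(input) do
          {:ok, child, rest} -> content(rest, name, [child | acc])
          error -> error
        end

      true ->
        # Plain text up to the next tag.
        [text] = Regex.run(~r/\A[^<]+/, input)
        content(drop(input, text), name, [text | acc])
    end
  end

  defp drop(input, prefix) do
    size = byte_size(prefix)
    binary_part(input, size, byte_size(input) - size)
  end
end
```

A mismatched closing tag such as `<a>hi</b>` is rejected because the carried name "a" does not match "b".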