Best way to build a parser

jehrhardt · August 7, 2019, 7:04am

I want to parse a text-based format, and I found two approaches for building a parser:

Use Erlang’s built-in leex and yecc, which are easy to use from Elixir.
Use parser combinators such as NimbleParsec.

What is the preferred way to parse text in Elixir? And why?

Fl4m3Ph03n1x · August 7, 2019, 8:01am

Afaik, there is no preferred way to parse text in Elixir.

I would say you should pick the tool that is both:

most mature and stable
better fits your use case

Erlang and Elixir go hand in hand. You can’t really be good at Elixir without knowing some Erlang basics - it’s everywhere.

So when the time comes to pick a tool, I’d suggest you not focus on the language it is written in, but on the features and usability the tool provides.

Hope it helps!

david_ex · August 7, 2019, 8:16am

In my (pretty limited) experience, I would go with:

leex and yecc if your text is rigidly structured and follows some sort of grammar (e.g. it’s a DSL, config file, source code, etc.)
NimbleParsec if your text is semi-structured (e.g. records where the schema wasn’t enforced and there are deviations) and you want to extract structured information from it

Basically, if your “rules” have a bunch of “that’s not always true” cases (e.g. “an email address usually follows the person’s name, but sometimes there’s a phone number instead”), I would go with NimbleParsec because it’s easier to manage the complexity (by combining sub-parsers).
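To make “combining sub-parsers” concrete, here is a minimal hand-rolled sketch in plain Elixir. It is not the NimbleParsec API; the module name `MiniCombinators`, the helper names, and the contact format (a two-word name, a comma, then either an email or a phone number) are all assumptions made up for illustration.

```elixir
defmodule MiniCombinators do
  # Hand-rolled sketch, NOT the NimbleParsec API.
  # A parser is a function: input -> {:ok, value, rest} | :error

  # Parser that succeeds when `re` matches at the very start of the input.
  def regex(re, tag) do
    fn input ->
      case Regex.run(re, input, return: :index) do
        [{0, len} | _] ->
          <<matched::binary-size(len), rest::binary>> = input
          {:ok, {tag, matched}, rest}

        _ ->
          :error
      end
    end
  end

  # Run the parsers one after another, collecting their results in order.
  def sequence(parsers) do
    result =
      fn input ->
        Enum.reduce_while(parsers, {:ok, [], input}, fn parser, {:ok, acc, rest} ->
          case parser.(rest) do
            {:ok, value, rest2} -> {:cont, {:ok, [value | acc], rest2}}
            :error -> {:halt, :error}
          end
        end)
      end

    fn input ->
      case result.(input) do
        {:ok, acc, rest} -> {:ok, Enum.reverse(acc), rest}
        :error -> :error
      end
    end
  end

  # Try each alternative on the same input and return the first success.
  def choice(parsers) do
    fn input ->
      Enum.find_value(parsers, :error, fn parser ->
        case parser.(input) do
          {:ok, _value, _rest} = ok -> ok
          :error -> nil
        end
      end)
    end
  end

  # Hypothetical record format: "First Last, <email or phone>"
  def contact do
    name = regex(~r/^[A-Z][a-z]+ [A-Z][a-z]+/, :name)
    sep = regex(~r/^,\s*/, :sep)
    email = regex(~r/^[^\s@,]+@[^\s@,]+/, :email)
    phone = regex(~r/^\+?[0-9][0-9 -]*/, :phone)

    sequence([name, sep, choice([email, phone])])
  end
end
```

Calling `MiniCombinators.contact().("Jane Doe, jane@example.com")` yields a tagged `:email` entry, while `"Jane Doe, +49 30 1234"` falls through to the `:phone` alternative. The “sometimes it’s a phone number instead” rule lives in a single `choice/1` call rather than being spread across grammar productions.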

Of course, you can also handle the variety of corner cases in the grammar given to leex and yecc, but in my experience the grammar size explodes pretty quickly and makes it challenging to keep in your head.

Also, if your case is simple, you could get away with using just Elixir’s pattern matching on strings ("foo " <> rest_of_string). Here’s a description of the idea: https://pragdave.me/blog/2014/02/12/pattern-matching-and-parsing.html
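As a rough illustration of the pattern-matching approach, here is a small sketch. The module name `KVParser` and the "set"/"get" command format are hypothetical and chosen only to show the prefix match with `<>` in function heads.

```elixir
defmodule KVParser do
  # Hypothetical command format: "set <key> <value>" or "get <key>".

  # The literal prefix "set " is matched; the remainder is bound to `rest`.
  def parse("set " <> rest) do
    # Split only on the first space so the value may itself contain spaces.
    case String.split(rest, " ", parts: 2) do
      [key, value] -> {:ok, {:set, key, value}}
      _ -> {:error, :missing_value}
    end
  end

  def parse("get " <> key), do: {:ok, {:get, key}}

  # Anything that does not start with a known prefix is rejected.
  def parse(_other), do: {:error, :unknown_command}
end
```

For example, `KVParser.parse("set name Jose Valim")` returns `{:ok, {:set, "name", "Jose Valim"}}`, and an unrecognised line such as `"delete name"` returns `{:error, :unknown_command}`. This works well for a handful of fixed prefixes, but each new variation needs another clause, which is where leex/yecc or a combinator library starts to pay off.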

jehrhardt · August 7, 2019, 9:51am

In my case, I want to parse some Python code to annotate it. Since there is a complete Python grammar available, I will go for leex and yecc.

Thanks for the explanation.

Qqwy · August 13, 2019, 2:40pm

@david_ex’s answer is a great and simple TL;DR. For parsing Python code, going the leex/yecc route definitely makes sense.

If you want a more in-depth question and answer about when to use parser combinators and when to use parser generators, this is something I spent quite a significant amount of time on ~2.5 years ago. You might enjoy reading my StackOverflow Question + Answer on this matter.