NimbleParsec and formal grammars

Sebb · June 13, 2021, 2:23pm

I’d like to bring together NimbleParsec and the remnants of my knowledge of grammars.

NimbleParsec can obviously parse a regular language (what regular expressions can parse). But can it also parse context-free languages? For example this one (from wikipedia)

S → SS
S → (S)
S → ()

which produces all strings with balanced parentheses, like () or ((())).

When I have production rules (eg in BNF), are there clear steps to get a NimbleParsec-parser from them?

What about “mild” context sensitive features, like remembering the name of a xml-tag to decide if the closing tag matches?

josevalim · June 13, 2021, 4:42pm

It can by the use of post traversal and other similar functions. For example, see the XML example in the source repository.

Sebb · June 14, 2021, 8:02am

Its really all in the examples, stupid me just looked in hexdocs.
I have to say this is more fun than writing push-down-automata in Modula-3.

example how to do parentheses and also convert a production rule to a parser:

  # factor := ( expr ) | nat

  factor =
    empty()
    |> choice(
      [
        ignore(ascii_char([?(]))
        |> concat(parsec(:expr))
        |> ignore(ascii_char([?)])),
        nat
      ]
    )

example how to be a little sensitive.

  defp match_and_emit_tag(_rest, [tag, [tag, text]], context, _line, _offset), do: ...

  defp match_and_emit_tag(_rest, [opening, [closing | _]], _context, _line, _offset),
    do: {:error, "closing tag #{inspect(closing)} did not match opening tag #{inspect(opening)}"}

thats really cool. If i remember correctly this is a real pain in theory.