Help to parse a template with NimbleParsec

egze · May 25, 2022, 7:10pm

I’m trying to create a template where you would enter this:

ID: {{ my_func($.project.id, "arg") }}, NAME:  {{ $.project.id }}

or

{{ my_func($.project.id, "arg") }}, NAME:  {{ $.project.id }}

I want to write a parser with NimbleParsec that gives me something like this:

{:ok, ["ID: ", [expr: [function: [my_func: ["$.project.id", "arg"]]]], ", NAME: ", [expr: "$.project.id"]]}

Basically, I need to identify opening and closing tags {{ and }}, and then inside them detect if there is a function call. If yes, parse the function call too for the name and arguments.
Function calls can also be nested, e.g. my_other_func(my_func($.project.id, "arg")), and sometimes there is no function call at all, only a value.

The output doesn’t need to be exactly like mine, I just need to be able to tell things apart.

This is my first serious use of NimbleParsec, and I’m not really sure how to approach it.

I get stuck on things like:

How to collect a string until I detect a starting tag, and how to make that leading text optional.
How to capture everything between the open and close tags and tag it as :expr.
How to repeat this until I reach the end of the string.

And I haven’t even got to parsing nested functions.

Could anyone be so kind as to write down how you would start tackling the problem?

fuelen · May 25, 2022, 10:05pm

Hello @egze

That’s a good exercise for my evening
Here is my parser:

defmodule TemplateEngine do
  defmodule Parser do
    import NimbleParsec

    optional_whitespaces = ascii_string(~c[ \t\n\r], min: 0)

    text =
      times(
        lookahead_not(string("{{"))
        |> utf8_char([]),
        min: 1
      )
      |> reduce({List, :to_string, []})
      |> unwrap_and_tag(:text)

    string_literal =
      ascii_char([?"])
      |> ignore()
      |> repeat(
        lookahead_not(ascii_char([?"]))
        |> choice([
          ~S(\") |> string() |> replace(?"),
          utf8_char([])
        ])
      )
      |> ignore(ascii_char([?"]))
      |> reduce({List, :to_string, []})
      |> unwrap_and_tag(:string_literal)

    variable =
      string("$")
      |> times(string(".") |> utf8_string([?a..?z, ?A..?Z, ?_, ?0..?9], min: 1), min: 1)
      |> reduce({Enum, :join, []})
      |> unwrap_and_tag(:variable)

    function_call =
      utf8_string([?a..?z, ?A..?Z, ?_, ?0..?9], min: 1)
      |> unwrap_and_tag(:name)
      |> ignore(string("("))
      |> tag(
        repeat(
          parsec(:expression)
          |> ignore(optional(string(",") |> ignore(optional_whitespaces)))
        ),
        :args
      )
      |> ignore(string(")"))
      |> tag(:function_call)

    defparsecp(
      :expression,
      choice([
        variable,
        function_call,
        string_literal
      ])
    )

    interpolation =
      ignore(
        string("{{")
        |> concat(optional_whitespaces)
      )
      |> parsec(:expression)
      |> ignore(
        optional_whitespaces
        |> string("}}")
      )
      |> unwrap_and_tag(:interpolation)

    defparsec(:parse, repeat(choice([interpolation, text])) |> eos())
  end

  def test do
    template = ~s|ID: {{ my_func($.project.id, "arg") }}, NAME:  {{ $.project.id }} {{ my_other_func(my_func($.project.id, "arg"))}}|
    __MODULE__.Parser.parse(template)
  end
end

and result:

iex> TemplateEngine.test()
{:ok,
 [
   text: "ID: ",
   interpolation: {:function_call,
    [name: "my_func", args: [variable: "$.project.id", string_literal: "arg"]]},
   text: ", NAME:  ",
   interpolation: {:variable, "$.project.id"},
   text: " ",
   interpolation: {:function_call,
    [
      name: "my_other_func",
      args: [
        function_call: [
          name: "my_func",
          args: [variable: "$.project.id", string_literal: "arg"]
        ]
      ]
    ]}
 ], "", %{}, {1, 0}, 114}

egze · May 26, 2022, 6:54am

Wow, a full working solution! Thank you so much!

Can you please comment on how you started working on this parser? For example, do you go from left to right, extracting the text first and then continuing along? Or do you prepare smaller building blocks first? In other words, how do you plan it?

fuelen · May 26, 2022, 8:34am

Yes, I started from left to right. While working on a parser, I find it easier not to constrain anything on the right side at first. For example, add eos() only when the parser is ready. Likewise, when working on function(...), add the expectation for the closing ) only once argument parsing works.
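
To illustrate the left-to-right approach described above, here is a minimal sketch of two intermediate stages. The module name IncrementalDemo and the parser names step1 and step2 are made up for this illustration, and the Mix.install line assumes the snippet is run as a standalone script. Because neither stage ends with eos(), whatever the parser cannot handle yet simply comes back as the unparsed rest (the third element of the result tuple).

```elixir
# Assumption: run as a standalone script, so the dependency is installed inline.
Mix.install([{:nimble_parsec, "~> 1.2"}])

defmodule IncrementalDemo do
  import NimbleParsec

  # Same idea as the text combinator in the full parser:
  # consume characters until an opening tag is ahead.
  text =
    times(lookahead_not(string("{{")) |> utf8_char([]), min: 1)
    |> reduce({List, :to_string, []})
    |> unwrap_and_tag(:text)

  variable =
    string("$")
    |> times(string(".") |> utf8_string([?a..?z, ?A..?Z, ?_, ?0..?9], min: 1), min: 1)
    |> reduce({Enum, :join, []})
    |> unwrap_and_tag(:variable)

  # Stage 1: plain text only, no eos(), nothing expected on the right.
  defparsec(:step1, repeat(text))

  # Stage 2: optional leading text, the opening tag and a variable.
  # The closing "}}" is deliberately not required yet.
  defparsec(
    :step2,
    optional(text)
    |> ignore(string("{{"))
    |> ignore(ascii_string([?\s], min: 0))
    |> concat(variable)
  )
end

input = "ID: {{ $.project.id }}"

# Stage 1 stops at the opening tag and hands everything after it back as rest.
{:ok, [text: "ID: "], "{{ $.project.id }}", _, _, _} = IncrementalDemo.step1(input)

# Stage 2 gets as far as the variable; the closing tag is still unparsed.
{:ok, [text: "ID: ", variable: "$.project.id"], " }}", _, _, _} = IncrementalDemo.step2(input)
```

Inspecting the rest after each stage shows exactly what the next combinator has to consume.
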
Also, I have this helper for inspecting errors:

def inspect_error(result, input) do
  print_lines = fn
    [] -> :noop
    lines -> IO.puts([IO.ANSI.yellow(), Enum.intersperse(lines, "\n")])
  end

  case result do
    {:error, reason, _rest, _context, {line, offset}, byte_offset} ->
      {lines_with_error, lines_after_error} = input |> String.split("\n") |> Enum.split(line)

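      # Fall back to 80 columns when the output is not a terminal (e.g. in tests).
      terminal_width =
        case :io.columns() do
          {:ok, columns} -> columns
          {:error, _} -> 80
        end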

      cursor_position = byte_offset - offset
      {lines_before_error, [line_to_split]} = lines_with_error |> Enum.split(-1)
      chunks = line_to_split |> String.codepoints() |> Enum.chunk_every(terminal_width)
      number_of_chunk_with_error = div(cursor_position, terminal_width) + 1
      cursor_position_in_chunk = cursor_position |> rem(terminal_width)
      {chunks_with_error, chunks_without_error} = Enum.split(chunks, number_of_chunk_with_error)

      print_lines.(lines_before_error)
      print_lines.(chunks_with_error)
      IO.puts([IO.ANSI.red(), List.duplicate(" ", cursor_position_in_chunk), "^", reason])
      print_lines.(chunks_without_error)
      print_lines.(lines_after_error)

    _ ->
      :no_error
  end
end

usage:

template
|> __MODULE__.Parser.parse()
|> tap(&inspect_error(&1, template))

The error messages are not as good as they could be, and writing a parser with good error messages is an art of its own. But this helper lets you see visually where something goes wrong. For example, try removing a " from around "arg" in the template and inspect the error.
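
For reference, here is a small sketch of the error tuple that the helper pattern-matches on. It uses a deliberately tiny string-literal parser rather than the full template parser; the module name ErrorShapeDemo is made up for this illustration, and the Mix.install line assumes a standalone script. The exact wording of the reason and the reported position depend on how NimbleParsec compiles the combinators, so the sketch only prints them.

```elixir
# Assumption: run as a standalone script, so the dependency is installed inline.
Mix.install([{:nimble_parsec, "~> 1.2"}])

defmodule ErrorShapeDemo do
  import NimbleParsec

  # A minimal string literal without escape handling, only to trigger an error.
  defparsec(
    :literal,
    ignore(string(~S(")))
    |> utf8_string([not: ?"], min: 0)
    |> ignore(string(~S(")))
  )
end

# The closing quote is missing, so parsing fails with a 6-element error tuple.
{:error, reason, rest, _context, {line, line_offset}, byte_offset} =
  ErrorShapeDemo.literal(~S("arg))

IO.inspect(reason, label: "reason")
IO.inspect(rest, label: "rest")
IO.inspect(line, label: "line (1-based)")

# This difference is what the helper calls cursor_position:
# the number of bytes between the start of the line and the reported position.
IO.inspect(byte_offset - line_offset, label: "cursor position")
```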

egze · May 26, 2022, 9:03am

Really appreciate the answer and the snippet. I’m sure it will be useful to a lot of folks.

egze · May 26, 2022, 10:26am

@fuelen How would I go about working with unfinished templates? For example, {{ $ has only a start tag and the beginning of a variable. What I want is to detect that we’re now in the interpolation -> variable part and offer autocomplete.

Would you suggest having two separate parsecs, one for valid templates and one for incomplete ones where most of the rules are relaxed?

fuelen · May 26, 2022, 10:42am

I’m not sure NimbleParsec is the best tool for autocomplete on possibly invalid templates. I think you need a lexer (leex); then, using the list of tokens and the cursor position, you can work out what to suggest for autocomplete.
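
As a rough sketch of the token-plus-cursor idea, the snippet below uses a small hand-rolled tokenizer in plain Elixir as a stand-in for a leex-generated lexer. Everything in it is hypothetical: the module name AutocompleteSketch, the token shapes, the function names and the @known_paths list are assumptions made for this illustration only. It tokenizes the input up to the cursor, never fails on incomplete templates, and reports whether the cursor sits in plain text, inside an interpolation, or at the end of a partially typed variable.

```elixir
defmodule AutocompleteSketch do
  # Hypothetical list of completion candidates.
  @known_paths ["$.project.id", "$.project.name"]

  # Turns the input into position-annotated tokens (positions are byte offsets).
  def tokenize(input), do: tokenize(input, 0, [])

  defp tokenize("", _pos, acc), do: Enum.reverse(acc)
  defp tokenize("{{" <> rest, pos, acc), do: tokenize(rest, pos + 2, [{:open, pos} | acc])
  defp tokenize("}}" <> rest, pos, acc), do: tokenize(rest, pos + 2, [{:close, pos} | acc])

  defp tokenize("$" <> rest, pos, acc) do
    # The path may be empty or end with a dot; incomplete input is fine here.
    [path] = Regex.run(~r/\A[A-Za-z0-9_.]*/, rest)
    size = byte_size(path)
    rest = binary_part(rest, size, byte_size(rest) - size)
    tokenize(rest, pos + 1 + size, [{:variable, pos, "$" <> path} | acc])
  end

  # Any other byte is skipped. This is safe for UTF-8 input because the
  # delimiters above are ASCII and never appear inside a multi-byte character.
  defp tokenize(<<_byte, rest::binary>>, pos, acc), do: tokenize(rest, pos + 1, acc)

  # Describes where the cursor (a byte offset into the input) currently is.
  def context_at(input, cursor) do
    tokens = input |> binary_part(0, cursor) |> tokenize()

    inside? =
      Enum.reduce(tokens, false, fn
        {:open, _}, _ -> true
        {:close, _}, _ -> false
        _other, inside? -> inside?
      end)

    case {inside?, List.last(tokens)} do
      {true, {:variable, pos, path}} when pos + byte_size(path) == cursor -> {:variable, path}
      {true, _} -> :expression
      {false, _} -> :text
    end
  end

  # Suggests known paths that start with the partially typed variable.
  def suggest(input, cursor) do
    case context_at(input, cursor) do
      {:variable, prefix} -> Enum.filter(@known_paths, &String.starts_with?(&1, prefix))
      _other -> []
    end
  end
end

IO.inspect(AutocompleteSketch.context_at("{{ $", 4))
IO.inspect(AutocompleteSketch.suggest("{{ $.project.n", 14))
```

A real lexer would also emit tokens for identifiers, parentheses, commas and string literals, so that function names and arguments could be completed in the same way.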

egze · May 26, 2022, 10:46am

Awesome. I was thinking that it would go in this direction. Challenge accepted. Thanks again.