Can you help me understand how to use lookahead?

I am trying to write a macro preprocessor using NimbleParsec.

The idea is that it should parse a file, replacing directives with a structure that can be used to post-process them. A directive looks like @<name>(…), for example @import(../libs/foo.bar).

I’m including what I have so far. The directive parsing works, but once the text combinator gets rolling there’s no stopping it. At this point I have tried using lookahead_not in every place and with every combinator I can think of, and no dice.

I’ve read and re-read the docs around lookahead and something clearly isn’t landing. Can anyone help me understand what I am missing?

Thanks.

Matt

defmodule MacroParser do
  import NimbleParsec

  dir_char = ascii_char([?@])
  open_paren = ascii_char([?(])
  close_paren = ascii_char([?)])
  directive_name = ascii_string([?a..?z], min: 2)

  # directive = @<name>(…<optional args>…)
  # e.g. @import("../libs/foo.bar")
  directive =
    ignore(dir_char)
    |> concat(directive_name)
    |> ignore(open_paren)
    |> optional(ascii_string([not: ?)], min: 1))
    |> ignore(close_paren)
    |> tag(:directive)

  text = ascii_string([], min: 1) |> lookahead_not(dir_char |> concat(directive_name))

  input =
    repeat(
      choice([
        directive,
        text
      ])
    )

  defparsec(:parse_macros, input, debug: false)
end
iex(159)> MacroParser.parse_macros("@import(../libs/foo.baz)___@import(../libs/bar.qux)")
{:ok,
 [{:directive, ["import", "../libs/foo.baz"]}, "___@import(../libs/bar.qux)"],
 "", %{}, {1, 0}, 51}

Someone in Discord helped me with a solution:

defcombinatorp(
    :directive_start,
    dir_char |> concat(directive_name)
  )

input =
    repeat(
      choice([
        directive,
        times(
          lookahead_not(parsec(:directive_start))
          |> ascii_char([]), min: 1
        ) |> reduce({List, :to_string, []})
      ])
    )

This worked. I’m not quite sure why it worked where my attempts didn’t. Again, some kind of guide to applying lookahead, from someone who understands it better than I do, would be great.
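For anyone landing here later, this is a rough sketch of how the fix above might slot into the original module as a standalone script. The module name MacroParserSketch and the Mix.install version constraint are my own assumptions, not something from the Discord answer; everything else is taken from the snippets in this thread.

```elixir
# Assumption: run as a script (elixir sketch.exs) with network access so that
# Mix.install/1 can fetch :nimble_parsec. The version constraint is a guess.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule MacroParserSketch do
  import NimbleParsec

  dir_char = ascii_char([?@])
  directive_name = ascii_string([?a..?z], min: 2)

  # directive = @<name>(<optional args>)
  directive =
    ignore(dir_char)
    |> concat(directive_name)
    |> ignore(ascii_char([?(]))
    |> optional(ascii_string([not: ?)], min: 1))
    |> ignore(ascii_char([?)]))
    |> tag(:directive)

  # Named combinator so the lookahead can refer to it through parsec/1.
  defcombinatorp(:directive_start, dir_char |> concat(directive_name))

  # One character at a time: check that no directive starts here, then
  # consume a single character. Repeat at least once and join the result.
  text =
    times(
      lookahead_not(parsec(:directive_start)) |> ascii_char([]),
      min: 1
    )
    |> reduce({List, :to_string, []})

  defparsec(:parse_macros, repeat(choice([directive, text])))
end

{:ok, result, _rest, _context, _position, _offset} =
  MacroParserSketch.parse_macros("@import(../libs/foo.baz)___@import(../libs/bar.qux)")

IO.inspect(result)
```

With the same input as in the original post, the "___" between the two directives should now come back as its own text element instead of swallowing the second directive.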

I can’t claim to understand how lookahead works, but here’s what I’ve found from playing around with it a bit. In the code you originally posted, you can change the definition of text as shown below and the parser works:

  text =
    lookahead_not(dir_char |> concat(directive_name) |> tag(:directive_start))
    |> ascii_char([])
    |> times(min: 1)

From this we can infer:

  1. The lookahead has to be stated first.
  2. For some reason, dir_char |> concat(directive_name) is different from dir_char |> concat(directive_name) |> tag(:directive_start). (Just noticed you can also use wrap instead of tag)
  3. We have to scan for text using ascii_char and repeat that with times, instead of using ascii_string. I assume this is because a lookahead is only checked once, at the point where it appears in the chain. ascii_string([], min: 1) matches any character, so it consumes the rest of the input in one go and never gets a chance to re-check for a directive along the way.
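To make point 3 a bit more concrete, here is a small sketch in plain Elixir, with no NimbleParsec involved, of the difference between checking a negative lookahead once and checking it before every character. The module and function names are hypothetical, and this only mimics the behaviour I observed; it is not how NimbleParsec is implemented.

```elixir
defmodule LookaheadSketch do
  # Hypothetical stand-in for the directive-start check:
  # "@" followed by at least two lowercase letters.
  def directive_start?(<<?@, a, b, _::binary>>) when a in ?a..?z and b in ?a..?z, do: true
  def directive_start?(_), do: false

  # Mimics a greedy "match anything" string followed by one lookahead:
  # everything is consumed first, so the check only ever sees the empty rest.
  def text_checked_once(input) do
    {consumed, rest} = {input, ""}
    if directive_start?(rest), do: :error, else: {consumed, rest}
  end

  # Mimics lookahead_not |> ascii_char([]) repeated with times:
  # the check runs before every single character is consumed.
  # (Unlike times(min: 1), this sketch does not enforce a minimum length.)
  def text_checked_per_char(input), do: scan(input, "")

  defp scan("", acc), do: {acc, ""}

  defp scan(<<char, rest::binary>> = input, acc) do
    if directive_start?(input) do
      {acc, input}
    else
      scan(rest, <<acc::binary, char>>)
    end
  end
end

IO.inspect(LookaheadSketch.text_checked_once("___@import(x)"))
# => {"___@import(x)", ""}
IO.inspect(LookaheadSketch.text_checked_per_char("___@import(x)"))
# => {"___", "@import(x)"}
```

The first function never stops at the directive because by the time it looks ahead there is nothing left to look at, which matches what the original text combinator was doing.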

Does that help you at all?