How do I avoid infinite recursion in a parser combinator

mattmower · September 19, 2021, 8:24am

After watching Saša Jurić’s talk on the subject I built my own parser combinator library, Ergo, as a learning experience.

It’s reached a point of maturity that it’s worked quite well for the parsing tasks I’ve thrown at it, but now I’ve hit a problem I am struggling to think my way around.

Let’s say I have a parser for a value:

def value() do
  choice([
    number_value(),
    boolean_value(),
    string_value()
  ])
)

and now I want to deal with lists (I’m omitting to deal with whitespace for brevity):

def list_value() do
  sequence([
    char(?[),
    value(),
    many(
      sequence([
        char(?,)
        value()
      ])
    ),
    char(?])
  ])
)

def value() do
  choice([
    number_value(),
    boolean_value(),
    string_value(),
    list_value()
  ])
end

and now I have a problem in that list_value() and value() are mutually, infinitely, recursive.

I’m about 25 years from my compiler class so I am very hazy about anything from here onwards. I don’t think this is a left-recursive grammar issue because you consume the [ token (i.e. this is not A -> A𝛼|β). I think this is an artefact of building the parsers using function calls.

I’ve bent my brain around this problem and come up with nothing except that possibly this is one reason that NimbleParsec and ExSpirit use macros, rather than plain functions, so that recursion is possible.

Can anyone help me figure out what to do here?

Many thanks in advance.

mattmower · September 19, 2021, 2:50pm

The answer came from Frerich on the #Elixir slack.

I’d gotten too far into think about the parsing aspect of the problem and didn’t see the purely functional aspect - eager vs lazy evaluation.

The solution in the end is simple, introduce an intermediate parser that sits between value() and list()

My first use of a macro in Elixir:

defmacro lazy(parser) do
    quote do
      Parser.new(
        :lazy,
        fn ctx -> Parser.invoke(unquote(parser), ctx) end
      )
    end
  end

Now I can write:

def value() do
  choice([
    number_value(),
    boolean_value(),
    string_value(),
    lazy(list_value())
  ])
)

And the recursion is broken.

Qqwy · September 19, 2021, 8:00pm

Congradulations on writing your first macro!

Indeed, this is one of the places where lazy evaluation makes an algorithm simpler.
Situations like this, where infinite recursion happens while building an intermediate structure for things that ‘may’ exist to an unknown (but finite) depth, come up from time to time.

The problem definitely is related to left-recursion. One could see it as left-recursion happening already before starting the parsing process itself.

In this particular example, you might be able to get away with a definition of lazy as a function (rather than a macro) like this:

def lazy(parser_producing_fun) do
  Parser.new(:lazy, fn cx -> Parser.invoke(parser_producing_fun.(), ctx) end
end

However, you’d need to call it as lazy(&list_value/0 instead of lazy(list_value()) so it alters the way it has to be used a little. Your choice

IloSophiep · September 19, 2021, 8:45pm

I recently finally took the time to watch that video myself.

I am wondering if I’m confusing stuff when I’m thinking that Saśa Jurić touches on the topic in the video already - starting at around 49:57min he talks about the problem of a subquery being its own select_statement() parser function.

He ends up evaluating the term lazily, just like you ended up doing, so I think it is the same situation you were dealing with?

sasajuric · September 19, 2021, 8:54pm

I came here to say exactly this The relevant part of the talk starts here. In the talk I opted for the function version (basically what @Qqwy suggests in the earlier response).

mattmower · September 20, 2021, 7:01am

Ah, it is a while since I watched the video and I forgot that Saša covered the problem in it. Thanks anyway! Would have saved me a lot of bother if I had remembered that!

I did think about the function approach as an alternative but (esp in a case where the deferred parser may take arguments) I thought the macro was likely to be cleaner at a cost of being slightly less clear that something unusual is going on.

Perhaps I might change the name to LAZY or lazy! to make it clearer there’s something to take note of.

mattmower · September 20, 2021, 7:16am

Yes, it’s kind of a left-recursion at the Elixir level. In my Ergo docs I’m calling this “Eager recursion” to distinguish from left-recursion in the grammar itself.