Implementing closures in a recursive descent parser using immutable datastructures

I’m currently following along the magnificent book Crafting Interpreters by Robert Nystrom. The author gradually builds up an interpreter from scratch. The first implementation (there are two implementations build throughout the book) is built with Java as the host language. The language being interpreted is a simplified version of lua including a few OO features (methods, classes, inheritance). The author calls this educational language “Lox”.

I’m trying to follow along doing the same exercise in Elixir. A lot of people have gone before me. There are even two implementations in Elixir. It’s a fun challenge, and I learned a lot already (most surprisingly about binary matching and protocols).

But I’m kinda stuck now. Until now I’ve had to translate solutions that were built using the OO features of Java into their immutable equivalent in Elixir. A good exercise! Every time state was being held as an instance field, I had to come up with a way to pass that same state around on the stack. The interpreter being built employs a strategy for parsing called recursive descent.

At this point I’m trying to implement local functions and closures. The concept of an “environment” has slowly evolved in the previous chapters. Each environment is associated with a scope or block and can have a chain of ancestors to allow lexical scoping. Until now I could pass an environment around, create new child-environments, or discard environments when leaving a scope. But closures make me scratch my head… Consider this fragment of Lox code:

fun makeCounter() {
  var i = 0;
  fun count() {
    i = i + 1;
    print i;
  }

  return count;
}

var counter = makeCounter();
counter(); // "1"
counter(); // "2"

When defining the inner count() function, its environment needs to be captured. But both calls to counter() need use and update the same captured environment. With the recursive descent strategy in an OO language, the captured environment of each function is being held as a reference that can be mutated when needed. That’s not something I can create an alternative for with immutable data structures. I can’t pass those captured environments through the stack, because they are associated with each function definition (as opposed to each function call).

I can come up with a solution that employs the Process dictionary as a form of mutable state. A similar solution could be built with spawning processes, an ets table or an Agent. But all these solutions feel much less elegant and more verbose than their OO equivalent. Another downside of these alternatives is that there won’t be any trigger to release allocated state in those environments once they become unreferenced. The OO solution can rely on the underlying garbage collector to clean up unreferenced heap allocated state.

My main question is if there are variations of the recursive descent parser strategy that play nice with functional languages that employ immutable data structures. Or if that’s just a bad fit to begin with, that would be also nice to know.

My humble attempt at crafting this interpreter can be found here: GitHub - linusdm/ex_lox: An Elixir implementation of an interpreter for the Lox language (from the book Crafting Interpreters by Robert Nystrom).

Cheers, and thanks for reading this far! :wave:

2 Likes

This may be a terrible idea, but would it work to add a copy of the local environment to the function’s data structure at creation and use it as the initial environment for the invocation?

1 Like

I believe this commit is heading in the wrong direction:

It’s 100% possible for a call to modify the current environment, for instance:

  var i = 0;
  fun count() {
    i = i + 1;
    print i;
  }

  count();
  count();

Having mutable environments in closures means that instead of a simple stack of environments, you’ve got a tree, where leaf nodes get garbage-collected once they are no longer referenced.

A function pointer like counter in your example holds a reference to its leaf node in the environment tree, and calling it can mutate either that leaf or any of the nodes between it and the root.

1 Like

Thanks for the nuance. Good point. I had to be explicit about which environment is modified, in the case of a call. The call-environment should not be changed, but the captured closure environment absolutely can. In your example, they happen to be the same (defining a function, and calling it immediately).

Good point. I’ve thought a lot about your remark, and about the structure of the nested environments, containing function pointers with references back to captured environments. This cyclical data structure is hard to update and maintain, given immutable building blocks.

Your remark has lead me to think about an alternative environment data structure, that still has these ancestor relationships between scopes. I didn’t come up with an idea yet, but I think it’s totally doable. It will have to simmer a bit longer…

That’s more or less what i do when “capturing the closure environment”, when a function is declared. But then later, when this captured environment is modified when running statements outside the function, these changes are not reflected in the closure environment.

It just occurs to me that circular dependencies are impossible with immutable data structures. So the whole idea of having an environment with a value referencing a function pointer that references back to that initial environment is impossible to start with (I didn’t notice it before).

For example, this program now results in an “undefined variable” error (instead of running forever), because the captured environment doesn’t have a reference to the function definition itself:

fun go() {
  go();
}

go();
1 Like