Koka-inspired local mutable variables for cleaner comprehensions

josevalim · December 2, 2023, 8:57am

UPDATE: This proposal has been retracted. Read the new proposal here: Local accumulators for cleaner comprehensions

Hi everyone,

This is a proposal for introducing local mutable variables to Elixir, inspired by the Koka programming language. This is another attempt of solving the comprehension problem, which we first discussed two years ago. This proposal is divided in four parts:

Problem statement
Comprehensions
Local mutable variables
Revisiting the problem

Problem statement

I have been on the record a couple times saying that, while some problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems have been described in the “nested-data-structures-traversal” repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it.

Personally speaking, the most concise and clear solution is the Python one, which I reproduce here:

section_counter = 1
lesson_counter = 1

for section in sections:
    if section["reset_lesson_position"]:
        lesson_counter = 1

    section["position"] = section_counter
    section_counter += 1

    for lesson in section["lessons"]:
        lesson["position"] = lesson_counter
        lesson_counter += 1

There are many things that make this solution clear:

Reassignment
Mutability
Sensitive whitespace

Let’s compare it with the Elixir solution I wrote and personally prefer. I am pasting an image below which highlights certain aspects:

Screenshot 2021-12-13 at 10 02 48

Lack of reassignment: in Elixir, we can’t reassign variables, we can only rebind them. The difference is, when you do var = some_value inside a if, for, etc, the value won’t “leak” to the outer scope. This implies two things in the snippet above:
1. We need to use Enum.map_reduce/3 and pass the state in and out (highlighted in red)
2. When resetting the lesson counter, we need both sides of the conditional (hihhlighted in yellow)
Lack of mutability: even though we set the lesson counter inside the inner map_reduce, we still need to update the lesson inside the session (highlighted in green)
Lack of sensitive whitespace: we have two additional lines with end in them (highlighted in blue)

As you can see, do-end blocks add very litte noise to the final solution compared to sensitive whitespace. In fact, the reason why I brought it up is to make it clear they are not the source of verbosity, so we can confidentaly discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive.

However, there is still a gap to mind. So how can we move forward?

Comprehensions

Comprehensions in Elixir have always been a syntax sugar to more complex data-structure traversals. Do you want to have the cartesian product between all points in x and y? You could write this:

Enum.flat_map(x, fn i ->
  Enum.map(y, fn j -> {i, j} end)
end)

Or with a comprehension:

for i <- x, j <- y, do: {i, j}

Or maybe you want to brute force your way into finding Pythagorean Triples?

Enum.flat_map(1..20, fn a ->
  Enum.flat_map(1..20, fn b ->
    1..20
    |> Enum.filter(fn c -> a*a + b*b == c*c end)
    |> Enum.map(fn c -> {a, b, c} end)
  end)
end)

Or with a comprehension:

for a <- 1..20,
    b <- 1..20,
    c <- 1..20,
    a*a + b*b == c*c,
    do: {a, b, c}

There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages).

However, a common point that arose during the discussion of how to solve this problem using comprehensions is that comprehensions are currently under utilized in Elixir. To address that, let’s take a deeper look into what comprehensions provide.

Generators

Let’s start with a simple problem. You have a list of numbers and you want to multiply each element in the list by two. We can do this:

iex> for i <- [1, 2, 3] do
...>   i * 2
...> end
[2, 4, 6]

The part i <- [1, 2, 3] is a generator. It gets each value in the list [1, 2, 3] and binds them to the variable i one at a time. Once i is bound, it executes the contents of the do-end block. The new list is formed by the results of the do-end block.

A comprehension can have multiple generators too. One use of multiple generators is to find all possible combinations between two lists. Imagine for example you are interested in a new car. You have identifier three colors that you like: green, blue, and yellow. You are also divided between three brands: Ford, Volkswagen, and Toyota. What are all combinations available?

Let’s first define variables:

iex> colors = [:green, :blue, :yellow]
iex> cars = [:ford, :volkswagen, :toyota]

Now let’s find the combinations:

iex> for color <- colors, car <- cars do
...>   "#{color} #{car}"
...> end
["green ford", "green volkswagen", "green toyota", "blue ford",
 "blue volkswagen", "blue toyota", "yellow ford", "yellow volkswagen",
 "yellow toyota"]

By having two generators, we were able to combine all options into strings.

Multiple generators are also useful to extract all possible values that are nested within other colors. Imagine that you have a list of users and their favorite programming languages:

iex> users = [
...>   %{
...>     name: "John",
...>     languages: ["JavaScript", "Elixir"]
...>   },
...>   %{
...>     name: "Mary",
...>     languages: ["Erlang", "Haskell", "Elixir"]
...>   }
...> ]

If we want to get all languages from all users, we could use two generators. One to traverse all users and another to traverse all languages:

iex> for user <- users, language <- user.languages do
...>   language
...> end
["JavaScript", "Elixir", "Erlang", "Haskell", "Elixir"]

The comprehension worked as if it retrieved the languages lists of all users and flattened it into a list, with no nesting.

The important concept about for-comprehensions so far is that we never use them to mutate values. Instead, we explicitly use them to explicitly map inputs to outputs: the lists that we want to traverse are given as inputs and for returns a new list as output, based on the values returned by the do-end block.

The `:uniq` option

In the example above, you may be wondering: what if we want all languages from all users but with no duplicates? You are in lucky, comprehensions also accept options, one of them being :uniq:

iex> for user <- users, language <- user.languages, uniq: true do
...>   language
...> end
["JavaScript", "Elixir", "Erlang", "Haskell"]

Comprehension options are always given as the last argument of for, just before the do keyword.

Filters

So far we used comprehensions to map inputs to outputs, to generate combinations, or to flatten lists nested inside other lists. We can also use comprehensions to filter the input, keeping only the entries that match a certain condition. For example, imagine we have a list of positive and negative numbers, and we want to keep only the positive ones and then multiply them by two:

iex> for i <- [-5, -3, -2, 1, 2, 4, 8], i > 0 do
...>   i * 2
...> end
[2, 4, 8, 16]

Filters are given as part of the comprehension arguments. If the filter returns a truthy value (anything except false and nil), the comprehension continues. Otherwise it skips to the next value.

You can give as many filters as you want, including mixed with other generators. Let’s go back to our users example and add some arbitrary rules. Imagine that we only want to consider programming languages from users that have the letter “a” in their name:

iex> for user <- users, String.contains?(user.name, "a"), language <- user.languages do
...>   language
...> end
["Erlang", "Haskell", "Elixir"]

As you can see, due to the filter, we skipped John’s languages.

What if we want only the programming languages that start with the letter “E”?

iex> for user <- users, language <- user.languages, String.starts_with?(language, "E") do
...>   language
...> end
["Elixir", "Erlang", "Elixir"]

Now we got languages from both, including the duplicates, but returned only the ones starting with “E”. You can still use the :uniq option, give it a try!

Where it falls apart…

Comprehensions start falling apart when you need to return additional values.

Let’s go back to our initial example. Imagine that you want to traverse a list of numbers, multiple each element in it by two while returning the sum of the original list at the same time.

In most non-functional programming languages, you might achieve this task like this:

sum = 0
list = []

for(element of [1, 2, 3]) {
  list.append(element * 2)
  sum += element
}

list /* [2, 4, 6] */
sum /* 6 */

This is quite different from how we have been doing things so far. In the example above, the for loop is changing the values of list and sum directly, which is then reflected in those variables once the loop is over.

Unfortunately, there is no mechanism to write this using for-comprehensions in Elixir. We need to fallback to map_reduce or reduce algorithms, going back to the problems described at the top of this post.

Local mutable variables

Koka is a very interesting programming language that introduces mutability in places, either for performance or clarity, without sacrificing functional principles. The idea is that all mutability should be local to a function, so for all users of said functions it still behaves like immutable functional code.

With mutable functions, we could solve the problem above like this:

mut sum = 0

list =
  for element <- [1, 2, 3] do
    sum = element + sum
    element * 2
  end

list #=> [2, 4, 6]
sum #=> 6

Local mutable variables can only be mutated within their scope. You can mutate them inside if/unless, for comprehensions, cond, receive… but that’s all. So you could write code like this, if you want to:

mut list = []

if some_value? do
  list = ["prefix" | list]
end

But you cannot write code like this, since anonymous functions escape the local function:

mut sum = 0

Enum.map([1, 2, 3], fn x ->
  sum = x + sum
end)

In fact, the code above would fail to compile. Once we enter an anonymous function, any mutable variable becomes tainted, and attempting to reassign it leads to crashes. Passing a mutable variable to another function will only pass its current value and it is no longer mutable.

In other words, the mutation is fully local and it never escapes the current function, so the mutability can never be observed outside of the function. Furthermore, Elixir will take care of compiling mutable variables to purely functional code, and not even the implementation itself will rely on mutability.

Revisiting the problem

With all of this said, how we can solve the initial problem with local mutable variables:

mut section_counter = 0
mut lesson_counter = 0

for section <- sections do
  if section["reset_lesson_position"] do
    lesson_counter = 0
  end

  section_counter = section_counter + 1

  lessons =
    for lesson <- section["lessons"] do
      lesson_counter = lesson_counter + 1
      Map.put(lesson, "position", lesson_counter)
    end

  section
  |> Map.put("lessons", lessons)
  |> Map.put("position", section_counter)
end

By adding reassignment to the language in the form of local mutable variables, we can considerably reduce the amount of noise. We do this while still preserving the immutability semantics of data-structures and of all function callers. The only modification I have done to the solution is to start counting from zero, so we can store the incremented values in sessions and lessons.

I hope this provides some food for thought on this long-running discussion. Personally speaking, I find this solution gives the guarantees we expect from a functional language while providing a safe and clear alternative for those coming from imperative backgrounds.

tcoopman · December 2, 2023, 9:03am

I get a 404 on the repository nested data structures

josevalim · December 2, 2023, 9:06am

Fixed, thanks!

tcoopman · December 2, 2023, 9:41am

I definitely think that mutability can help express certain algorithms better and probably more performant as well.

If we could add function-local mutable variables that also improve performance I would use it.

That being said, I don’t miss mutability often, and it’s only a few times that I would reach for it, if I could.
So if the cost of implementation is too high, or if there are other downsides, then I wouldn’t personally bother (but I’ve been doing functional programming for a long while now, so maybe for new people, this would be more useful)

hauleth · December 2, 2023, 9:48am

Why cannot solve this “problem” using some macro magic?

defmodule Mut do
  defmacro mut({:=, _, [name, value]}) do
    quote do
      unquote(name) = make_ref()
      Process.put(unquote(name), unquote(value))
    end
  end

  def set!(name, value) do
    Process.put(name, value)
  end

  def get!(name), do: Process.get(name)
end

And now the usage is like:

mut sum = 0

for i <- 1..10, do: set!(sum, get!(sum) + i)

IO.inspect(get!(sum), label: :sum)

Syntax could be polished a little, but the general idea is here. Obviously it will not work cross process, but shared mutable state is root of all evil.

gpopides · December 2, 2023, 10:35am

Not familiar with Koka but i suppose it is the same/similar with ocaml’s ref

Can functions return mutable variables?

def func() do
  mut variable = 0

  variable
end
#

result = func()

What would be the result here?

bartblast · December 2, 2023, 10:47am

If local mutable variables are introduced, perhaps a naming convention could enhance the distinction of such variables in the code. For instance:

!sum = !sum + x

This approach, like using an exclamation mark before the variable name, provides a clear visual cue for mutability without the necessity of introducing a “mut” keyword during variable declaration. I don’t know whether it’s feasible AST-wise, though…

josevalim · December 2, 2023, 11:36am

The problem with this approach is that you are effectively introducing mutability and, without compiler guarantees, it can easily escape the context of a function. I’d say it is very important to avoid that and keep the reassignment local. The mutability should not be observable at runtime, not even via introspection.

They can, but they only return the last value. The mutability itself is restricted to the function.

Sure, we can discuss syntax later. Focusing on the ideas and semantics is more important for now.

cmo · December 2, 2023, 11:51am

I wrote F# briefly before Elixir and it allows mutable variables. I don’t recall if the local scope is enforced but it was at least the guidance. They do have a different syntax, using <- for updates to a mutable variable vs = for regular assignment.

There have certainly been times where I would have used it in Elixir if it was a language feature.

al2o3cr · December 2, 2023, 1:54pm

I’m interested by the idea, but I suspect that this restriction (sensible though it is) would be a support headache. We already regularly get folks who are new to the language trying to do this (except with Enum.each) and “= is rebinding not reassignment” is easier to explain than “= is rebinding not reassignment except…”.

Using a different operator might help, since comprehensions already assign special meaning to tokens like <- that a reader needs to understand.

thiagomajesk · December 2, 2023, 2:37pm

To me at least, introducing the concept of mutability breaks the mental model I have a little bit, but since Elixir was my first and only function programming language; this might be completely acceptable.

I usually find elixir so consistent that I never find such edge cases that make me reason about the code differently than what I expect and I’m always wary about semantical changes like this one. It’s exactly the aforementioned “it works like this, except…” situation.

I agree that if we want to have this type of mutability, we should probably choose another operator to represent this and make the distinction even clearer. I think <- could be a good choice as we already have it in the language and we can “tweak” its meaning to satisfy conceptual consistency. But then, someone might expect that this would make sense: for mut a <- [1...2]. We have to think well about the syntax changes and how they serve the big picture.

One possible alternative could be using a special mut/1 function for actually mutating the value instead of a completely new keyword: mut(x + sum), but then it could be more verbose and not that clear which variables in the scope are expected to be mutated.

PS.: I haven’t coded in Haskell and F#, but it seems other functional languages also support this type of concept. Would love it if someone could point me to some resources where I can read a little bit about this. I honestly don’t have a lot of functional programming experience other than elixir.

adw632 · December 2, 2023, 3:10pm

I don’t really like introducing general mutability within functions, as others have already said, it creates the issue of “elixir is immutable except…”

Would it instead be possible to introduce some kind of “pinning operator” for comprehensions so that within the scope of the comprehension any assignment/binding of values is actually binding the variable on the outer scope, rather than wholesale mutability within functions?

Thinking out loud:

section_counter = 0
lesson_counter = 0

for section <- sections, ^section_counter, ^lesson_counter do
  if section["reset_lesson_position"] do
    lesson_counter = 0
  end

  section_counter = section_counter + 1

  lessons =
    for lesson <- section["lessons"], ^lesson_counter do
      lesson_counter = lesson_counter + 1
      Map.put(lesson, "position", lesson_counter)
    end

  section
  |> Map.put("lessons", lessons)
  |> Map.put("position", section_counter)
end

thiagomajesk · December 2, 2023, 3:19pm

Very interesting idea you have there… This always reminds me of the usage of ^ within Ecto queries. The problem I have with syntax changes is when we have different meanings for different contexts, so would be nice if we could create a consistent meaning for the usage of this operator (and it seems we already assign this meaning to it like you suggested, see here.

mudasobwa · December 2, 2023, 5:07pm

Well, this particular example can be easily written with :reduce

{list, sum} =
  for elem <- [1, 2, 3], reduce: {[], 0},
    do: ({list, sum} -> {[elem*2|list], elem+sum})
#⇒ {[6, 4, 2], 6}

The counter within a nested “loop” might be (arguably not elegant) achieved by wrapping the outer enumerable into Enum.with_index/1 and keeping this index in the accumulator given reset_lesson_position = index == current_index.

That said, I frankly do not see any noticeable advantage to be traded for breaking one of the main principles of the languages. Even rebinding gave birth to the countless number of discussions, this one would spread the whole picture for newcomers even more. Maybe it’s just me, but I find immutability nice even though sometimes one cannot write a pythonesque code with it.

dimitarvp · December 2, 2023, 5:51pm

I feel that a more realistic pain point should be demonstrated before the language gains scoped mutability (which will only add confusion as others have said and I agree).

That immutability makes certain algorithms impractical is known and somewhat accepted but I’d think this is solved by having Erlang BIFs and C / C++ / Rust / Zig NIFs.

IMO not worth it, it will confuse people for a benefit that’s not well advocated for.

Can we get an example that demonstrates the need for this feature i.e. “this crucially important Elixir code would be forever needlessly slow if we don’t do this”?

joey_the_snake · December 2, 2023, 6:41pm

The other points are valid but I personally like it. I’ve had to reduce over nested structures over the past year and the readability still hasn’t really become second nature.

bit4bit · December 2, 2023, 8:28pm

Hi,
I agree with the idea of having a temporal state for comprehensions; it helps to express the solution more clearly. However, having another thing to consider when we code doesn’t sound very nice. Also, it can be difficult to trace the absence of the keyword.

What if we can express the same idea (temporal state for comprehensions) without introducing new keywords?

for example, something like

for section <- sections, state: {lesson_counter \\ 0, section_counter \\ 0} do
  if section["reset_lesson_position"] do
    lesson_counter = 0
  end

  section_counter = section_counter + 1

  lessons =
    for lesson <- section["lessons"] do
      lesson_counter = lesson_counter + 1
      Map.put(lesson, "position", lesson_counter)
    end

  section = section
  |> Map.put("lessons", lessons)
  |> Map.put("position", section_counter)
  
  {:next, section, {lesson_counter, section_counter}}
end

or maybe something like quote

for section <- sections, state: [lesson_counter: 0, section_counter: 0] do
  if section["reset_lesson_position"] do
    lesson_counter = 0
  end

  section_counter = section_counter + 1

  lessons =
    for lesson <- section["lessons"] do
      lesson_counter = lesson_counter + 1
      Map.put(lesson, "position", lesson_counter)
    end

  section
  |> Map.put("lessons", lessons)
  |> Map.put("position", section_counter)
end

josevalim · December 2, 2023, 9:38pm

I am not worried about people reassigning inside an anonymous function being a support headache because we will explicitly raise a clear error message. I’d say it would even lead to a better user experience than regular variables today, because we can detect it and explain why it is not possible.

Because we still compile to immutable code, there is no such example. This is purely a syntactical construct with no runtime changes. So in a way, the name proposed here is actually very misleading, because we don’t actually have mutability.

Perhaps a better comparison would be to JavaScript: consider that all variables in Elixir are defined with let and here we are defining them with var (actually, a limited form of var). Another name we could give to this is “local accumulators”. Perhaps it would be more generally acceptable if we don’t include the name “mutable” in them (as they aren’t mutable).

sodapopcan · December 2, 2023, 10:21pm

heavy-plus-sign emoji to “local accumulators” terminology.

I know you don’t want to talk about syntax but I do like where @adw632 is going with the “pinning” idea. This way you wouldn’t have to “open” a scope to all of its descendants.

list = []

if true, ^list, do: [:ok | list]

if true, do: [:not_ok | list] # error

EDIT: You probably get this spirit of this but of course the second if wouldn’t be an error, it just wouldn’t “mutate” the outer list binding.

Eiji · December 2, 2023, 10:25pm

if that would work only for those as in that case I would expect to see a new operator which would be part of those constructs instead. I simply don’t like that mut variable is called outside of said for definition which allows people to put between them many lines unrelated to said variable.

with that type of syntax if it would work within all code blocks (do, else, after and so on …). First of all this makes it simpler to remember comparing to some list of constructs. Secondly it would allow to write a custom macros with do … end blocks.