Local accumulators for cleaner comprehensions

josevalim · December 4, 2023, 4:42am

Hi everyone,

This is a proposal for introducing local accumulators to Elixir. This is another attempt of solving the comprehension problem, which we first discussed two years ago. This proposal is divided in four parts:

Problem statement
Comprehensions
Local accumulators (if you read the previous proposal, start from here!)
Revisiting the problem
Implementation considerations

Problem statement

I have been on the record a couple times saying that, while some problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems have been described in the “nested-data-structures-traversal” repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it.

Personally speaking, I find imperative solutions clearer and with considerably less boilerplate. The Python solution is the most concise one, which I reproduce here:

section_counter = 1
lesson_counter = 1

for section in sections:
    if section["reset_lesson_position"]:
        lesson_counter = 1

    section["position"] = section_counter
    section_counter += 1

    for lesson in section["lessons"]:
        lesson["position"] = lesson_counter
        lesson_counter += 1

There are many things that make this solution clear:

Reassignment
Mutability
Sensitive whitespace

Let’s compare it with the Elixir solution I wrote and personally prefer. I am pasting an image below which highlights certain aspects:

Screenshot 2021-12-13 at 10 02 48

Lack of reassignment: in Elixir, we can’t reassign variables, we can only rebind them. The difference is, when you do var = some_value inside a if, for, etc, the value won’t “leak” to the outer scope. This implies two things in the snippet above:
1. We need to use Enum.map_reduce/3 and pass the state in and out (highlighted in red)
2. When resetting the lesson counter, we need both sides of the conditional (hihhlighted in yellow)
Lack of mutability: even though we set the lesson counter inside the inner map_reduce, we still need to update the lesson inside the session (highlighted in green)
Lack of sensitive whitespace: we have two additional lines with end in them (highlighted in blue)

As you can see, do-end blocks add very litte noise to the final solution compared to sensitive whitespace. In fact, the reason why I brought it up is to make it clear they are not the source of verbosity, so we can confidentaly discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive.

However, there is still a gap to mind. So how can we move forward? Before we start on that, let’s learn what we can already do with comprehensions.

Comprehensions

Comprehensions in Elixir have always been a syntax sugar to more complex data-structure traversals. Do you want to have the cartesian product between all points in x and y? You could write this:

Enum.flat_map(x, fn i ->
  Enum.map(y, fn j -> {i, j} end)
end)

Or with a comprehension:

for i <- x, j <- y, do: {i, j}

Or maybe you want to brute force your way into finding Pythagorean Triples?

Enum.flat_map(1..20, fn a ->
  Enum.flat_map(1..20, fn b ->
    1..20
    |> Enum.filter(fn c -> a*a + b*b == c*c end)
    |> Enum.map(fn c -> {a, b, c} end)
  end)
end)

Or with a comprehension:

for a <- 1..20,
    b <- 1..20,
    c <- 1..20,
    a*a + b*b == c*c,
    do: {a, b, c}

There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages). They support further options, such as :into, :uniq and :reduce.

Comprehensions start falling apart when you need to return additional values.

Let’s go back to our initial example. Imagine that you want to traverse a list of numbers, multiple each element in it by two while returning the sum of the original list at the same time.

In most non-functional programming languages, you might achieve this task like this:

sum = 0
list = []

for(element of [1, 2, 3]) {
  list.append(element * 2)
  sum += element
}

list /* [2, 4, 6] */
sum /* 6 */

This is quite different from how we have been doing things so far. In the example above, the for loop is changing the values of list and sum directly, which is then reflected in those variables once the loop is over.

Unfortunately, there is no mechanism to write this using for-comprehensions in Elixir. We need to fallback to map_reduce algorithms, going back to the problems described at the top of this post.

While you can use the :reduce in for-comprehensions to solve this problem, it has similar trade-offs to the Enum.map_reduce/3 solution presented here.

Local accumulators

This proposal offers to introduce local accumulators to Elixir. This aims to improve the proposed solution by introducing local reassignment to the language. As mentioned earlier, variables defined or rebound inside an inner scope do not affect the outer scope in Elixir:

value = 123

if true do
  value = 456
end

value #=> 123

With local accumulators, we can instead write:

@@value = 123

if true do
  @@value = 456
end

@@value #=> 456

With local accumualtors we could solve the problem above like this:

@@sum = 0

list =
  for element <- [1, 2, 3] do
    @@sum = element + @@sum
    element * 2
  end

list #=> [2, 4, 6]
@@sum #=> 6

Local accumulators are local because can only be reassigned within their scope. You can mutate them inside if/unless, for comprehensions, cond, receive, case… but you can’t pass them to other functions, not even anonymous functions. Therefore, you cannot write this:

@@sum = 0

Enum.map([1, 2, 3], fn x ->
  @@sum = x + @@sum
end)

In fact, the code above would fail to compile. Once we enter an anonymous function, any local accumulator becomes read-only. Passing a local accumulator to another function will only pass its current value.

In other words, the local accumulators are fully local and they never escape the current function, so their behaviour can never be observed outside of the function. Furthermore, Elixir will take care of compiling local accumulators to purely functional code!

Revisiting the problem

With all of this said, how we can solve the initial problem with local accumulators:

@@section_counter = 0
@@lesson_counter = 0

for section <- sections do
  if section["reset_lesson_position"] do
    @@lesson_counter = 0
  end

  @@section_counter = @@section_counter + 1

  lessons =
    for lesson <- section["lessons"] do
      @@lesson_counter = @@lesson_counter + 1
      Map.put(lesson, "position", @@lesson_counter)
    end

  section
  |> Map.put("lessons", lessons)
  |> Map.put("position", @@section_counter)
end

By adding reassignment to the language in the form of local accumulators, we can considerably reduce the amount of noise. We do this while still preserving the immutability semantics of data-structures and of all function callers. The only modification I have done to the solution is to start counting from zero, so we can store the incremented values in sessions and lessons.

I hope this provides some food for thought on this long-running discussion. Personally speaking, I find this solution gives the guarantees we expect from a functional language while providing a safe and clear alternative for those coming from imperative backgrounds.

Implementation considerations

This proposal uses the @@var notation to make it clear where local accumulators are used. @ is already used for module attributes, and there are similarties to them:

Module attributes are often used to accumulate values
When passing a module attribute or a local accumulators to a function, we simply pass the current value

However, they also have one large difference: module attributes are set using the @var :value notation, local accumulators use @@var = :value. When a module attribute is used inside a pattern, it is replaced by its contents. Local accumulators are reassigned.

There is also the possibility of confusion between them. Luckily, @@doc "foo" can raise a compilation error, which is a benefit of them having different write syntaxes. Furthermore, unused module attributes already warn and unused local accumulators shall warn too.

We should also consider the AST for local accumulators. @@var is already valid syntax today, and translates to {:@, _, [{:@, _, [{:var, _, nil}]}]}. Therefore, we could implement local accumulators today by pattern matching on the AST inside the implementation of module attributes. However, this has some downsides:

It will be easier to find and document if @ and @@ are distinct operators
@@ will most likely need to be a special form, as it has special mearning by the compiler, and module attributes are not special forms today

Therefore, introducing @@ as a separate token and its own AST node is a backwards incompatible change to the language. We must assess the impact of such change (for example, by emitting warnings in the tokenizer before the feature is introduced and by parsing all packages in Hex for occurrences of @@).

Alternatives syntaxes

One possible alternative is to use an operator to assign to local accumulators, such as:

session_counter <~ session_counter + 1

Unfortunately the operator introduces a few issues:

There are different ways we can assign/bind to a variable in Elixir, such as the = operator, case clauses, on the left side of <- in with clauses, etc
Patterns may mix both “regular” variables and local accumulators in them and the operator does not allow us to express which is which

These reasons indicate local accumulators should be a property of the variable name.

stevensonmt · December 4, 2023, 6:42am

I’ve gotten used to all the various workarounds to avoid mutability that I really don’t miss it with the current comprehensions in Elixir. That said this proposal looks really elegant and I’m sure I’ll abuse it when given the chance.

adw632 · December 4, 2023, 6:51am

I think the term “accumulator” is a misnomer as the proposal says @@ don’t accumulate, they only bind.

I don’t mind the concept, I just don’t like the association of @ with all of the module level use such module attributes and actual accumulation behaviour. I also find @@ kind of visually jarring, ugly code.

The following list might potentially confuse when used in a function and they look similar to their module counterparts:

@@type
@@opaque
@@typep
@@spec
@@callback
@@macrocallback
@@impl

It would also be problematic to blacklist the above as no doubt module level concerns may grow and change over time and there is a risk that some use of @@var today will clash with some future module usage.

If you are pinning one of these would the ^ come before @@ or after? ^@@foo or @@^bar?

I have to say that pinning with @@ makes for cryptic looking code.

Is there anything preventing another prefix such as !foo being used?

Can @@foo be used in function head pattern matching ?

josevalim · December 4, 2023, 7:27am

I thought the same but I don’t think it is a concern in practice, Because they are set differently, you need to make several mistakes in a row. You have to write @@doc = “my docs”, and even then, because you won’t read the local accumulator, you will still get an unused variable warnings.

Other than that, you should be able to use @@foo in a pattern and the syntax for pinning would be ^@@foo.

We can discuss other sigils instead of @, but that’s pretty much venturing into personal taste directory.

l3nz · December 4, 2023, 7:35am

I agree with you. @ is module, so kind-of the exact opposite of a local var.

christhekeele · December 4, 2023, 7:48am

What do you think about:

Having @@var :initial_value as a dedicated “declaration/instantiation” syntax
Having @@var = :updated_value be the dedicated “update/reassignment” syntax

The edge case I am thinking about is copying some block of code that declares and updates a @@var via @@var = :inner_initial, @@var = :inner_update and pasting it into a larger block of code (for, say, some refactoring reason) that also declares and relies on an identically named @@var. Based on the proposal as defined so far, that inner copied @@var = :inner_initial would unexpectedly start overriding the value in the outer scope.

This is a shadowing issue not possible in purely functional, re-assignment-supported languages like today’s Elixir. Having a dedicated syntax for declaration vs update allows us to detect a shadowing issue in mutable variables. I think this is part of the reason why languages with flexible declaration + assignment rules and mutable variables have stylistically adopted const/let declarations as best practice even when they are optional: it makes this sort of accidental shadowing much clearer by declaring the scope of a mutable var to start at a certain level of scoping, effectively re-initializing the accumulator to a known value in a nested scope and possibly issuing a compiler warning.

These local accumulators don’t make sense without declaring an initial value, and having a separate syntax for declaration and update lets us catch shadowing similarly to how we do today with immutable variable references. It also harmonizes with @var :intial_value where it is clear that re-declaring the module attribute (with the exception of accumulating module attributes, which is a whole other confusing conflation of the terminology we are using here) will be overridden later in the scope.

adw632 · December 4, 2023, 7:49am

The new proposal is quite sound IMO and essentially just enabling rebinding within a scope, (which I support).

I would think it is not sufficiently alien to warrant a lot of controversy, and whats left is the asthetics.

So I do agree that it is personal taste to some degree, not discounting that anything starting with @ has to date been a module level concern.

It is a shame we can’t reverse @@ to be module level and @ to be function level, but here we are.

adw632 · December 4, 2023, 8:12am

I do wonder what behaviour will follow over the years. If people actually start using if/else/for constructs in anger with assignments in those blocks this will promote more use of @@ (and visually cryptic looking code).

Currently we think of @@ as the exception, but it doesn’t take too much imagination where “normal coding standards” may evolve over the years to recommending making top level variables within a function all rebindable by default (via @@) so that if/else/for “just work” by default, relegating local scope variables to something you only do in a local code block.

This is one reason why I asked can @@ be used in function head pattern matching or do you need a separate declaration/binding to get an @@foo. Is there “ceremony” around declaring these vs other variables?

Then eventually some will ask but why can’t you use @@ in a function head? Isn’t it just a variable with “better” rebindability? Why is this not the default?

christhekeele · December 4, 2023, 9:34am

One virtue of this proposal’s semantics is that local accumulators are only assignable to in function bodies, and module attributes are only assignable to in module bodies. That means that pretty much any implementation of either can co-exist with any syntax, including emitting deprecation warnings for an eventual swap in a breaking Elixir 2.0 release.

LostKobrakai · December 4, 2023, 9:35am

This to me makes this proposal a no-go. Elixir in its early days had this behaviour with normal variables and it was eventually removed to align with the idea of “everything is an expression” and using the return of the if/else. Having the return value of if/else be its result is imo an (/one of many) example of what makes elixir code easier to understand and follow than other languages, which do not enforce that explicitness.

The fact that we now have to prefix variables with @@ imo doesn’t really change the fact that the whole snippet is now harder to understand. It might not look like it here, but imagine there being 20 more unrelated lines scattered between the lines shown here.

I do like the efforts of trying to improve the lesson counting example, something which indeed benefits from improvement, but I’m not a fan of adding new features, which can easily cause code to be made worse in other places – other places which I expect to be larger in quantity than the improved ones.

I’ve answered many peoples questions around how “if/else” works in elixir, where they tried to go the assign within the block route and found it not working. By now I explained that this is not how things go and that they’d need to change their approach. Going forward I’d need to explain that things don’t work like that 'ish and that they could use this local accumulators thing, which would work the way they’ve been coding before, but is not really the thing they want to use, because it can easily make things harder to understand in many places, …. My expectation would be that the requirement to change is the larger effort than just prefixing a bunch of variables with @@, so the latter will be what at least a bunch of beginners will be using.

Instead of “the improvement when needed” it might become the default approach taken for them.

One other issue I see with the proposals around this topic (including previous proposals) is the focus on for. I am a big fan of for and the conciseness it can provide. I also rarely use it – even though I’d love to do it more – because it’s so easy to introduce filtering of values, where filtering was not intended, but an error needed if a pattern couldn’t be matched for an element. In production code I’d rather be cautious and fail to process something instead of having elements be silently ignored on accident.

I’d be fine with a solution only updating how for works. It’s limiting, but a tradeoff to be made just like the tradeoff between nested Enum usage vs. a single for. But I’m not a fan of extending the changes to if/else, …, causing larger scale effects for all their use-cases, but excluding Enum APIs, where people will still run into “problem statement”.

josevalim · December 4, 2023, 9:41am

To clarify, @@ isn’t really function level. You can use it in any scope: inside an if branch, inside defmodule, or even at the root. Module attributes are definitely more common too, so I’d prefer the shorter one for them.

mat-hek · December 4, 2023, 9:41am

Cannot agree more

Aetherus · December 4, 2023, 9:48am

Just imagine that @@xxx is being reassigned all over the place! No lexical scope restriction! I already feel like I’m writing JavaScript

hubertlepicki · December 4, 2023, 9:49am

I am sorry, I am not a fan of this proposal, for several reasons.

Two reasons make it inconsistent with what we currently have:

it’s got a different scope than normal variables
it’s an actual mutable state rather than an immutable variable being renamed/redefined

But also, Elixir got big. It doesn’t have to be a big programming language, it doesn’t have to be Ruby. Clearly, it’s already bigger than Lisp or Erlang or Smalltalk, but being bigger and having more syntax makes it harder to learn, and harder to understand. We’re entering Python territory here and that’s not a compliment.

So that’s my biggest concern, the additional syntax, and overall growth of syntax and quirks and features of the language. At the end of the day, it’s function calls + AST-based macro system that makes Elixir great, and none of the additions, not even comprehensions are strictly speaking necessary.

michallepicki · December 4, 2023, 10:06am

The functional and immutable nature of Elixir grew on me and I appreciate how it lowers the mental overhead of working with code, even if it means the code needs to be a bit more verbose. But I agree that mutability makes some algorithms more concise, like shown in the Problem Statement.

I am curious, will this be implemented as syntax sugar for using the process dictionary?

josevalim · December 4, 2023, 10:17am

Folks, let’s please be more constructive and more respectful of other ecosystems. Yes, getting bigger is a concern, but I am sure we can formulate this in better ways.

It is also worth pointing out that in the last 5 years, there were only two changes to the language itself: stepped ranges and multi-letter sigils. We can go further back but I don’t think we will find much more. So I am not sure I agree the language is getting bigger by any meaningful rate. This proposal has a big impact though and would likely be the biggest change to the language since the introduction of with (roughly 7 or 8 years ago?).

It actually isn’t mutable state. Think of it as a more implicit version of the pipe operator where data is threaded in more complex ways. It doesn’t take away from your other concerns, but we should be clear it isn’t actually mutable.

No, it is based on source code transformation, all code is still pure/immutable.

hubertlepicki · December 4, 2023, 10:20am

I haven’t said anything disrespectful (or didn’t mean to) and it’s criticism of the size of the syntax is constructive and pretty specific by the way. Also: Python is growing like crazy in usage, despite that explosion of syntax, so there’s that to refute my claim of hurting adoption. Objectively, Python’s core syntax it grew a lot, if you had a gap of 15 years in programming Python it’s barely recognizable.

Aetherus · December 4, 2023, 10:23am

What I’m concerning about is that whenever there’s a hole in a system, there will always be someone to leverage that, and there always will be others following that first guy. I fear if there’s a hole in the immutable system, and it’s very easy to leverage, the situation will get out of control.

By the way, I’m the guy who leverages the process dictionary to do dynamic programming I don’t know if I’m the first.

l3nz · December 4, 2023, 10:25am

So they are, ahem, global? sounds like a huge gunfoot to me. A cure that’s worse than the disease.

If you copy and paste it wrong, you get a previous/different result that’s out of code scope but is in module? maybe they could be function or scope-prefixed so they cannot be mixed up.

josevalim · December 4, 2023, 10:28am

No, they are local to the scope they are defined, as the name suggests. Think about it like this: a variable in Elixir can be written in its current scope and read on all inner scopes, but not read or written by its parents. Local accumulators can be read and written by the current and inner scopes (but not its parents).

My point is that a defmodule is also a scope, as you can define variables in it (and local accumulators) in it:

defmodule Foo do
  barbaz = 123
end