Hi everyone,
This is a proposal for introducing local accumulators to Elixir. This is another attempt of solving the comprehension problem, which we first discussed two years ago. This proposal is divided in four parts:
-
Problem statement
-
Comprehensions
-
Local accumulators (if you read the previous proposal, start from here!)
-
Revisiting the problem
-
Implementation considerations
Problem statement
I have been on the record a couple times saying that, while some problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems have been described in the “nested-data-structures-traversal” repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it.
Personally speaking, I find imperative solutions clearer and with considerably less boilerplate. The Python solution is the most concise one, which I reproduce here:
section_counter = 1
lesson_counter = 1
for section in sections:
if section["reset_lesson_position"]:
lesson_counter = 1
section["position"] = section_counter
section_counter += 1
for lesson in section["lessons"]:
lesson["position"] = lesson_counter
lesson_counter += 1
There are many things that make this solution clear:
- Reassignment
- Mutability
- Sensitive whitespace
Let’s compare it with the Elixir solution I wrote and personally prefer. I am pasting an image below which highlights certain aspects:
-
Lack of reassignment: in Elixir, we can’t reassign variables, we can only rebind them. The difference is, when you do
var = some_value
inside aif
,for
, etc, the value won’t “leak” to the outer scope. This implies two things in the snippet above:- We need to use
Enum.map_reduce/3
and pass the state in and out (highlighted in red) - When resetting the lesson counter, we need both sides of the conditional (hihhlighted in yellow)
- We need to use
-
Lack of mutability: even though we set the lesson counter inside the inner
map_reduce
, we still need to update the lesson inside the session (highlighted in green) -
Lack of sensitive whitespace: we have two additional lines with
end
in them (highlighted in blue)
As you can see, do-end blocks add very litte noise to the final solution compared to sensitive whitespace. In fact, the reason why I brought it up is to make it clear they are not the source of verbosity, so we can confidentaly discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive.
However, there is still a gap to mind. So how can we move forward? Before we start on that, let’s learn what we can already do with comprehensions.
Comprehensions
Comprehensions in Elixir have always been a syntax sugar to more complex data-structure traversals. Do you want to have the cartesian product between all points in x
and y
? You could write this:
Enum.flat_map(x, fn i ->
Enum.map(y, fn j -> {i, j} end)
end)
Or with a comprehension:
for i <- x, j <- y, do: {i, j}
Or maybe you want to brute force your way into finding Pythagorean Triples?
Enum.flat_map(1..20, fn a ->
Enum.flat_map(1..20, fn b ->
1..20
|> Enum.filter(fn c -> a*a + b*b == c*c end)
|> Enum.map(fn c -> {a, b, c} end)
end)
end)
Or with a comprehension:
for a <- 1..20,
b <- 1..20,
c <- 1..20,
a*a + b*b == c*c,
do: {a, b, c}
There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages). They support further options, such as :into
, :uniq
and :reduce
.
Comprehensions start falling apart when you need to return additional values.
Let’s go back to our initial example. Imagine that you want to traverse a list of numbers, multiple each element in it by two while returning the sum of the original list at the same time.
In most non-functional programming languages, you might achieve this task like this:
sum = 0
list = []
for(element of [1, 2, 3]) {
list.append(element * 2)
sum += element
}
list /* [2, 4, 6] */
sum /* 6 */
This is quite different from how we have been doing things so far. In the example above, the for
loop is changing the values of list
and sum
directly, which is then reflected in those variables once the loop is over.
Unfortunately, there is no mechanism to write this using for-comprehensions in Elixir. We need to fallback to map_reduce
algorithms, going back to the problems described at the top of this post.
While you can use the
:reduce
in for-comprehensions to solve this problem, it has similar trade-offs to theEnum.map_reduce/3
solution presented here.
Local accumulators
This proposal offers to introduce local accumulators to Elixir. This aims to improve the proposed solution by introducing local reassignment to the language. As mentioned earlier, variables defined or rebound inside an inner scope do not affect the outer scope in Elixir:
value = 123
if true do
value = 456
end
value #=> 123
With local accumulators, we can instead write:
@@value = 123
if true do
@@value = 456
end
@@value #=> 456
With local accumualtors we could solve the problem above like this:
@@sum = 0
list =
for element <- [1, 2, 3] do
@@sum = element + @@sum
element * 2
end
list #=> [2, 4, 6]
@@sum #=> 6
Local accumulators are local because can only be reassigned within their scope. You can mutate them inside if
/unless
, for
comprehensions, cond
, receive
, case
… but you can’t pass them to other functions, not even anonymous functions. Therefore, you cannot write this:
@@sum = 0
Enum.map([1, 2, 3], fn x ->
@@sum = x + @@sum
end)
In fact, the code above would fail to compile. Once we enter an anonymous function, any local accumulator becomes read-only. Passing a local accumulator to another function will only pass its current value.
In other words, the local accumulators are fully local and they never escape the current function, so their behaviour can never be observed outside of the function. Furthermore, Elixir will take care of compiling local accumulators to purely functional code!
Revisiting the problem
With all of this said, how we can solve the initial problem with local accumulators:
@@section_counter = 0
@@lesson_counter = 0
for section <- sections do
if section["reset_lesson_position"] do
@@lesson_counter = 0
end
@@section_counter = @@section_counter + 1
lessons =
for lesson <- section["lessons"] do
@@lesson_counter = @@lesson_counter + 1
Map.put(lesson, "position", @@lesson_counter)
end
section
|> Map.put("lessons", lessons)
|> Map.put("position", @@section_counter)
end
By adding reassignment to the language in the form of local accumulators, we can considerably reduce the amount of noise. We do this while still preserving the immutability semantics of data-structures and of all function callers. The only modification I have done to the solution is to start counting from zero, so we can store the incremented values in sessions and lessons.
I hope this provides some food for thought on this long-running discussion. Personally speaking, I find this solution gives the guarantees we expect from a functional language while providing a safe and clear alternative for those coming from imperative backgrounds.
Implementation considerations
This proposal uses the @@var
notation to make it clear where local accumulators are used. @
is already used for module attributes, and there are similarties to them:
-
Module attributes are often used to accumulate values
-
When passing a module attribute or a local accumulators to a function, we simply pass the current value
However, they also have one large difference: module attributes are set using the @var :value
notation, local accumulators use @@var = :value
. When a module attribute is used inside a pattern, it is replaced by its contents. Local accumulators are reassigned.
There is also the possibility of confusion between them. Luckily, @@doc "foo"
can raise a compilation error, which is a benefit of them having different write syntaxes. Furthermore, unused module attributes already warn and unused local accumulators shall warn too.
We should also consider the AST for local accumulators. @@var
is already valid syntax today, and translates to {:@, _, [{:@, _, [{:var, _, nil}]}]}
. Therefore, we could implement local accumulators today by pattern matching on the AST inside the implementation of module attributes. However, this has some downsides:
-
It will be easier to find and document if
@
and@@
are distinct operators -
@@
will most likely need to be a special form, as it has special mearning by the compiler, and module attributes are not special forms today
Therefore, introducing @@
as a separate token and its own AST node is a backwards incompatible change to the language. We must assess the impact of such change (for example, by emitting warnings in the tokenizer before the feature is introduced and by parsing all packages in Hex for occurrences of @@).
Alternatives syntaxes
One possible alternative is to use an operator to assign to local accumulators, such as:
session_counter <~ session_counter + 1
Unfortunately the operator introduces a few issues:
-
There are different ways we can assign/bind to a variable in Elixir, such as the
=
operator,case
clauses, on the left side of<-
inwith
clauses, etc -
Patterns may mix both “regular” variables and local accumulators in them and the operator does not allow us to express which is which
These reasons indicate local accumulators should be a property of the variable name.