UPDATE: This proposal has been retracted. Read the new proposal here: Local accumulators for cleaner comprehensions
Hi everyone,
This is a proposal for introducing local mutable variables to Elixir, inspired by the Koka programming language. This is another attempt of solving the comprehension problem, which we first discussed two years ago. This proposal is divided in four parts:
-
Problem statement
-
Comprehensions
-
Local mutable variables
-
Revisiting the problem
Problem statement
I have been on the record a couple times saying that, while some problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems have been described in the “nested-data-structures-traversal” repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it.
Personally speaking, the most concise and clear solution is the Python one, which I reproduce here:
section_counter = 1
lesson_counter = 1
for section in sections:
if section["reset_lesson_position"]:
lesson_counter = 1
section["position"] = section_counter
section_counter += 1
for lesson in section["lessons"]:
lesson["position"] = lesson_counter
lesson_counter += 1
There are many things that make this solution clear:
- Reassignment
- Mutability
- Sensitive whitespace
Let’s compare it with the Elixir solution I wrote and personally prefer. I am pasting an image below which highlights certain aspects:
-
Lack of reassignment: in Elixir, we can’t reassign variables, we can only rebind them. The difference is, when you do
var = some_value
inside aif
,for
, etc, the value won’t “leak” to the outer scope. This implies two things in the snippet above:- We need to use
Enum.map_reduce/3
and pass the state in and out (highlighted in red) - When resetting the lesson counter, we need both sides of the conditional (hihhlighted in yellow)
- We need to use
-
Lack of mutability: even though we set the lesson counter inside the inner
map_reduce
, we still need to update the lesson inside the session (highlighted in green) -
Lack of sensitive whitespace: we have two additional lines with
end
in them (highlighted in blue)
As you can see, do-end blocks add very litte noise to the final solution compared to sensitive whitespace. In fact, the reason why I brought it up is to make it clear they are not the source of verbosity, so we can confidentaly discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive.
However, there is still a gap to mind. So how can we move forward?
Comprehensions
Comprehensions in Elixir have always been a syntax sugar to more complex data-structure traversals. Do you want to have the cartesian product between all points in x
and y
? You could write this:
Enum.flat_map(x, fn i ->
Enum.map(y, fn j -> {i, j} end)
end)
Or with a comprehension:
for i <- x, j <- y, do: {i, j}
Or maybe you want to brute force your way into finding Pythagorean Triples?
Enum.flat_map(1..20, fn a ->
Enum.flat_map(1..20, fn b ->
1..20
|> Enum.filter(fn c -> a*a + b*b == c*c end)
|> Enum.map(fn c -> {a, b, c} end)
end)
end)
Or with a comprehension:
for a <- 1..20,
b <- 1..20,
c <- 1..20,
a*a + b*b == c*c,
do: {a, b, c}
There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages).
However, a common point that arose during the discussion of how to solve this problem using comprehensions is that comprehensions are currently under utilized in Elixir. To address that, let’s take a deeper look into what comprehensions provide.
Generators
Let’s start with a simple problem. You have a list of numbers and you want to multiply each element in the list by two. We can do this:
iex> for i <- [1, 2, 3] do
...> i * 2
...> end
[2, 4, 6]
The part i <- [1, 2, 3]
is a generator. It gets each value in the list [1, 2, 3]
and binds them to the variable i
one at a time. Once i
is bound, it executes the contents of the do-end
block. The new list is formed by the results of the do-end
block.
A comprehension can have multiple generators too. One use of multiple generators is to find all possible combinations between two lists. Imagine for example you are interested in a new car. You have identifier three colors that you like: green, blue, and yellow. You are also divided between three brands: Ford, Volkswagen, and Toyota. What are all combinations available?
Let’s first define variables:
iex> colors = [:green, :blue, :yellow]
iex> cars = [:ford, :volkswagen, :toyota]
Now let’s find the combinations:
iex> for color <- colors, car <- cars do
...> "#{color} #{car}"
...> end
["green ford", "green volkswagen", "green toyota", "blue ford",
"blue volkswagen", "blue toyota", "yellow ford", "yellow volkswagen",
"yellow toyota"]
By having two generators, we were able to combine all options into strings.
Multiple generators are also useful to extract all possible values that are nested within other colors. Imagine that you have a list of users and their favorite programming languages:
iex> users = [
...> %{
...> name: "John",
...> languages: ["JavaScript", "Elixir"]
...> },
...> %{
...> name: "Mary",
...> languages: ["Erlang", "Haskell", "Elixir"]
...> }
...> ]
If we want to get all languages from all users, we could use two generators. One to traverse all users and another to traverse all languages:
iex> for user <- users, language <- user.languages do
...> language
...> end
["JavaScript", "Elixir", "Erlang", "Haskell", "Elixir"]
The comprehension worked as if it retrieved the languages lists of all users and flattened it into a list, with no nesting.
The important concept about for-comprehensions so far is that we never use them to mutate values. Instead, we explicitly use them to explicitly map inputs to outputs: the lists that we want to traverse are given as inputs and for
returns a new list as output, based on the values returned by the do-end
block.
The :uniq
option
In the example above, you may be wondering: what if we want all languages from all users but with no duplicates? You are in lucky, comprehensions also accept options, one of them being :uniq
:
iex> for user <- users, language <- user.languages, uniq: true do
...> language
...> end
["JavaScript", "Elixir", "Erlang", "Haskell"]
Comprehension options are always given as the last argument of for
, just before the do
keyword.
Filters
So far we used comprehensions to map inputs to outputs, to generate combinations, or to flatten lists nested inside other lists. We can also use comprehensions to filter the input, keeping only the entries that match a certain condition. For example, imagine we have a list of positive and negative numbers, and we want to keep only the positive ones and then multiply them by two:
iex> for i <- [-5, -3, -2, 1, 2, 4, 8], i > 0 do
...> i * 2
...> end
[2, 4, 8, 16]
Filters are given as part of the comprehension arguments. If the filter returns a truthy value (anything except false
and nil
), the comprehension continues. Otherwise it skips to the next value.
You can give as many filters as you want, including mixed with other generators. Let’s go back to our users example and add some arbitrary rules. Imagine that we only want to consider programming languages from users that have the letter “a” in their name:
iex> for user <- users, String.contains?(user.name, "a"), language <- user.languages do
...> language
...> end
["Erlang", "Haskell", "Elixir"]
As you can see, due to the filter, we skipped John’s languages.
What if we want only the programming languages that start with the letter “E”?
iex> for user <- users, language <- user.languages, String.starts_with?(language, "E") do
...> language
...> end
["Elixir", "Erlang", "Elixir"]
Now we got languages from both, including the duplicates, but returned only the ones starting with “E”. You can still use the :uniq
option, give it a try!
Where it falls apart…
Comprehensions start falling apart when you need to return additional values.
Let’s go back to our initial example. Imagine that you want to traverse a list of numbers, multiple each element in it by two while returning the sum of the original list at the same time.
In most non-functional programming languages, you might achieve this task like this:
sum = 0
list = []
for(element of [1, 2, 3]) {
list.append(element * 2)
sum += element
}
list /* [2, 4, 6] */
sum /* 6 */
This is quite different from how we have been doing things so far. In the example above, the for
loop is changing the values of list
and sum
directly, which is then reflected in those variables once the loop is over.
Unfortunately, there is no mechanism to write this using for-comprehensions in Elixir. We need to fallback to map_reduce
or reduce
algorithms, going back to the problems described at the top of this post.
Local mutable variables
Koka is a very interesting programming language that introduces mutability in places, either for performance or clarity, without sacrificing functional principles. The idea is that all mutability should be local to a function, so for all users of said functions it still behaves like immutable functional code.
With mutable functions, we could solve the problem above like this:
mut sum = 0
list =
for element <- [1, 2, 3] do
sum = element + sum
element * 2
end
list #=> [2, 4, 6]
sum #=> 6
Local mutable variables can only be mutated within their scope. You can mutate them inside if
/unless
, for
comprehensions, cond
, receive
… but that’s all. So you could write code like this, if you want to:
mut list = []
if some_value? do
list = ["prefix" | list]
end
But you cannot write code like this, since anonymous functions escape the local function:
mut sum = 0
Enum.map([1, 2, 3], fn x ->
sum = x + sum
end)
In fact, the code above would fail to compile. Once we enter an anonymous function, any mutable variable becomes tainted, and attempting to reassign it leads to crashes. Passing a mutable variable to another function will only pass its current value and it is no longer mutable.
In other words, the mutation is fully local and it never escapes the current function, so the mutability can never be observed outside of the function. Furthermore, Elixir will take care of compiling mutable variables to purely functional code, and not even the implementation itself will rely on mutability.
Revisiting the problem
With all of this said, how we can solve the initial problem with local mutable variables:
mut section_counter = 0
mut lesson_counter = 0
for section <- sections do
if section["reset_lesson_position"] do
lesson_counter = 0
end
section_counter = section_counter + 1
lessons =
for lesson <- section["lessons"] do
lesson_counter = lesson_counter + 1
Map.put(lesson, "position", lesson_counter)
end
section
|> Map.put("lessons", lessons)
|> Map.put("position", section_counter)
end
By adding reassignment to the language in the form of local mutable variables, we can considerably reduce the amount of noise. We do this while still preserving the immutability semantics of data-structures and of all function callers. The only modification I have done to the solution is to start counting from zero, so we can store the incremented values in sessions and lessons.
I hope this provides some food for thought on this long-running discussion. Personally speaking, I find this solution gives the guarantees we expect from a functional language while providing a safe and clear alternative for those coming from imperative backgrounds.