Introducing `for let` and `for reduce`

josevalim · December 23, 2021, 7:35am

In a comprehension, the flattening happens by supporting multiple generators. I don’t think we need to introduce another way of flattening, especially because the way comprehensions currently flatten also allows you to filter easily. And this adds an even bigger departure from the regular for (as in the most common case we now need to wrap each of the collection’s elements in a list).
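As a minimal sketch of the point about flattening, the snippet below uses only the existing for syntax: a second generator flattens one level of nesting, and a filter sits right next to it. The variable names (nested, evens) are illustrative only and do not come from the proposal.

```elixir
nested = [[1, 2], [3, 4], [5, 6]]

# The second generator walks each inner list, which flattens one level.
# The filter `rem(x, 2) == 0` keeps only the even elements.
evens = for list <- nested, x <- list, rem(x, 2) == 0, do: x

IO.inspect(evens)
# prints [2, 4, 6]
```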

The proposal also explains why it is beneficial to have the variables declared before the generators, as it gives us more power to express other constructs later on.

stevensonmt:

I agree that this makes the comprehension “read” oddly in English. What about this kind of qualifier syntax:
for i <- [1,2,3,4],
let {count, sum} = {0, 0} do 
    sum = sum + i
    count = count + 1
    {i, {sum, count}}
end

I strongly disagree with this version. If I were to read such code, I would expect count and sum to be reset on every new value of i.

To be honest, I would say “having to read perfectly in English” is a red herring. I understand the current syntax can be confusing for some (and there is criticism saying so), but optimizing for “English readability” is not what we should aim for to address it.

The proposal also explains why it is beneficial to declare the “let” variables early on, as that gives the ability to express more complex scenarios in the future.

josevalim · December 23, 2021, 7:51am

fceruti:

I don’t know the intricacies of Elixir lang, but I would try to make it understandable on an English language level (that’s why the Python version is so understandable in my opinion).

Uninformed & probably silly proposal:
for map_reduce i <- [1, 2, 3], acc: {count, sum} = {0, 0} do

The proposal explains why it is important for the variables to be declared at the beginning, instead of at the end, but I also want to discuss the point you bring about map_reduce.

In my mind, “for” already means “map”, because that’s what it does by default. So “for map_reduce” is, in a way, the equivalent to “map map_reduce”. Also, I am worried that “map_reduce” doesn’t mean much for someone who doesn’t know what it means (but perhaps that should not be a reason to not use it). Other words that we considered were using for reduce instead of for let and for reduce_only to mean only reducing.

My first thought was actually to call it for with, to mean I want to do a for with these variables, but that would ultimately be very confusing with the with special form. I also considered for using, to mean I want to do a for using these variables. However, given the way to introduce variables in many imperative and functional languages is via let, that’s where we landed in the end. We also considered for given.

acc is functional jargon and an abbreviation. We could call it accumulate but maybe that’s too long? We could consider synonyms to accumulate too, such as gather and collect:

iex> for accumulate(sum = 0), i <- [1, 2, 3] do
...>   sum = sum + i
...>   {i * 2, sum}
...> end

iex> for gather(sum = 0), i <- [1, 2, 3] do
...>   sum = sum + i
...>   {i * 2, sum}
...> end

Then the follow up question is: would for accumulate and for reduce be clear enough to say which one is a map_reduce and which one is a reduce?
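For reference, here is a sketch of the two existing Enum functions that the names are being compared against. It uses only Enum.map_reduce/3 and Enum.reduce/3 as they exist today, not the proposed for accumulate syntax, and the variable names are illustrative.

```elixir
# map_reduce: returns the mapped list *and* the final accumulator.
{doubled, sum} =
  Enum.map_reduce([1, 2, 3], 0, fn i, sum ->
    sum = sum + i
    {i * 2, sum}
  end)

# reduce: returns only the final accumulator.
total = Enum.reduce([1, 2, 3], 0, fn i, sum -> sum + i end)

IO.inspect({doubled, sum})
# prints {[2, 4, 6], 6}
IO.inspect(total)
# prints 6
```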

Also see above in this reply for a discussion on the names considered so far.

EDIT: Oh, Haskell calls it mapAccum instead of mapFold/mapReduce…

LostKobrakai · December 23, 2021, 9:43am

I wrote up most of a post about the potential to just have :map_reduce besides the existing :reduce to handle the fact that the existing implementation doesn’t handle map reduce operations well. To me this seems like the most straightforward step.

However, I discarded it because I don’t think that captures all the concerns of this proposal. One part is the lack of map reduce, but I guess the other important part is figuring out a declarative syntax for reduction operations. The things I currently don’t feel work great in the proposal with regard to the latter concern are twofold.

One is that we assign initial values with =. for let count = 0, … still looks a lot like count would just be 0. Yes, there’s a let in front, but to me that doesn’t feel like proper signaling. Added parentheses add a bit more “look at me”, but also look more like an independent function/macro than a part of for. (Also, the fact that we use text/a name seems to be the reason for a lot of the current discussion; can we possibly not use a word?) It seems that in all existing places where variables are assigned with semantics different from =, Elixir uses <- or -> in some shape or form. However, <- is already used to denote generators in for. Maybe -> works?

for {0, 0} -> {sum, count}, i <- [1, 2, 3] do
 …
end

To me assigning an initial value actually reminds me a lot of assigning a default. Something like this:

for {sum, count} \\ {0, 0}, i <- [1, 2, 3] do
 …
end

The other part is the fact that while the “input” side to the do/end block is now more declarative, we still retain the not so pretty fact of returning tuples of values because we need multiple returns for a map_reduce. It works, but doesn’t feel great. Though I’m also not sure there are good alternatives with the current syntax Elixir can parse without limiting when/how the accumulator can be manipulated.

I did hardly any digging into the already discarded proposals, so I hope I didn’t reiterate.

josevalim · December 23, 2021, 10:10am

I think this explains exactly why adding the parentheses is growing on me. At the same time, I think not using a word will make it worse, because the lack of a name won’t make what it does obvious and there will be no way to distinguish between let and reduce. I am also not worried about people thinking they could use let or reduce elsewhere; we can have good error messages for such cases.

Regarding using other operators, I think \\ is not a good choice because what happens in this case:

sum = ...

for let(sum \\ 0), ...

One would argue that sum should not be reset to 0 (and therefore \\ has no use in the example above).

We could use for let({sum, count} <- {0, 0}) but I think making it look like a generator will be confusing. let({0, 0} -> {sum, count}) is not good either as the semantics of -> usually is: “match on those variables on the left side and then execute the right side”. It doesn’t really assign left to right.

LostKobrakai · December 23, 2021, 10:16am

josevalim:

Regarding using other operators, I think \\ is not a good choice because what happens in this case:
sum = ...

for let(sum \\ 0), ...
One would argue that sum should not be reset to 0 (and therefore \\ has no use in the example above).

To me, implicitly using the values of variables declared outside the for for accumulator variables would be more confusing than helpful. If someone wants to use the existing sum I’d expect them to write sum \\ sum, while sum \\ 0 in your example starts from 0. This would align with how for sum <- sum, do: … acts.

Yeah, I expected that response (hence the trailing ?). I can see the reasoning, but also given <- does work differently between with and for I don’t feel as strongly if for would have its own meaning for ->.
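To illustrate the remark that <- already behaves differently in with and in for, here is a small sketch using only existing Elixir semantics. The variable names are made up for the example.

```elixir
# In `with`, a failed match stops the chain and returns the unmatched value.
with_ok = with {:ok, x} <- {:ok, 1}, do: x
with_miss = with {:ok, x} <- :error, do: x

# In `for`, an element that fails to match the pattern is silently skipped.
for_result = for {:ok, x} <- [{:ok, 1}, :error, {:ok, 3}], do: x

IO.inspect(with_ok)
# prints 1
IO.inspect(with_miss)
# prints :error
IO.inspect(for_result)
# prints [1, 3]
```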

stefanchrobot · December 23, 2021, 10:22am

Same here.

To an extent this code:

for let sum = 0, i <- [1, 2, 3], do: ...

suggests that I can possibly do:

let sum = 0

elsewhere because defining a variable is quite similar:

sum = 0

On the other hand:

for let(sum = 0), i <- [1, 2, 3], do: ...

would mean that I need to use

let(sum = 0)

in my code, which personally gives me a feeling that it won’t work elsewhere. I admit that it’s not a very strong argument though.

josevalim · December 23, 2021, 10:39am

It aligns with for sum <- sum but it is completely misaligned with how sum \\ sum works, as that would never be possible in its usage today. Having a default is conceptually too different from setting an initial value.
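A short sketch of how \\ is used today may help show the difference between a default and an initial value. DefaultSketch and total/2 are hypothetical names invented for this example; only the \\ default-argument behaviour itself is existing Elixir.

```elixir
defmodule DefaultSketch do
  # `\\` supplies a value only when the caller omits the argument.
  # It never looks at variables in the caller's scope.
  def total(list, sum \\ 0), do: Enum.reduce(list, sum, &Kernel.+/2)
end

without_arg = DefaultSketch.total([1, 2, 3])
with_arg = DefaultSketch.total([1, 2, 3], 10)

IO.inspect(without_arg)
# prints 6
IO.inspect(with_arg)
# prints 16
```

In this usage there is no meaningful way to write sum \\ sum, which is the misalignment being pointed out.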

josevalim · December 23, 2021, 10:44am

Oh, this discussion gave me another idea for a name, init:

for init(sum = 0), x <- [1, 2, 3] do
  {x * 2, sum + x}
end

The goal is to initialize variables to be used as state during the for. All variables initialized as part of the for must then be returned inside the do-end block.

It may be slightly confusing in cases like this:

sum = 0

for init(sum), x <- [1, 2, 3] do
  {x * 2, sum + x}
end

But we can argue it is a shortcut for init(sum = sum).

I don’t want to overreact but this may be my favorite option so far.

lud · December 23, 2021, 10:49am

It’s nice but it still does not convey the fact that the value is updated from the return tuple and the new value is reinjected on later iterations very well.

I was also thinking about this:

  let sum = 0 in for x <- [1, 2, 3] do
    {x * 2, sum + x}
  end

But it is not that explicit either.

josevalim · December 23, 2021, 10:55am

Having let at the beginning is not an option. It has several downsides:

It requires adding let as a special form, which will most likely break some code
It may give the impression let is a general construct while it is only specific to for-comprehensions
It doesn’t answer how to handle for reduce and adding both reduce and let as special forms is even more likely to break existing code

massimo · December 23, 2021, 10:59am

josevalim:

Oh, this discussion gave me another idea for a name, init:
for init(sum = 0), x <- [1, 2, 3] do
  {x * 2, sum + x}
end
The goal is to initialize variables to be used as state during the for. All variables initialized as part of the for must then be returned inside the do-end block.

With that in mind, why not

for var ..., x <- y, do: ...

JavaScript had to introduce let because var was historically function scoped: they needed something else for block scoping, but could not change var scoping without breaking old code. In Elixir everything is already block scoped.

josevalim · December 23, 2021, 11:03am

I don’t think var is any better than let, unfortunately. I would say it is worse, actually. let at least is used by both functional and imperative languages where in some of those it doesn’t have a notion of mutability attached to it. If we have something named var, I would expect it to go the fully mutable route (and that’s how some languages like Scala use it).

stefanchrobot · December 23, 2021, 11:06am

Instead of starting with a generator, our comprehension starts with a let variable = initial expression.

I think init works better as a “special type of generator”/“generator wrapper”:

for sum <- init(0), x <- [1, 2, 3] do
  {x * 2, sum + x}
end

This looks unsurprising to my eye.

josevalim · December 23, 2021, 11:09am

I think the reinjection is somewhat implied because of for. But you are right, there is nothing conveying the fact it is updated from the return tuple.

Honestly, the only option so far that conveys this fact is map_reduce:

for map_reduce(sum = 0), x <- [1, 2, 3] do
  {x * 2, sum + x}
end

But if we were to call it map_reduce, I would change the tutorial to first introduce reduce, and then introduce map_reduce as a way of getting for to return its value and reduce at the same time.

I still have reservations about calling it map_reduce. I am worried it is not immediately clear to those who are initially looking at it, but given we simply cannot agree on a new word, it may be that the best option is to use the vocabulary we already have.
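The re-injection of the accumulator discussed above can be made explicit with a hand-written recursion. This is only a sketch of the general map_reduce idea, not how for or Enum.map_reduce/3 is actually implemented; MapReduceSketch and run/3 are hypothetical names.

```elixir
defmodule MapReduceSketch do
  # Nothing left to process: return the accumulator unchanged.
  def run([], acc, _fun), do: {[], acc}

  def run([x | rest], acc, fun) do
    # The function returns {mapped_value, new_accumulator}.
    {mapped, acc} = fun.(x, acc)
    # The new accumulator is re-injected into the next iteration.
    {tail, acc} = run(rest, acc, fun)
    {[mapped | tail], acc}
  end
end

result = MapReduceSketch.run([1, 2, 3], 0, fn x, sum -> {x * 2, sum + x} end)

IO.inspect(result)
# prints {[2, 4, 6], 6}
```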

opsb · December 23, 2021, 11:27am

Given a for comprehension has the following pipeline

generation -> filter -> reduction

There are currently two variations

generation -> filter -> map
generation -> filter -> reduce

and this discussion seems to be largely about adding

generation -> filter -> map_reduce

IMHO the existing form describes this pipeline very well:

map

[2, 6, 10] == for i <- [1, 2, 3, 4, 5], is_odd(i), do: i * 2

reduce

9 == for i <- [1, 2, 3, 4, 5], is_odd(i), reduce: 0 do
  acc -> i + acc
end

Adding support for map_reduce could follow this existing form with:

{[2, 6, 10], 9} == for i <- [1, 2, 3, 4, 5], is_odd(i), map_reduce: 0 do
  acc -> {i*2, i + acc}
end
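Since map_reduce: is not an existing option of for, here is a sketch of the same generation -> filter -> map_reduce pipeline written with functions that exist today. It assumes Integer.is_odd/1 is what is_odd stands for in the examples above.

```elixir
require Integer

{doubled, sum} =
  [1, 2, 3, 4, 5]
  # filter: keep only the odd elements
  |> Enum.filter(fn i -> Integer.is_odd(i) end)
  # map_reduce: double each element while summing the originals
  |> Enum.map_reduce(0, fn i, acc -> {i * 2, i + acc} end)

IO.inspect({doubled, sum})
# prints {[2, 6, 10], 9}
```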

The lesson example would then read as

{sections, _acc} =
  for section <- sections, map_reduce: {1, 1} do
    {section_counter, lesson_counter} ->
        lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter

        {lessons, lesson_counter} =
            for lesson <- section["lessons"], map_reduce: lesson_counter do
              lesson_counter -> 
                {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}    
            end

        section =
            section
            |> Map.put("lessons", lessons)
            |> Map.put("position", section_counter)

        {section, {section_counter + 1, lesson_counter}}
  end

Compared to the proposed form there’s little in it in terms of syntactic noise and to my eye makes it clearer how variables will be bound.

{sections, _acc} =
  for let {section_counter, lesson_counter} = {1, 1}, section <- sections do
    lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter
    
    {lessons, lesson_counter} =
      for let lesson_counter, lesson <- section["lessons"] do
        {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
      end
    
    section =
      section
      |> Map.put("lessons", lessons)
      |> Map.put("position", section_counter)

    {section, {section_counter + 1, lesson_counter}}
  end
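For comparison with both forms above, the lesson example can be sketched with today's Enum.map_reduce/3. The sample sections data and the names numbered and final_acc are made up for illustration; only the "lessons", "position" and "reset_lesson_position" keys come from the example itself.

```elixir
# Hypothetical input: three sections, the second one resets lesson numbering.
sections = [
  %{"lessons" => [%{"name" => "a"}, %{"name" => "b"}]},
  %{"reset_lesson_position" => true, "lessons" => [%{"name" => "c"}]},
  %{"lessons" => [%{"name" => "d"}]}
]

{numbered, final_acc} =
  Enum.map_reduce(sections, {1, 1}, fn section, {section_counter, lesson_counter} ->
    lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter

    # Inner map_reduce threads the lesson counter through the lessons.
    {lessons, lesson_counter} =
      Enum.map_reduce(section["lessons"], lesson_counter, fn lesson, lesson_counter ->
        {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
      end)

    section =
      section
      |> Map.put("lessons", lessons)
      |> Map.put("position", section_counter)

    {section, {section_counter + 1, lesson_counter}}
  end)

IO.inspect(Enum.map(numbered, & &1["position"]))
# prints [1, 2, 3]
IO.inspect(for s <- numbered, l <- s["lessons"], do: l["position"])
# prints [1, 2, 1, 2]
```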

josevalim · December 23, 2021, 11:31am

Yeah, from the perspective of the return type, calling it map_reduce is indeed clearer. However, the proposal explains in its last section (which I also linked in the replies above) why the variables for map_reduce/reduce should be introduced before the generators.

stevensonmt · December 23, 2021, 11:46am

Oh, I’ve misunderstood the original proposal then. I thought count and sum were being reset. In that situation I would think let ... for ... would be the more obvious syntax but agree with the priority of not requiring new special syntax.

mgibowski · December 23, 2021, 11:54am

Overall a great proposal.

Initialization before the comprehension

I have doubts this should be supported.

I imagine those variables will be simple initial values (0, empty list, etc…), so probably their initialization code will not be that big.

The consequence of allowing this is that they are no longer declared and used only within the scope of for_let block.

This could be confusing to users, as somebody could try to use the let variable after the for_loop and could expect the value to be the same as the last one in the loop.

An example:

iex> sum = 0
iex> count = 0
iex> for let {sum, count}, i <- [1, 2, 3] do
...>   sum = sum + i
...>   count = count + 1
...>   {i * 2, {sum, count}}
...> end
{[2, 4, 6], {6, 3}}
iex> sum # What is the value?
0

So I think it may be better if those variables cannot be used (including initialized) outside of this block at all.

Ordering

I doubt there is anything to be done about it, but I will share my impression from reading this block of code the first time.

iex> for let {sum = 0, count = 0}, i <- [1, 2, 3] do
...>   sum = sum + i
...>   count = count + 1
...>   {i * 2, {sum, count}}
...> end

Here let bindings are on the first position in the first line, but they are returned as the second element in the tuple.
So there is this inconsistency and I think for some users it may not be intuitive to figure out which element in the tuple should be first and which second.

Naming

I am still not convinced about the name let. On one hand, it is short and communicates what it does. On the other hand, it means different things in other programming languages. I imagine programmers coming to Elixir and being surprised they can declare a variable with let only inside for comprehensions. It could also suggest to some people that the variable is mutable (let vs const in JavaScript).

Maybe directly reuse the Haskell accum?
init is also interesting, especially if initialization was not allowed outside of the for block.

dimitarvp · December 23, 2021, 12:06pm

I like the init idea.

dimitarvp · December 23, 2021, 12:07pm

You have to assign the result of for so you can rebind the original variables or introduce new ones. You have a choice.
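The same scoping already applies to the existing reduce: option, which may make the behaviour less surprising. The sketch below uses only current for syntax; the names total and unchanged are illustrative.

```elixir
sum = 0

# The outer `sum` only provides the initial accumulator value.
total =
  for i <- [1, 2, 3], reduce: sum do
    acc -> acc + i
  end

# The comprehension did not touch the outer variable.
unchanged = sum

# Rebinding is an explicit choice made by the caller.
sum = total

IO.inspect({unchanged, sum})
# prints {0, 6}
```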