Introducing `for let` and `for reduce`

That’s a great argument against implicit initialization from outside variables. Explicitly having to state both the accumulator’s variable name and the value (or variable) that initializes it should be better on that front.

6 Likes

Right, so just to make sure I’m on the same page, you’re saying that it should be required to write:

sum = 37
{items, sum} = for let sum = sum, ... do
  {i, sum}
end

Particularly the sum = sum bit to sort of emphasize the rebinding significance. I think I can get behind that actually.

I’m not really a fan of init. Initializing the value is only one of the things that happens; the other is that it is rebound to the returned value on each iteration. init not only fails to convey that, it has such an “intuitive” meaning that you’d expect it to only set the initial value.
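For illustration, here is (as I read the proposal) roughly what for let sum = 0 amounts to when expressed with Enum.map_reduce/3: the value is seeded once, then rebound to the second tuple element on every iteration, so init would only describe half of the behavior.

# Roughly the proposed behavior of `for let sum = 0, i <- [1, 2, 3]`:
# the accumulator is initialized once, then updated on each iteration.
Enum.map_reduce([1, 2, 3], 0, fn i, sum -> {i * 2, sum + i} end)
# => {[2, 4, 6], 6}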

2 Likes

Exactly. Just like you cannot write for a do to shorten for a <- a do.

While I can see this is confusing, it is no different from anything else in Elixir. You can still write this (and I suspect many new users do):

sum = 0

if true do
  sum = 1
end

sum # => 0

But everyone quickly learns that the value inside the block does not affect values outside the block.

One way to read this is that the first argument is the argument of for, the second is let. But, as per the discussion above, the only option that would make this clear is calling it map_reduce instead of let. :slight_smile:

3 Likes

So here’s a thought on the whole for and map_reduce thing. for is already a bit of a shape-shifter. Specifically, I guess I’m objecting to this interpretation of how it reads:

I don’t think this is actually right. If it were, for reduce x = 1 should be read “map reduce”, which is exactly what it doesn’t do; it just reduces. for does a bunch of stuff already; it’s a collection-iteration tool. It basically says “we’re going to iterate over some stuff”, and then if you add a filter like x > 1 it also filters, if you use reduce it reduces, and if you use into it does a reduce-like thing into a collectable.
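For reference, all of these are existing for behaviors today:

# generator plus filter
for x <- 1..5, x > 1, do: x * 2
# => [4, 6, 8, 10]

# collecting into a different collectable
for x <- 1..3, into: %{}, do: {x, x * x}
# => %{1 => 1, 2 => 4, 3 => 9}

# reduce mode (since Elixir 1.8): the body takes clauses over the accumulator
for x <- 1..5, reduce: 0 do
  acc -> acc + x
end
# => 15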

Where this leaves me is that I think map_reduce has a good case. Sure, a very literal reading of for map_reduce feels a bit redundant, but I think that hyper-literal reading is simply wrong in the general case. The big upside of map_reduce is that it:

  1. Doesn’t introduce a term that is used nowhere else (let, given, etc)
  2. Uses a term (and return value shape) that is very common and is also used by Enum.map_reduce and friends.

It’s not the prettiest or cleverest answer, but that’s OK. I hear your concerns about people not knowing what map_reduce is at the start, but they won’t know what let is either, and will have to figure it out from how the comprehension operates. When they’re done, they’ll have learned something they can take elsewhere in the language.

8 Likes

What about for with_init sum = 0, x <- [1, 2, 3] do ...? Or even better, to my ear, is:

for x <- [1, 2, 3],
    with_init sum = 0 do
  {x * 2, sum + x}
end

1 Like

I was trying to write a macro that translated this

iex> for let sum = 0, i <- [1, 2, 3] do
...>   sum = sum + i
...>   {i * 2, sum}
...> end
{[2, 4, 6], 6}

to this

iex> Enum.map_reduce([1, 2, 3], %{sum: 0}, fn i, %{sum: sum} = acc ->
...>   {i * 2, %{acc | sum: sum + i}}
...> end)
...> |> then(fn {map, %{sum: sum}} -> {map, sum} end)
{[2, 4, 6], 6}

which in this case can be simplified to

iex> Enum.map_reduce([1, 2, 3], 0, fn i, sum ->
...>   {i * 2, sum + i}
...> end)
{[2, 4, 6], 6}

and I realized that the Enum.map_reduce version is actually not that bad if you have ever written Elixir.

If Enum.map_reduce were imported somewhere, it boils down to

iex> map_reduce([1, 2, 3], 0, fn i, sum -> 
...>   {i * 2, sum + i}
...> end)
{[2, 4, 6], 6}

or with more than one variable

iex> map_reduce([1, 2, 3], {0, 0}, fn i, {sum, count} -> 
...>   {i * 2, {sum + i, count + 1}}
...> end)
{[2, 4, 6], {6, 3}}

I mean, there are times when Elixir can look verbose, but this solution to the nested data structure traversal problem is 13 lines long, and in my opinion it’s not a clever-but-unreadable solution; on the contrary, I think it’s pretty straightforward (and more idiomatic Elixir than the for-let one, but that’s just my opinion).

{sections, _} =
  Enum.map_reduce(sections, {1, 1}, fn %{"lessons" => lessons} = section, {sc, lc} ->
    lc = if section["reset_lesson_position"], do: 1, else: lc

    lessons =
      lessons
      |> Enum.with_index(lc)
      |> Enum.map(fn {lesson, p} -> Map.put(lesson, "position", p) end)

    section = Map.merge(section, %{"lessons" => lessons, "position" => sc})

    {section, {sc + 1, lc + Enum.count(lessons)}}
  end)

sections

I remain positive about the introduction of a for-let and a for-reduce pattern. Despite my comments, it doesn’t really matter which name or form is chosen; given the track record of the Elixir core team, I’m more than confident that the best solution will be the one adopted in the end.

On a final note: if the main concern was that for-let is more readable, reduces the clutter and can eventually drive adoption, my advice is to also take a look at the ergonomics of the most used modules (Enum and Map, for example) and think about how to improve them.

IMO, by simply removing the need to always prefix the module name for some of the fundamental functions, Elixir would look 30% less noisy.

Look, for example, at this version once Enum and Map have been imported:


{sections, _} =
  map_reduce(sections, {1, 1}, fn %{"lessons" => lessons} = section, {sc, lc} ->
    lc = if section["reset_lesson_position"], do: 1, else: lc

    lessons =
      lessons
      |> with_index(lc)
      |> map(fn {lesson, p} -> put(lesson, "position", p) end)

    section = merge(section, %{"lessons" => lessons, "position" => sc})

    {section, {sc + 1, lc + count(lessons)}}
  end)

3 Likes

Point 1: Business communication in English has long used the abbreviation “RE” which is short for “regarding” or “in reference to this other thing”. Email picked up on this pre-existing terminology and that’s why email replies will alter the subject to “RE: original subject”.

Point 2: Elixir has the convention of using a bang (!) to draw attention to a function that can raise an exception. I think of it as indicating “pay extra attention to what you’re doing here”.

Therefore, we could consider something like:

for re! sum = 0, i <- [1, 2, 3] do ...

So when we’re code golfing, we can yell “FORE!” as golfers do when they tee off.
:icon_biggrin: :icon_biggrin: :icon_biggrin:

I’ll see myself to the door now

7 Likes

I’ve been reimplementing the examples so I can better visualize the proposal. I think the intention behind letting the user express the “returning shape” of the comprehension, instead of having to accumulate the values manually, is interesting.

for i <- [1, 2, 3], reduce: {[], 0} do
  {list, sum} -> {[i * 2 | list], sum + i}
end

# over

for let sum = 0, i <- [1, 2, 3] do
  sum = sum + i
  {i * 2, sum}
end

And

element = ["a", "b", "c", "d", "e"]

for i <- element, reduce: {[], 0} do
  {list, count} when count < 3 -> {[i | list], count + 1}
  {list, count} -> {list, count}  
end

# over

for let count = 0, count < 5, x <- element do
  {x, count + 1}
end

I think everyone can agree with that; the examples in the proposal are objectively clear. However, I feel that at this point the discussion is going in another direction (I tried to expand on that in the previous post)… By changing the default behavior of for, it almost feels like we are talking about another thing completely. I’d very much rather have this as a different approach, perhaps a secondary keyword to deal with this specific scenario, like:

yield sum = 0, i <- [1, 2, 3] do
  sum = sum + i
  {i * 2, sum}
end

yield count = 0, count < 5, x <- element do
  {x, count + 1}
end

PS: Doing this little conceptual exercise quickly reminded me of JS generators and C#'s yield, which work similarly (-ish).

It would be considered the same as far as comprehensions go, while being clear that the expected output of the block is defined by the shape you pass as the first argument. This avoids introducing different “modes” to for, and it seems simpler to explain: although it still works as a standard comprehension, it has a different signature altogether.

I also think this leaves room for options, just like for already does with :uniq and :reduce:

yield count = 0, count < 5, x <- element, async: true do
  {x, count + 1}
end

I don’t know if introducing a new keyword is a dealbreaker to @josevalim, but it seems (to me at least) easier to reason about than introducing multiple modes into something that is already quite easy to understand by itself.

2 Likes

Thank you @thiagomajesk for the detailed follow-up. Introducing a new word will most likely cause some breakage; how much depends on the word. The issue is that, because we have two modes (map_reduce/reduce), at least two words would be required, and that may be a bit too much.

One way to see for let and for reduce is as a mechanism to avoid introducing new keywords. Although I think for_let and for_reduce should be pretty safe to introduce, conflict-wise.

I tried quite hard to find counterexamples to this proposal, but after writing them I realized that they either aren’t a big deal or are non-issues.

With regular for you can first read the generators and filters, and then read the body of the comprehension. What you get in the body happens after all of the generators and filters were applied:

for x <- xs,
    foo?(x),
    bar?(x) do
  # do something with x
end

With the introduction of let, I could argue that suddenly that’s no longer the case: the values we extract from the generators may change, or the filter turns into an exit condition rather than a filter. This may lead to some confusion, but then:

for let count = 0,
    x <- xs,
    foo?(x) and count <= 10 do
  # do something with x and count
end

And it’s not hard to see what’s happening. What is confusing, though, is whether the filter merely discards elements (so xs is still fully traversed) or lazily aborts the comprehension the first time it returns false.
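To make the two readings concrete, here is roughly how each interpretation would desugar with today’s Enum functions (the shapes here are mine, not from the proposal):

xs = [1, 2, 30, 4]
pred = fn x, count -> x < 10 and count <= 10 end

# Reading 1: a plain filter. xs is fully traversed and non-matching
# elements are merely skipped, so the final 4 is still collected.
Enum.reduce(xs, {[], 0}, fn x, {acc, count} ->
  if pred.(x, count), do: {[x | acc], count + 1}, else: {acc, count}
end)
# => {[4, 2, 1], 3}

# Reading 2: an exit condition. Iteration halts the first time the
# condition fails, so the final 4 is never reached.
Enum.reduce_while(xs, {[], 0}, fn x, {acc, count} ->
  if pred.(x, count), do: {:cont, {[x | acc], count + 1}}, else: {:halt, {acc, count}}
end)
# => {[2, 1], 2}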

Most of the examples shown here either bind a single variable or use pattern matching with tuples. I could argue that it gets nasty very quickly with very long tuples, and that it’s horrible having to figure out the position of a value in a tuple and where it binds to, but then you can do this:

{result, state} =
  for let state = %{a: 1, b: :foo, c: []},
      x <- xs do
    # ...
    state = %{state | c: [x + 1 | state.c]}
    {x, state}
  end

baz = state.c # ...

So again, not an issue.

So yeah, the only issue I see with the proposal is the lack of consensus over the “qualifier” name. The fact that let would be a function with special meaning to for is kind of alien to the way the rest of the language works, but when I try to imagine myself using it I don’t find it too much of an issue.

In any case, we can achieve the same today with map_reduce as you mentioned. Looking at the picture showing boilerplate vs what we actually care about:

[image: highlighting the boilerplate vs the code we actually care about]

And then again at the proposed code:

section_counter = 1
lesson_counter = 1
{sections, _acc} =
  for let {section_counter, lesson_counter}, section <- sections do
    lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter
    
    {lessons, lesson_counter} =
      for let lesson_counter, lesson <- section["lessons"] do
        {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
      end
    
    section =
      section
      |> Map.put("lessons", lessons)
      |> Map.put("position", section_counter)

    {section, {section_counter + 1, lesson_counter}}
  end

In terms of boilerplate reduction, I’m not sure there’s a substantial win. However, that aside, the benefits this proposal offers are, if I understand correctly:

  1. Allow exiting early from the comprehension by using the let bindings in filters (so, sort of a reduce_while)
  2. Allow the “accumulator” or “state” value and its binding to be defined in the same expression

I think 1 is the most important, as it enables more powerful patterns. I’m not entirely sure we are making a significant improvement in removing boilerplate, though. Honestly, I think it still looks ugly compared to Python. The original proposal on the mailing list did address this issue, but it was too magical.

So in summary: the new syntax doesn’t feel wrong to me, and I’m interested in the new patterns it enables, but I’m not convinced that it reduces boilerplate, thus I’m not sure it would improve the situation for newcomers save for the “exit early” case.

4 Likes

Hey @josevalim, no problem at all. I have to say that I’m kind of surprised: what kind of breakage are you talking about? I think other people might also be curious now that you mentioned it. My first instinct would be to think that introducing a new keyword as a new feature would be easier than actually modifying the behavior of an existing one (but I don’t really know the Elixir codebase well enough to be sure).

Update: I was thinking about Enum.chunk_while/4 and Enum.group_by/3, where you have better control over how to emit chunks and change the resulting values, and I was wondering if this could be applicable here. So, instead of creating two different “modes”, what about treating the emitted values in a different way?
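For reference, Enum.chunk_while/4 already works this way: the step function decides both what to emit and how to update the accumulator.

# Chunk a list into pairs; the chunk function controls both the
# emitted chunk and the new accumulator.
Enum.chunk_while(
  [1, 2, 3, 4, 5],
  [],
  fn i, acc ->
    if length(acc) == 1, do: {:cont, Enum.reverse([i | acc]), []}, else: {:cont, [i | acc]}
  end,
  fn
    [] -> {:cont, []}
    acc -> {:cont, Enum.reverse(acc), []}
  end
)
# => [[1, 2], [3, 4], [5]]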

If I had to implement this in a simple function, I’d rather have a transformation being applied to the values after each emission:

# only map values (accumulating)
fun1 = fn v1, v2, {list, sum} -> {[v1 | list], v2 + sum} end

yield {list, sum}, i <- [1, 2, 3], init: {[], 0}, fun: fun1 do
  {i * 2, sum + 1}
end

# keep values as-is (could be the default implementation)
fun2 = fn _v1, _v2, {sum, count} -> {sum, count} end

yield {sum, count}, i <- [1, 2, 3], init: {0, 0}, fun: fun2 do
  {sum + i, count + 1}
end

One idea that came to me yesterday was

for binding sum = 0, x <- [1, 2, 3] do
  {x * 2, sum + x}
end

It works on 3 levels:

  • you are binding the variables’ initial values, with full pattern matching support
  • you are “binding” these values between iterations of the comprehension
  • we can refer to variables declared as such as “for-bindings”, which is better than the very poor “imperative variables” I have been calling them in my head throughout these proposals

1 Like

I was thinking bind instead of binding because there is already binding/0/1, so:

for bind sum = 0, x <- [1, 2, 3] do
  {x * 2, sum + x}
end

:slight_smile:
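For reference, the existing binding/0 returns the current variable bindings as a keyword list:

iex> x = 1
1
iex> binding()
[x: 1]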

2 Likes

Constructs like for are called special forms, because they have to be implemented by the compiler for optimizations and so on. We only have a few of them. The issue is that you can’t define/import a function with the same name as a special form, and the variations would most likely have to be special forms.

I am not convinced binding or bind is any different from let? And only the first proposal was imperative, but that has been thrown out the window by now. :slight_smile:

4 Likes

I think the biggest mental blocker for me is that the accumulator is declared in the first term and the iterator in the second, yet the output is a tuple with the iteratee as the first element and the accumulator as the second. This feels a bit un-elixirey, because in reduce the enumerable (and, in the reducer, the element) comes first and the accumulator second.
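Concretely, compare the argument order of Enum.reduce/3 with the proposal (the for let snippet is the proposed syntax, not existing Elixir):

# Enum.reduce/3: enumerable first, accumulator second, and the
# accumulator alone is the result.
Enum.reduce([1, 2, 3], 0, fn i, sum -> sum + i end)
# => 6

# The proposal leads with the accumulator instead, and the body
# returns {mapped_element, accumulator}:
#
#   for let sum = 0, i <- [1, 2, 3] do
#     {i * 2, sum + i}
#   end
#   # => {[2, 4, 6], 6}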

Of slightly less concern, but still a bit of a stumbling point for me, is that there isn’t any semantic cue that a tuple is being constructed as part of the result. Maybe something like this might help:

for {n <- enum, let s = 0} do ...

Don’t know if I love that either.

5 Likes

It reduces a small amount of boilerplate in the original example, but I agree it is not significant. My hope, though, is that it reduces the conceptual overhead of reaching for that solution in the first place.

However, I think there are examples where it reduces considerably. Today, as I worked, I looked for places to apply it. Here are two.

This:

    Enum.flat_map_reduce(subs, sources, fn sub, sources ->
      sub_formatter = Path.join(sub, ".formatter.exs")

      if File.exists?(sub_formatter) do
        formatter_opts = eval_file_with_keyword_list(sub_formatter)

        {formatter_opts_and_subs, sources} =
          eval_deps_and_subdirectories(:in_memory, [sub], formatter_opts, sources)

        {[{sub, formatter_opts_and_subs}], sources}
      else
        {[], sources}
      end
    end)

Can be rewritten to this:

    for let(sources),
        sub <- subs,
        sub_formatter = Path.join(sub, ".formatter.exs"),
        File.exists?(sub_formatter) do
      formatter_opts = eval_file_with_keyword_list(sub_formatter)

      {formatter_opts_and_subs, sources} =
        eval_deps_and_subdirectories(:in_memory, [sub], formatter_opts, sources)

      {{sub, formatter_opts_and_subs}, sources}
    end

And this:

  def find_asset_info(notebook, hash) do
    Enum.find_value(notebook.sections, fn section ->
      Enum.find_value(section.cells, fn cell ->
        is_struct(cell, Cell.Elixir) &&
          Enum.find_value(cell.outputs, fn
            {:js_static, %{assets: %{hash: ^hash} = assets_info}, _data} -> assets_info
            {:js_dynamic, %{assets: %{hash: ^hash} = assets_info}, _pid} -> assets_info
            _ -> nil
          end)
      end)
    end)
  end

to this:

  for reduce(value = nil),
      value == nil,
      section <- notebook.sections,
      %Cell.Elixir{} = cell <- section.cells,
      output <- cell.outputs do
    case output do
      {:js_static, %{assets: %{hash: ^hash} = assets_info}, _data} -> assets_info
      {:js_dynamic, %{assets: %{hash: ^hash} = assets_info}, _pid} -> assets_info
      _ -> nil
    end
  end

As soon as you get any kind of nesting, the comprehension format really starts to stand out.

9 Likes

I’m fine with let personally – and I think I understand why it was chosen – but in reading through all of the posts thus far, I saw there was some concern regarding let and some other suggestions etc. In light of that, bind just seemed more Elixirish to me. :slight_smile:

1 Like

I like the abilities that this proposal brings, but let/reduce are a bit of a non-obvious pair to my eyes. map_reduce/reduce seem more obvious, but I believe we’re trying to avoid the introduction of those words. It would be nice to either use a single word for both reduce and map reduce or no word at all. If we were to use a single word, accum makes pretty good sense to me.

So how do we make it such that no (or one) word is necessary? I don’t think this has been mentioned yet and perhaps for good reason :slight_smile: … One option might be to let the shape of the return dictate whether it is effectively a reduce or a map reduce. In the case where an accumulator is included, a list must be returned. When the list includes two items it is a map reduce and with one item it is a reduce.

Effectively a map reduce

iex> for sum = 0, i <- [1, 2, 3] do
...>   sum = sum + i
...>   [i * 2, sum]
...> end
[[2, 4, 6], 6]

Effectively a reduce

iex> for sum = 0, i <- [1, 2, 3] do
...>   sum = sum + i
...>   [sum]
...> end
[6]

1 Like

What about set? And what about moving it into the KW list at the end?

for i <- [1, 2, 3], set: sum = 0 do
  {i * 2, sum + i}
end

That way if there is already a sum, it would be

sum = 0
for i <- [1, 2, 3], set: sum = sum do
  {i * 2, sum + i}
end

Also, as a separate issue (while consistent, since plenty of other things in the language work this way), I feel that without any type system, implicitly returning the iterated value and the accumulator joined in a tuple is not intuitive. It would be nice to have syntax specifically for this form, so that the tuple-return convention doesn’t become a common bug new users trip on. A macro like using(i * 2, sum + i) for the for return value would be best for readability and writability.
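To make that last idea concrete, a hypothetical using/2 could be as thin as the sketch below (the name comes from the suggestion above; it is not part of the proposal):

defmodule ForHelpers do
  # Hypothetical helper: expands to a plain two-tuple, but names the
  # "mapped value + accumulator" shape explicitly at the return site.
  defmacro using(mapped, acc) do
    quote do
      {unquote(mapped), unquote(acc)}
    end
  end
end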

2 Likes