Why are Enum.reduce fun arguments x, acc instead of acc, x?

Good day everyone!

I use the Enum.reduce/3 (Enum — Elixir v1.9.0-rc.0) function quite often, and I regularly wonder why the arguments of the reducing function are x, acc instead of acc, x.

Almost all functions in Elixir are of the form Module.function(subject_of_change, change_argument), for example MapSet.put/2 or List.delete/2. Thanks to this, we can write code like:

changeset
|> cast(params, [ ... ])
|> validate_required([ ... ])
|> unique_constraint( ... )
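For instance, the two stdlib functions mentioned above pipe the same way; a minimal sketch:

```elixir
# "Subject first" lets the pipe operator thread the data structure
# through each call (MapSet.put/2 takes the set first):
set =
  MapSet.new()
  |> MapSet.put(1)
  |> MapSet.put(2)

# The list being changed also comes first, so it pipes the same way:
list = [1, 2, 3] |> List.delete(2)

IO.inspect(MapSet.member?(set, 2)) # true
IO.inspect(list)                   # [1, 3]
```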

But then, when I want to do something like this:

Enum.reduce([1,2,3,4], MapSet.new, &MapSet.put(&2, &1))

I need to use the &2, &1 form which is a bit unhandy. I think this form would be a lot nicer:

Enum.reduce([1,2,3,4], MapSet.new, &MapSet.put/2)

Is there some explanation for this? I am really curious. :slight_smile:

Thanks a lot.

3 Likes

I always thought that it follows the order of the arguments to Enum.reduce itself - first the collection, then the accumulator, so in the reducer you get the element first and the accumulator second.

10 Likes

Probably to give you a chance to utilise tail-call recursion in the passed closure function, which only works if the accumulator is the last parameter (unless I am horribly misremembering this).

1 Like

Interesting. This would make sense. Thanks for the info.

That is incorrect.

A simple implementation of reduce() looks somewhat like this:

def reduce([], acc, _fun) do
  acc 
end

def reduce([h|t], acc, fun) do
  reduce(t, fun.(h, acc), fun)
end

Tail recursion happens in the invocations of reduce() itself; the details of the function fun don’t matter.
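To make that concrete, here is a minimal sketch (the module name MyReduce is mine): whichever order fun takes its arguments, the tail call is the recursive reduce itself, so tail recursion is unaffected:

```elixir
defmodule MyReduce do
  # Tail-recursive: the last call in the recursive clause is reduce/3 itself.
  def reduce([], acc, _fun), do: acc
  def reduce([h | t], acc, fun), do: reduce(t, fun.(h, acc), fun)

  # Still tail-recursive with the reducer arguments flipped to (acc, elem):
  def reduce_flipped([], acc, _fun), do: acc
  def reduce_flipped([h | t], acc, fun), do: reduce_flipped(t, fun.(acc, h), fun)
end

IO.inspect(MyReduce.reduce([1, 2, 3], 0, fn x, acc -> x + acc end))         # 6
IO.inspect(MyReduce.reduce_flipped([1, 2, 3], 0, fn acc, x -> acc + x end)) # 6
```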

5 Likes

I think you’re remembering that foldl is easier to TCO (tail-call optimize) than foldr.

defmodule Fold do
  # reduce _is_ foldl
  # foldl [a] -> b -> (a -> b -> b) -> b
  def foldl([], acc, _),
    do: acc

  def foldl([x | xs], acc, f),
    do: foldl(xs, f.(x, acc), f)

  # foldr [a] -> b -> (a -> b -> b) -> b
  def foldr([], acc, _),
    do: acc

  def foldr([x | xs], acc, f),
    do: f.(x, foldr(xs, acc, f))
end

list = [1, 2, 3]
f = &[&1 | &2]

IO.inspect(Fold.foldl(list, [], f)) # [3,2,1]
IO.inspect(Fold.foldr(list, [], f)) # [1,2,3]

Folding (reducing) emerged from “folding lists” - the most common list operation is cons-ing:

f = &[&1 | &2]

Note how the parameter order isn’t “unhandy”. One could argue that the unhandy-ness is a result of the preferred parameter order in Elixir due to pipelining. In a curried language the opposite order (the thing that changes should be last) would be preferred.

Note also that Elixir inherited the order from Erlang (1986):

http://erlang.org/doc/man/lists.html#foldl-3

and Erlang has no pipelining.


FYI:

Haskell (1990):

Prelude.foldl: (a -> b -> a) -> a -> [b] -> a
Prelude.foldr: (a -> b -> b) -> b -> [a] -> b

OCaml (1996):

fold_left : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a
fold_right : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b

Interestingly, for “fold left” it’s (acc, elem -> acc), while for “fold right” it’s (elem, acc -> acc).

Hypothesis:

  • (acc, elem -> acc) for “fold left” accumulator is built from the list left-to-right (left associative)
  • (elem, acc -> acc) for “fold right” accumulator is built from the list right-to-left (right associative)

LISP (1958)

(reduce (lambda (x y) (+ (* x 10) y)) '(1 2 3 4)) => 1234
iex(1)> Enum.reduce([1,2,3,4], & &2 * 10 + &1)
1234

Again LISP used (acc, elem -> acc) for reduce/fold left.

So why would Erlang use (elem, acc -> acc)?

Hypothesis:

  • Conceptually general “folding” refers to foldr because it preserves the order within a list - hence (elem, acc -> acc).

defmodule X do
  def map_1(list, f),
    do: :lists.foldr(&[f.(&1) | &2], [], list)

  def map_2(list, f) do
    (&[f.(&1) | &2])
    |> :lists.foldl([], list)
    |> :lists.reverse()
  end

  def scan_1(list, init, f) do
    fun = fn x, g ->
      fn y ->
        value = f.(x, y)
        [value | g.(value)]
      end
    end

    acc = fn _ -> [] end
    lazy = :lists.foldr(fun, acc, list)
    lazy.(init)
  end

  def scan_2(list, init, f) do
    fn
      x, [y | _] = acc ->
        [f.(x, y) | acc]

      x, [] ->
        [f.(x, init)]
    end
    |> :lists.foldl([], list)
    |> :lists.reverse()
  end
end

list = [1, 2, 3]
f = &(&1 * 2)
g = &Kernel.+/2

IO.inspect(X.map_1(list, f))     # [2, 4, 6]
IO.inspect(X.map_2(list, f))     # [2, 4, 6]
IO.inspect(X.scan_1(list, 2, g)) # [3, 5, 8]
IO.inspect(X.scan_2(list, 2, g)) # [3, 5, 8]

From a performance standpoint foldl is preferred for long lists but requires reversal if order is relevant. But for pragmatic reasons sticking to the same function type (elem, acc -> acc) makes a function usable for both foldr and foldl.
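A quick sketch of that trade-off using the Erlang functions directly (both produce the same doubled list):

```elixir
list = [1, 2, 3]

# foldl is tail-recursive but builds the result reversed, so reverse at the end:
via_foldl = :lists.reverse(:lists.foldl(&[&1 * 2 | &2], [], list))

# foldr preserves the order directly, at the cost of stack depth
# proportional to the length of the list:
via_foldr = :lists.foldr(&[&1 * 2 | &2], [], list)

IO.inspect(via_foldl) # [2, 4, 6]
IO.inspect(via_foldr) # [2, 4, 6]
```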

6 Likes

Yeah, that was it and thanks for correction.

Don’t know the answer, but FWIW I also think that passing the acc first, collection element second would be more intuitive, and I frequently find myself flipping these two args in reduce.

8 Likes

Yup. Generally having the arguments in the current order is less ergonomic. But I suppose I’ll have to wait for Elixir 2.0 before it’s made consistent.

My guess is that this change will never happen, but who knows :slight_smile:

1 Like

Many other languages agree with you (though initially I found the swap between foldl (i.e. acc, elem -> acc) and foldr (i.e. elem, acc -> acc) a bit confusing).

for comprehensions side step the issue:

iex(1)> for x <- [1,2,3,4], reduce: %MapSet{} do
...(1)>   acc -> MapSet.put(acc,x)
...(1)> end
#MapSet<[1, 2, 3, 4]>

For reference, the relevant clauses from Erlang’s lists module:

foldl(F, Accu, [Hd|Tail]) ->
    foldl(F, F(Hd, Accu), Tail);

% ...

foldr(F, Accu, [Hd|Tail]) ->
    F(Hd, foldr(F, Accu, Tail));

1 Like

The thing that gets me is that the first argument is the one that is overwritten in memory as the return value. Since the return value would be placed in memory location one, which is the same as the first argument, I would assume you would get a small performance difference in favor of swapping the order from the current implementation.

We would only get the benefit of the argument pattern if we changed Erlang, since Enum.reduce/3 delegates straight to :lists.foldl/3 if the first argument is a list.

I haven’t tested this so I’m speculating. It is what I do.

1 Like

This is a really good point.

I frequently switch these as well.

1 Like

In my opinion that’s because Elixir has the convention of having “target as the first argument”. This is mostly because of how pipes work. The direct target of the function is almost always the first argument. I would say that the element is the target of the reduction function and not the accumulator.
Other languages like Haskell have it the other way around: they use the last parameter as the target, for the exact same reasons. Their pipes pipe into the last argument, because piping there is implemented with currying rather than macros.

1 Like

I agree with the first part, that usually the first argument is the one being modified and most of the time also returned from the function.
But in the case of a function passed to Enum.reduce, I would say that the accumulator is the one being modified, based on the element currently iterated. You even return the modified acc from the reducing function. In my opinion, acc, x should be the preferred form.
I understand that this can be viewed from different angles, and it’s not easy to decide which form is better, or to change it in the next major Elixir release. I was just curious about the reason it is the way it is.
Anyway, thanks a lot for all the responses. For me, it’s very interesting and informative to read them.

2 Likes

Sorry for digging up such an old thread. Recently I found a new reason: for non-commutative operations like matrix multiplication, the code would be shorter if we put the acc first. For example, given a list of matrices, to calculate their product we have to write something like

prod = Enum.reduce(matrices, fn matrix, acc -> Matrix.dot(acc, matrix) end)

or

prod = Enum.reduce(matrices, &Matrix.dot(&2, &1))

But if acc is the first argument, then we can simply write

prod = Enum.reduce(matrices, &Matrix.dot/2)
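For a runnable illustration of the same point without a hypothetical Matrix module, any non-commutative stdlib operation works; for example list difference with Kernel.--/2 (my choice of example):

```elixir
lists = [[1, 2, 3, 4], [2], [3]]

# With the current (elem, acc) order we must flip the captures:
result = Enum.reduce(lists, &Kernel.--(&2, &1))
# computes ([1, 2, 3, 4] -- [2]) -- [3]

IO.inspect(result) # [1, 4]

# If the reducer received acc first, the bare capture would suffice:
# Enum.reduce(lists, &Kernel.--/2)
```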

I was also switching said arguments, but honestly it was never much of a pain, as it’s not a big enough percentage of all my reduce calls. Also, I think the element is passed before the accumulator for consistency: not all Enum functions use an accumulator, and the element is the one callback argument present everywhere. For newbies it would be confusing if the element were the first argument one time and the second argument another time.
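A sketch of that consistency argument: across Enum callbacks, the element is always the first (and usually only) argument:

```elixir
doubled = Enum.map([1, 2, 3], fn x -> x * 2 end)               # element only
kept    = Enum.filter([1, 2, 3], fn x -> x > 1 end)            # element only
total   = Enum.reduce([1, 2, 3], 0, fn x, acc -> x + acc end)  # element first

IO.inspect(doubled) # [2, 4, 6]
IO.inspect(kept)    # [2, 3]
IO.inspect(total)   # 6
```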

3 Likes