On iteration and enumeration

In the topic about Monads, at some point the discussion moved more towards Enumeration:

So I’ll move over the relevant posts, and then we can continue the discussion here :slight_smile: .

As for:

Enum does not (contrary to what intuition might suggest) implement the Functor pattern (fun_land source code). Thus Enum.map is not a 'true' map, because it (and all functions that are built on it) always returns a list rather than 'a container with transformed contents but with the same containing structure'.

Instead of map, Enum is built on the concept of reduce (Reducable, fun_land source code) or, in Haskell parlance, Foldable. This is an interface that is implementable for a larger group of datatypes than Mappable, because things like sets and numerical ranges can be reduced but not mapped (mapping could produce two elements with the same value, which goes against the invariant of a set).

So why do you care about a Functor-style 'true' map? Because Enum.map and similar functions throw away the structure of the container. If you have a (rose) tree, good luck iterating it using Enum.map and then attempting to regain the same tree structure afterwards.
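To make the difference concrete, here is a minimal sketch (plain Enum calls, no extra libraries assumed): mapping over a map or a range always yields a list, never the original structure.

```elixir
# Enum.map always returns a list, regardless of the input's shape:
Enum.map(%{a: 1, b: 2}, fn {k, v} -> {k, v * 2} end)
# a keyword list such as [a: 2, b: 4], NOT a map

Enum.map(1..3, &(&1 + 1))
# => [2, 3, 4] -- a list, not a range

# A Functor-style 'true' map would instead return %{a: 2, b: 4},
# preserving the container's structure.
```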

I have the feeling (please correct me if I’m wrong, @josevalim) that the current approach of Enum.map being built on top of reduce was to be able to use iteration functions with these other data types, and that this choice might’ve been influenced by Ruby, which takes the same approach.


Those results show you are very likely benchmarking without consolidated protocols. Given how maps work, I would expect your implementation to be slower because the only possible way to efficiently traverse a large map and return a map is by doing conversion to a list, traversing and then going from a list to a map. So by definition your solution has to be slower.

This is not related to Ruby at all. I just said this in another thread but the things that are related mostly belong to the syntax.

You explained why we preferred foldable quite well:

Haskell parlance, Foldable is an interface that is implementable for a larger group of datatypes than Mappable, because things like sets and numerical ranges can be reduced but not mapped

We use foldable because it allows us to support the widest variety of data types, while being efficient and also supporting resources and laziness. Remember that Enum supports resources, so a functor wouldn’t work for most streams either (io, file, stage, etc). You also wouldn’t be able to call a map and return anything that is not a tuple, etc.
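To illustrate the resource point, a hedged sketch (a toy Stream.resource counter, not any real IO stream): Enum can traverse a resource that needs cleanup afterwards, but there is no meaningful "same container" to hand back, so a list is the only sensible result of a map.

```elixir
# A toy resource: emits 0, 1, 2, then halts and runs its cleanup.
stream =
  Stream.resource(
    fn -> 0 end,                # acquire: initial state
    fn
      3 -> {:halt, 3}           # done: trigger cleanup
      n -> {[n], n + 1}         # emit one element, advance state
    end,
    fn _state -> :ok end        # release: cleanup hook (e.g. closing a file)
  )

Enum.map(stream, &(&1 * 2))
# => [0, 2, 4] -- the resource is consumed; only a list can come back
```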

EDIT: Introducing reducees « Plataformatec Blog - this article explains how the protocol works with as wide a variety of data types as possible. And that was the goal: to have a single protocol instead of multiple ones. There is still space for a functor/traverse, but given that even map operations require you to first convert to a list and then back, it does not add much beyond what we have today.
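For reference, a hedged sketch of the reduce-based protocol the article describes: Enumerable.reduce drives every collection with {:cont, acc} / {:halt, acc} / {:suspend, acc} commands, which is what lets one protocol fit lists, ranges, maps, and resources alike. Taking the first two elements of a range by halting early:

```elixir
Enumerable.reduce(1..10, {:cont, []}, fn x, acc ->
  if length(acc) < 2,
    do: {:cont, [x | acc]},   # keep going
    else: {:halt, acc}        # stop early; the tail is discarded
end)
# => {:halted, [2, 1]}
```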


Just want to make a quick remark on this:

:maps.map/2 is the Functor mapping function for maps. The reasoning is as follows: since maps cannot contain duplicate keys, the structure would be changed if a mapping operation also allowed altering the keys. Thus, the return value of the function passed to map/2 is used as the value for the current key at all times.
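A quick sketch of that behaviour (plain :maps.map/2):

```elixir
# The keys stay fixed; only the values are transformed, so the
# result is a map with exactly the same keys:
:maps.map(fn _key, value -> value * 10 end, %{a: 1, b: 2})
# => %{a: 10, b: 20}
```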


Related to this is the Extractable library I wrote a couple of months back, which allows you to enumerate collections one item at a time, because that is one of the things that is impossible using the current Enumerable. (You cannot stop in the middle, because while in :suspended mode there may be resource cleanup that has not yet happened, and the rest of the collection may be in a non-standard format. Using :halt discards the tail of the collection, so you cannot use that either.)

Why want one-at-a-time? Because not all operations lend themselves to the 'the collection controls the iteration' paradigm.

Indeed, currently Erlang does not allow you to simply extract a single {K, V}-pair from a map (hopefully it will be added at some time in the future, there were discussions about it a while back IIRC), and therefore the map needs to be converted into a list and back again each time.
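A hedged sketch of the one-item-at-a-time idea (a hypothetical ExtractSketch module for illustration, not the actual Extractable API): pull one element plus "the rest", with maps going through a list exactly as described above.

```elixir
defmodule ExtractSketch do
  # Hypothetical helper: returns {:ok, element, rest} or :empty.
  def extract([]), do: :empty
  def extract([head | tail]), do: {:ok, head, tail}

  # No single-pair extraction exists for maps in Erlang, so we
  # convert to a list and back -- the cost mentioned above.
  def extract(map) when map_size(map) == 0, do: :empty

  def extract(map) when is_map(map) do
    [{k, v} | rest] = :maps.to_list(map)
    {:ok, {k, v}, :maps.from_list(rest)}
  end
end

ExtractSketch.extract([1, 2, 3])
# => {:ok, 1, [2, 3]}
```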


Thank you for sharing that blog post here! Very clear explanation of why Enum(erable) is the way it is today :heart_eyes:.

And :maps.map/2 converts to a list and then converts it back to a map, which goes back to my earlier point of it all being done on lists anyway: otp/lib/stdlib/src/maps.erl at master · erlang/otp · GitHub :slight_smile:

BIFs do not accept anonymous functions as arguments, and I don’t believe there is a plan to lift this restriction. So it is probably staying like this for now.

Does it work with resources? Because it was one of the first approaches that we tried but very error prone when it comes to dangling resources.

It does not; that is, resource management is something that needs to be done manually. I’ll edit my earlier post to clarify this further.

I’d love to talk more about this – I also started working on a proof-of-concept iterator library called iter.

However, I think we are veering too far off topic now, since we’re not talking about Monads anymore. I’ll open a new topic :slight_smile: .

EDIT: Done! We’re in the new topic now.

I’m doing the same, as that is the only Erlang API for traversing a map to fold over it; I’m just handling it all internally, hidden from the user.

But yes, they are consolidated. :slight_smile:

Here is the bench file (I could easily be doing something wrong, please tell!):

defmodule Helpers do
  def do_flat_map({k, v}), do: %{k => v, v => k}
  def do_flat_map(v), do: [v, v]

  # `wrong` only in that the test is not returning the right format of data
  def do_flat_map_wrong({k, v}), do: [{k, v}, {v, k}]
  def do_flat_map_wrong(v), do: [v, v]
end

inputs = %{
  "List"     => [:a, :b, :c, :d],
  "Map"      => %{a: 1, b: 2, c: 3, d: 4},
  "ValueMap" => ExCore.Monad.wrap(42, %{}), # Just testing, ignore this overall
}


actions = %{
  "Elixir.Enum.flat_map" => fn input -> Enum.flat_map(input, &Helpers.do_flat_map/1) end,
  "Elixir.Enum.flat_map_wrong" => fn input -> Enum.flat_map(input, &Helpers.do_flat_map_wrong/1) end,
  "ExCore.Monad.flat_map" => fn input -> ExCore.Monad.flat_map(input, &Helpers.do_flat_map/1) end,
}

IO.puts("Is `Enumerable` consolidated:  #{Protocol.consolidated?(Enumerable)}")
IO.puts("Is `Collectable` consolidated:  #{Protocol.consolidated?(Collectable)}")

Benchee.run actions, inputs: inputs, time: 5, warmup: 5, print: %{fast_warning: false}

And the results:

╰─➤  mix bench monad
Compiling 1 file (.ex)
Is `Enumerable` consolidated:  true
Is `Collectable` consolidated:  true
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Number of Available Cores: 2
Available memory: 8.011072 GB
Elixir 1.6.0-dev
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 5.00 s
time: 5.00 s
parallel: 1
inputs: List, Map, ValueMap
Estimated total run time: 1.50 min



Benchmarking with input List:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...

Benchmarking with input Map:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...

Benchmarking with input ValueMap:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...

##### With input List #####
Name                                 ips        average  deviation         median
ExCore.Monad.flat_map             3.84 M        0.26 μs  ±1213.04%        0.20 μs
Elixir.Enum.flat_map              3.82 M        0.26 μs  ±1110.99%        0.20 μs
Elixir.Enum.flat_map_wrong        3.71 M        0.27 μs  ±1446.02%        0.20 μs

Comparison: 
ExCore.Monad.flat_map             3.84 M
Elixir.Enum.flat_map              3.82 M - 1.00x slower
Elixir.Enum.flat_map_wrong        3.71 M - 1.03x slower

##### With input Map #####
Name                                 ips        average  deviation         median
ExCore.Monad.flat_map             3.42 M        0.29 μs ±20998.62%         0.0 μs
Elixir.Enum.flat_map_wrong        1.33 M        0.75 μs   ±330.34%        0.70 μs
Elixir.Enum.flat_map              0.67 M        1.50 μs  ±1704.78%        1.00 μs

Comparison: 
ExCore.Monad.flat_map             3.42 M
Elixir.Enum.flat_map_wrong        1.33 M - 2.58x slower
Elixir.Enum.flat_map              0.67 M - 5.12x slower

##### With input ValueMap #####
Name                                 ips        average  deviation         median
Elixir.Enum.flat_map_wrong        4.96 M        0.20 μs ±37744.65%         0.0 μs
ExCore.Monad.flat_map             3.68 M        0.27 μs  ±1833.90%       0.120 μs
Elixir.Enum.flat_map              3.18 M        0.31 μs ±13198.19%         0.0 μs

Comparison: 
Elixir.Enum.flat_map_wrong        4.96 M
ExCore.Monad.flat_map             3.68 M - 1.35x slower
Elixir.Enum.flat_map              3.18 M - 1.56x slower

The reason mine is faster is, I’m pretty sure, the module-call overhead and guards in Elixir’s current implementation, and exclusively that, not any algorithmic cost difference.

Here is the definition of the Monad type:

import ProtocolEx
defprotocol_ex ExCore.Monad, as: monad do
  require ExCore.StreamDataGenerator

  def wrap(value, monad)
  def flat_map(monad, fun) when is_function(fun, 1)

  deftest neutrality do
    generator = ExCore.StreamDataGenerator.generator(Any, [])
    ident_wrap = &wrap(&1, nil)
    StreamData.check_all(generator, [initial_seed: :os.timestamp()], fn v ->
      wrap(v, nil) === flat_map(wrap(v, nil), ident_wrap) && {:ok, v} || {:error, v}
    end)
  end

  deftest succession_ordering do
    generator = ExCore.StreamDataGenerator.generator(ExCore.Monad, __MODULE__, [])
    f = fn x -> wrap({x, 1}, nil) end
    g = fn {x, 1} -> wrap(x, nil) end
    StreamData.check_all(generator, [initial_seed: :os.timestamp()], fn v ->
      if flat_map(flat_map(wrap(v, nil), f), g) === flat_map(wrap(v, nil), fn x -> flat_map(f.(x), g) end) do
        {:ok, v}
      else
        {:error, v}
      end
    end)
  end
end

Here is my Monad’s implementation for lists and maps (and also included is the :ok/:error tuples as an example of those):

import ProtocolEx

## List

defimpl_ex EmptyList, [], for: ExCore.Monad do
  @priority 65535

  defmacro wrap(value, monad_type) do
    quote generated: true do
      _ = unquote(monad_type)
      [unquote(value)]
    end
  end

  defmacro flat_map(empty_list, fun) do
    quote generated: true do
      case unquote(empty_list) do
        [] -> []
        list -> ExCore.Monad.List.flat_map(list, unquote(fun)) # This case is purely for testing purposes
      end
    end
  end
end


defimpl_ex List, [_ | _], for: ExCore.Monad do
  @priority 65534

  defmacro wrap(value, monad_type) do
    quote generated: true do
      _ = unquote(monad_type)
      [unquote(value)]
    end
  end

  def flat_map([], _fun), do: []
  def flat_map([head | tail], fun), do: fun.(head) ++ flat_map(tail, fun)
end


defimpl_ex Map, %{}, for: ExCore.Monad do
  @priority -65535

  defmacro wrap(value, monad_type) do
    quote generated: true do
      case unquote(monad_type) do
        %{__struct__: _} -> # Not structs
          raise %ProtocolEx.UnimplementedProtocolEx{
            proto: ExCore.Monad,
            name: :wrap,
            arity: 2,
            value: [unquote(value), unquote(monad_type)],
          }
        _ ->
        case unquote(value) do
          {key, value} -> %{key => value}
          value -> %{__value__: value}
        end
      end
    end
  end

  def flat_map(%{__struct__: _} = map, fun) do
    raise %ProtocolEx.UnimplementedProtocolEx{proto: ExCore.Monad, name: :flat_map, arity: 2, value: [map, fun]}
  end
  def flat_map(%{__value__: value} = map, fun) when map_size(map) === 1 do
    fun.(value)
  end
  def flat_map(map, fun) do
    map
    |> :maps.to_list()
    |> flat_map(fun, %{})
  end

  defp flat_map([], _fun, returned), do: returned
  defp flat_map([element | rest], fun, returned) do
    flat_map(rest, fun, :maps.merge(returned, fun.(element)))
  end
end


defimpl_ex OkValue, {:ok, _}, for: ExCore.Monad do
  @priority 65532

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:ok, unquote(value)}
    end
  end

  defmacro flat_map(ok_value, fun) do
    quote do
      unquote(fun).(elem(unquote(ok_value), 1))
    end
  end
end

defimpl_ex Ok, :ok, for: ExCore.Monad do
  @priority 65533

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:ok, unquote(value)}
    end
  end

  defmacro flat_map(ok_value, fun) do
    quote generated: true do
      fun = unquote(fun)
      case unquote(ok_value) do
        {:ok, value} -> fun.(value)
        :ok -> fun.(nil)
      end
    end
  end
end

defimpl_ex ErrorValue, {:error, _}, for: ExCore.Monad do
  @priority 65532

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:error, unquote(value)}
    end
  end

  defmacro flat_map(error_value, fun) do
    quote do
      _ = unquote(fun)
      unquote(error_value)
    end
  end
end

defimpl_ex Error, :error, for: ExCore.Monad do
  @priority 65533

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:error, unquote(value)}
    end
  end

  defmacro flat_map(error_value, fun) do
    quote do
      _ = unquote(fun)
      unquote(error_value)
    end
  end
end

And I can show the generated output code too, if you're curious.

True true!

I actually quite like how C++ does it. Not really proper categorical setup but highly useful while being descriptive about ‘how’ it can be used too. :slight_smile:

Yep, that is definitely what you want for a Foldable! :slight_smile:

Yeah that is hard to do perfectly efficiently the way maps are encoded currently.

Indeed! Folding is definitely not flat_mapping, hence they are different categories. ^.^
That is why fold is one of the root bases in the Categorical Type tree (Monad technically inherits from it, and implementations can optionally delegate to fold and flatten to implement flat_map automatically). :slight_smile:

Hmm, yeah, I’ve never seen that done. It makes sense though that it would not call back into the runtime, as BIFs are supposed to be *fast*fast*fast*!

Another drawback of reducers is that, because the collection is the one controlling the reducing, we cannot implement operations like zip that require taking one item from a collection, then suspending the reduction, then taking an item from another collection, suspending it, and starting again by resuming the first one, and so on.
Again, at least not in a purely functional way.
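For completeness, Elixir's current Enumerable does support :suspend, which is roughly the mechanism behind Stream.zip; a hedged sketch:

```elixir
# Suspend after each element: we get the accumulator so far plus a
# continuation that can be resumed later (e.g. after pulling an
# element from a *second* collection, as zip needs).
{:suspended, [1], cont} =
  Enumerable.reduce([1, 2, 3], {:cont, []}, fn x, acc ->
    {:suspend, [x | acc]}
  end)

# Resume the first collection where it left off:
{:suspended, [2, 1], _cont2} = cont.({:cont, [1]})
```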

Actually Categorical Types has this covered too, there is a basic 1-type reducer, and there is a basic 2-type reducer, up to N-type reducers. :slight_smile:

Most systems only implement up to 2, because with a 2-type reducer you can reduce over two collections at a time, building up a list of lists, until you have a final list of all the collections to pass to the final reducing function, one element of each sub-list at a time.

Ah cool. :slight_smile: