In the topic about Monads, at some point the discussion moved more towards Enumeration:
So I’ll move over the relevant posts, and then we can continue the discussion here.
As for:
Enum does not (contrary to what intuition might suggest) implement the Functor pattern (fun_land source code). Thus Enum.map is not a ‘true map’, because it (and all functions that are based on it) always returns a list rather than ‘a container with transformed contents but with the same containing structure’.
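To see the point concretely: whatever Enumerable you feed it, Enum.map hands back a list.

```elixir
# Enum.map always returns a list, regardless of the input container.
# Feeding it a map yields a list of {key, value} tuples, not a new map:
result = Enum.map(%{a: 1, b: 2}, fn {k, v} -> {k, v + 1} end)
IO.inspect(result) # => [a: 2, b: 3]

# To get a map back, you have to rebuild it yourself:
IO.inspect(Map.new(result)) # => %{a: 2, b: 3}
```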
Instead of on map, Enum is built on the concept of reduce.
Reducable (fun_land source code) or, in Haskell parlance, Foldable, is an interface that is implementable for a larger group of datatypes than Mappable, because things like sets and numerical ranges can be reduced but not mapped (since mapping could produce two elements with the same value, which goes against the invariant of a set).
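To make the set example concrete (a small sketch using Elixir’s MapSet): reducing a set is well-defined, but ‘mapping’ it with a non-injective function can only return a list, since a set would have to silently collapse the duplicates.

```elixir
set = MapSet.new([1, 2, 3])

# Reducing a set is fine: many elements in, one value out.
sum = Enum.reduce(set, 0, &+/2)
IO.inspect(sum) # => 6

# "Mapping" with a non-injective function: as a list we keep all three
# results, but putting them back into a set would collapse them to one.
mapped = Enum.map(set, fn _ -> :x end)
IO.inspect(mapped)                     # => [:x, :x, :x]
IO.inspect(MapSet.size(MapSet.new(mapped))) # => 1
```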
So why do you care about a Functor-style ‘true’ map? Because Enum.map and similar functions throw away the structure of the container. If you have a (rose) tree, good luck iterating it using Enum.map and then attempting to regain the same tree structure as before.
I have the feeling (please correct me if I’m wrong, @josevalim) that the current approach of Enum.map being built on top of reduce was chosen to be able to use iteration functions with these other data types, and that this choice might’ve been influenced by Ruby, which takes the same approach.
Those results show you are very likely benchmarking without consolidated protocols. Given how maps work, I would expect your implementation to be slower because the only possible way to efficiently traverse a large map and return a map is by doing conversion to a list, traversing and then going from a list to a map. So by definition your solution has to be slower.
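The list round-trip described here looks roughly like this (a sketch of the idea, not Enum’s actual internals):

```elixir
# The only way to traverse a map and get a map back: go through a list.
input = %{a: 1, b: 2, c: 3}

output =
  input
  |> Map.to_list()                          # map -> list of {k, v}
  |> Enum.map(fn {k, v} -> {k, v * 10} end) # traverse the list
  |> Map.new()                              # list -> map again

IO.inspect(output) # => %{a: 10, b: 20, c: 30}
```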
This is not related to Ruby at all. I just said this in another thread but the things that are related mostly belong to the syntax.
You explained why we preferred foldable quite well:
Haskell parlance, Foldable is an interface that is implementable for a larger group of datatypes than Mappable, because things like sets and numerical ranges can be reduced but not mapped
We use foldable because it allows us to support the widest variety of data types, while being efficient and also supporting resources and laziness. Remember that Enum supports resources, so a functor wouldn’t work for most streams either (io, file, stage, etc). You also wouldn’t be able to call a map and return anything that is not a tuple, etc.
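For instance (a small sketch): a resource-backed stream has no ‘same structure’ for a map to return; a functor-style map would have to hand back the resource itself, so Enum.map reduces it to a list instead.

```elixir
# A resource-backed enumerable: acquired, stepped, and cleaned up lazily.
stream =
  Stream.resource(
    fn -> 0 end,          # acquire: initial state
    fn
      3 -> {:halt, 3}     # done after emitting 0, 1, 2
      n -> {[n], n + 1}   # emit n, advance state
    end,
    fn _ -> :ok end       # release: cleanup hook
  )

# There is no meaningful "stream with the same structure" to return,
# so mapping over it yields a plain list:
IO.inspect(Enum.map(stream, &(&1 * 10))) # => [0, 10, 20]
```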
EDIT: Introducing reducees « Plataformatec Blog - this article explains how the protocol works with the widest variety of data types possible. And that was the goal: to have a single protocol instead of multiple ones. There is still space for a functor/traverse, but given that even map operations require you to first convert to a list and then back, it does not add much beyond what we have today.
Just want to make a quick remark on this:
:maps.map/2 is the Functor mapping function for maps. The reasoning is as follows: since maps cannot contain duplicate keys, the structure would be changed if a mapping operation also allowed altering the keys. Thus, the return value of the function passed to map/2 is always used as the value for the current key.
Related to this is the Extractable library I wrote a couple of months back, which allows you to enumerate collections in a one-item-at-a-time fashion, because that is one of the things that is impossible using the current Enumerable. (You cannot stop in the middle while in :suspended mode, because of potential resource cleanup that has not yet happened, as well as the rest of the collection possibly being in a non-standard format. Using :halt discards the tail of the collection, however, so you cannot use that either.)
Why want one-at-a-time? Because not all operations lend themselves to the ‘give control to the collection’ paradigm.
Indeed, currently Erlang does not allow you to simply extract a single {K, V} pair from a map (hopefully this will be added at some point in the future; there were discussions about it a while back, IIRC), and therefore the map needs to be converted into a list and back again each time.
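So to pull one pair out today, you end up doing something like this sketch: converting the whole map to a list just to peel one element off.

```elixir
# There is no Erlang primitive for "give me one {K, V} pair of this map",
# so the map is converted to a list just to take a single element:
map = %{a: 1, b: 2, c: 3}
[{k, v} | _rest] = :maps.to_list(map)

# ...and the remainder has to be rebuilt as a map to continue iterating:
rest_map = Map.delete(map, k)
IO.inspect({k, v, rest_map})
```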
Thank you for sharing that blog post here! A very clear explanation of why Enum(erable) is the way it is today.
And :maps.map/2 converts to a list and then converts it back to a map, which goes back to my earlier point of it all being done on lists anyway: otp/lib/stdlib/src/maps.erl at master · erlang/otp · GitHub
BIFs do not accept anonymous functions as arguments and I don’t believe there is a plan to lift this restriction. So it is probably staying like this for now.
Does it work with resources? It was one of the first approaches we tried, but it was very error-prone when it came to dangling resources.
It does not; that is, resource management is something that needs to be done manually. I’ll edit my earlier post to clarify this further.
I’d love to talk more about this – I also started working on a proof-of-concept iterator library called iter.
However, I think we are veering too far off topic now, since we’re not talking about Monads anymore. I’ll open a new topic.
EDIT: Done! We’re in the new topic now.
I’m doing the same, as that is the only Erlang API for traversing a map to fold over it; I’m just handling it all internally, hidden from the user.
But yes, they are consolidated.
Here is the bench file (I could easily be doing something wrong, please tell!):
defmodule Helpers do
  def do_flat_map({k, v}), do: %{k => v, v => k}
  def do_flat_map(v), do: [v, v]

  # `wrong` only in that the test is not returning the right format of data
  def do_flat_map_wrong({k, v}), do: [{k, v}, {v, k}]
  def do_flat_map_wrong(v), do: [v, v]
end

inputs = %{
  "List" => [:a, :b, :c, :d],
  "Map" => %{a: 1, b: 2, c: 3, d: 4},
  "ValueMap" => ExCore.Monad.wrap(42, %{}) # Just testing, ignore this overall
}

actions = %{
  "Elixir.Enum.flat_map" => fn input -> Enum.flat_map(input, &Helpers.do_flat_map/1) end,
  "Elixir.Enum.flat_map_wrong" => fn input -> Enum.flat_map(input, &Helpers.do_flat_map_wrong/1) end,
  "ExCore.Monad.flat_map" => fn input -> ExCore.Monad.flat_map(input, &Helpers.do_flat_map/1) end
}

IO.puts("Is `Enumerable` consolidated: #{Protocol.consolidated?(Enumerable)}")
IO.puts("Is `Collectable` consolidated: #{Protocol.consolidated?(Collectable)}")

Benchee.run actions, inputs: inputs, time: 5, warmup: 5, print: %{fast_warning: false}
And the results:
╰─➤ mix bench monad
Compiling 1 file (.ex)
Is `Enumerable` consolidated: true
Is `Collectable` consolidated: true
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Number of Available Cores: 2
Available memory: 8.011072 GB
Elixir 1.6.0-dev
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 5.00 s
time: 5.00 s
parallel: 1
inputs: List, Map, ValueMap
Estimated total run time: 1.50 min
Benchmarking with input List:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...
Benchmarking with input Map:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...
Benchmarking with input ValueMap:
Benchmarking Elixir.Enum.flat_map...
Benchmarking Elixir.Enum.flat_map_wrong...
Benchmarking ExCore.Monad.flat_map...
##### With input List #####
Name ips average deviation median
ExCore.Monad.flat_map 3.84 M 0.26 μs ±1213.04% 0.20 μs
Elixir.Enum.flat_map 3.82 M 0.26 μs ±1110.99% 0.20 μs
Elixir.Enum.flat_map_wrong 3.71 M 0.27 μs ±1446.02% 0.20 μs
Comparison:
ExCore.Monad.flat_map 3.84 M
Elixir.Enum.flat_map 3.82 M - 1.00x slower
Elixir.Enum.flat_map_wrong 3.71 M - 1.03x slower
##### With input Map #####
Name ips average deviation median
ExCore.Monad.flat_map 3.42 M 0.29 μs ±20998.62% 0.0 μs
Elixir.Enum.flat_map_wrong 1.33 M 0.75 μs ±330.34% 0.70 μs
Elixir.Enum.flat_map 0.67 M 1.50 μs ±1704.78% 1.00 μs
Comparison:
ExCore.Monad.flat_map 3.42 M
Elixir.Enum.flat_map_wrong 1.33 M - 2.58x slower
Elixir.Enum.flat_map 0.67 M - 5.12x slower
##### With input ValueMap #####
Name ips average deviation median
Elixir.Enum.flat_map_wrong 4.96 M 0.20 μs ±37744.65% 0.0 μs
ExCore.Monad.flat_map 3.68 M 0.27 μs ±1833.90% 0.120 μs
Elixir.Enum.flat_map 3.18 M 0.31 μs ±13198.19% 0.0 μs
Comparison:
Elixir.Enum.flat_map_wrong 4.96 M
ExCore.Monad.flat_map 3.68 M - 1.35x slower
Elixir.Enum.flat_map 3.18 M - 1.56x slower
I’m pretty sure the reason mine is faster is the module-call overhead and the guards that Elixir’s current style uses, and exclusively that, not any algorithmic cost differences.
Here is the definition of the Monad type:
import ProtocolEx

defprotocol_ex ExCore.Monad, as: monad do
  require ExCore.StreamDataGenerator

  def wrap(value, monad)
  def flat_map(monad, fun) when is_function(fun, 1)

  deftest neutrality do
    generator = ExCore.StreamDataGenerator.generator(Any, [])
    ident_wrap = &wrap(&1, nil)
    StreamData.check_all(generator, [initial_seed: :os.timestamp()], fn v ->
      wrap(v, nil) === flat_map(wrap(v, nil), ident_wrap) && {:ok, v} || {:error, v}
    end)
  end

  deftest succession_ordering do
    generator = ExCore.StreamDataGenerator.generator(ExCore.Monad, __MODULE__, [])
    f = fn x -> wrap({x, 1}, nil) end
    g = fn {x, 1} -> wrap(x, nil) end
    StreamData.check_all(generator, [initial_seed: :os.timestamp()], fn v ->
      if flat_map(flat_map(wrap(v, nil), f), g) === flat_map(wrap(v, nil), fn x -> flat_map(f.(x), g) end) do
        {:ok, v}
      else
        {:error, v}
      end
    end)
  end
end
Here is my Monad implementation for lists and maps (also included are the :ok/:error tuples as an example of those):
import ProtocolEx

## List

defimpl_ex EmptyList, [], for: ExCore.Monad do
  @priority 65535

  defmacro wrap(value, monad_type) do
    quote generated: true do
      _ = unquote(monad_type)
      [unquote(value)]
    end
  end

  defmacro flat_map(empty_list, fun) do
    quote generated: true do
      case unquote(empty_list) do
        [] -> []
        list -> ExCore.Monad.List.flat_map(list, unquote(fun)) # This case is purely for testing purposes
      end
    end
  end
end

defimpl_ex List, [_ | _], for: ExCore.Monad do
  @priority 65534

  defmacro wrap(value, monad_type) do
    quote generated: true do
      _ = unquote(monad_type)
      [unquote(value)]
    end
  end

  def flat_map([], _fun), do: []
  def flat_map([head | tail], fun), do: fun.(head) ++ flat_map(tail, fun)
end

defimpl_ex Map, %{}, for: ExCore.Monad do
  @priority -65535

  defmacro wrap(value, monad_type) do
    quote generated: true do
      case unquote(monad_type) do
        %{__struct__: _} -> # Not structs
          raise %ProtocolEx.UnimplementedProtocolEx{
            proto: ExCore.Monad,
            name: :wrap,
            arity: 2,
            value: [unquote(value), unquote(monad_type)]
          }
        _ ->
          case unquote(value) do
            {key, value} -> %{key => value}
            value -> %{__value__: value}
          end
      end
    end
  end

  def flat_map(%{__struct__: _} = map, fun) do
    raise %ProtocolEx.UnimplementedProtocolEx{proto: ExCore.Monad, name: :flat_map, arity: 2, value: [map, fun]}
  end
  def flat_map(%{__value__: value} = map, fun) when map_size(map) === 1 do
    fun.(value)
  end
  def flat_map(map, fun) do
    map
    |> :maps.to_list()
    |> flat_map(fun, %{})
  end

  defp flat_map([], _fun, returned), do: returned
  defp flat_map([element | rest], fun, returned) do
    flat_map(rest, fun, :maps.merge(returned, fun.(element)))
  end
end

defimpl_ex OkValue, {:ok, _}, for: ExCore.Monad do
  @priority 65532

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:ok, unquote(value)}
    end
  end

  defmacro flat_map(ok_value, fun) do
    quote do
      unquote(fun).(elem(unquote(ok_value), 1))
    end
  end
end

defimpl_ex Ok, :ok, for: ExCore.Monad do
  @priority 65533

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:ok, unquote(value)}
    end
  end

  defmacro flat_map(ok_value, fun) do
    quote generated: true do
      fun = unquote(fun)
      case unquote(ok_value) do
        {:ok, value} -> fun.(value)
        :ok -> fun.(nil)
      end
    end
  end
end

defimpl_ex ErrorValue, {:error, _}, for: ExCore.Monad do
  @priority 65532

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:error, unquote(value)}
    end
  end

  defmacro flat_map(error_value, fun) do
    quote do
      _ = unquote(fun)
      unquote(error_value)
    end
  end
end

defimpl_ex Error, :error, for: ExCore.Monad do
  @priority 65533

  defmacro wrap(value, monad_type) do
    quote do
      _ = unquote(monad_type)
      {:error, unquote(value)}
    end
  end

  defmacro flat_map(error_value, fun) do
    quote do
      _ = unquote(fun)
      unquote(error_value)
    end
  end
end
And I can show the generated output code too, if you’re curious.
True true!
I actually quite like how C++ does it. Not really proper categorical setup but highly useful while being descriptive about ‘how’ it can be used too.
Yep, that is definitely what you want for a Foldable!
Yeah, that is hard to do perfectly efficiently the way maps are currently encoded.
Indeed! Folding is definitely not flat_mapping, which is why they are different categories. ^.^
That is why fold is one of the root bases in the Categorical Type tree (Monad technically inherits from it and others, optionally delegating to fold and flatten to implement flat_map automatically).
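As a sketch of that delegation (hypothetical module and function names, not the actual ExCore internals): given fold (reduce), you can derive map, and with flatten you then get flat_map for free.

```elixir
defmodule FoldDerived do
  # Hypothetical sketch: deriving flat_map from fold + flatten,
  # specialised here to lists.
  def fold(list, acc, fun), do: Enum.reduce(list, acc, fun)

  # map derived from fold: accumulate in reverse, then fix the order.
  def map(list, fun) do
    list
    |> fold([], fn x, acc -> [fun.(x) | acc] end)
    |> :lists.reverse()
  end

  # flatten derived from fold: append each sub-list onto the accumulator.
  def flatten(list_of_lists), do: fold(list_of_lists, [], &Kernel.++(&2, &1))

  # flat_map = flatten ∘ map
  def flat_map(list, fun), do: list |> map(fun) |> flatten()
end

FoldDerived.flat_map([1, 2, 3], fn x -> [x, x] end)
# => [1, 1, 2, 2, 3, 3]
```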
Hmm, yeah I’ve never seen that done, it makes sense though that it would not call back into the runtime as BIF’s are supposed to be *fast*fast*fast*!
Another drawback of reducers is that, because the collection is the one controlling the reduction, we cannot implement operations like zip, which requires taking one item from a collection, then suspending the reduction, then taking an item from another collection, suspending it, and starting again by resuming the first one, and so on.
Again, at least not in a purely functional way.
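A one-item-at-a-time extractor (hypothetical sketch, in the spirit of the Extractable library mentioned earlier) makes zip straightforward, because the caller, not the collection, drives the iteration:

```elixir
defmodule StepZip do
  # Hypothetical extract/1: hand back one element plus the rest,
  # shown here for lists only.
  def extract([]), do: :empty
  def extract([h | t]), do: {:ok, h, t}

  # zip alternates: one step from `a`, one step from `b`, repeat.
  # A collection-driven reducer cannot do this without suspend/resume.
  def zip(a, b) do
    with {:ok, x, a_rest} <- extract(a),
         {:ok, y, b_rest} <- extract(b) do
      [{x, y} | zip(a_rest, b_rest)]
    else
      :empty -> []
    end
  end
end

StepZip.zip([1, 2, 3], [:a, :b]) # => [{1, :a}, {2, :b}]
```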
Actually, Categorical Types has this covered too: there is a basic 1-type reducer, a basic 2-type reducer, and so on up to N-type reducers.
Most systems only implement up to 2, because with a 2-type reducer you can reduce over two collections at a time into a list of lists, until you end up with one final list of all the lists to pass to the final reducing function, one element of each sub-list at a time.
Ah cool.