A Treatise of Enumerable, Collectable, Reducable, Extractable and Insertable

Qqwy · June 21, 2017, 4:27pm

Some time in the past I wanted to extract a few elements from a collection while at the same time keeping the rest of the collection intact (because based on the items I extracted, it might be necessary to continue extracting more in the future). However, it turned out that the Enumerable protocol is not suited for this.

Yesterday and today I did a little more research. I was busy implementing FunLand.Reducable, which is the pure (uninterruptable, except you cheat in a very dirty way by using ‘throw/catch’) form of what Enumerable does: It only has a reduce(collection, acc, function) which should call itself recursively until the whole collection is empty, and then the accumulator is returned.

Enumerable does more: It allows the caller to specify at any step in the reduction to:

halt, in which case the accumulator is returned.
suspend (which should always be continued later), in which case a function is returned to continue reducing later.
continue to next item in the normal way, in which case the Enumerable.reduce should call itself recursively.

This does mean that it is impossible to take ‘one’ item out of the enumerable, returning the rest, as either it is enumerated in full, or halted somewhere in the middle, but when halting in the middle only the accumulator is returned.

Now, Collectable has a very similar API, and also allows not only to insert one element, but rather to insert a group of them. In this sense (and also in its API return value naming), it is very much the inverse of Enumerable. Just like Enumerable, it allows you to wrap weird resources and run some cleanup after inserting the final item. (Such as reversing the constructed list or closing a file handle). But if we want to insert a single item, using Collectable does not make sense.
As example: Enum.into([my_single_item], [1,2,3]) returns [1,2,3,my_single_item] which means that underwater, something akin to [1,2,3] ++ [my_single_item] has to have happened.
(What actually happens is :lists.reverse([my_single_item | :lists.reverse(input)])

Therefore, it makes sense to extract these two more general but more slow protocols to their own protocols.
I am thus making the following protocols:

defprotocol Extractable do
  @spec extract(collection) :: {:ok, {item :: any, collection}} | :error
  def extract(collection)
end

defprotocol Insertable do
  @spec insert(collection, item) :: {:ok, collection} | :error
  def insert(collection, item)
end

These protocols return their results as a success tuple; :error is returned by extract when the collection is empty, and by insert if the collection is full (which might happen for things like circular buffers or fixed-size arrays).

For List, it only makes sense to insert and extract from the head of the list (so the list is treated as a stack).

For Map and MapSet, as Erlang does not yet expose a built-in way to extract an arbitrary item from a map, the only possibility to extract a single item is to convert the collection to a list, extract the head of this list, and convert the tail of the list back to a Map(/MapSet).
Thus, it can be seen that if you iterate over the whole Map/MapSet, using Enumerable is faster, as the conversion to/from list only happens once, whereas when wrapping Extractable, this would happen n times, once per element.

Still, because it is more general, Extractable and Insertable have some nice properties. Most importantly, it allows us to iterate over containers without throwing away their existing shape, which is what Reducable and Enumerable both will do, because the only way to have a collection afterwards is if you build a new one in your accumulator.

Extractable and Insertable thus make for more generic variants of split, split_with, split_while, drop,drop_while, and others. And besides this, they of course allow the insertion/extraction of a single element into a collection.

They also allow for more control over iterating infinite structures. It has happened to me in the past, where I was interested in the first item of a Stream (and depending on that, wanted to maybe read the rest later). But as soon as you take the first item using Enum.take(1), the rest of the stream is halted; if you later want to get the rest of the stream, you’ll need to add a Stream.drop(1) before it; resulting in repetitive work later because the initial part of the stream needs to be re-evaluated (to be directly discarded afterwards) each time. Extractable and Insertable allow to do things like this.

I am probably going to release these two protocols as separate libraries, since there are data structures out there for which implementing one but not the other makes sense.

Extractable and Insertable will be used together with FunLand, and the ongoing container implementation effort. Furthermore, I might at some point build an Iteratee-library on top.