I Challenge you: Implement containers!

I think it does NOT make sense to put everything into a single package. Separation of concerns and stuff :wink:

I’m not sure though, how to organize packages, dependencies, structures and algorythms.


Yes, I agree :slight_smile: .

I think maybe the most sense would make to brush up fun_land to make sure it actually implements the ADT-hierarchy properly (which it already does, but there are some omissions) and most importantly: it becoming more clear in its usage.
It should be very friendly to use. (And maybe it needs a better name?)

The idea is to both create a set of behaviours that allows implementing the different ADT typeclasses, as well as a clear public API that internally dispatches on them, allowing functions like take_while, split, etc. to work on anything that implements the correct behaviours.

(Behaviours are used for two reasons over protocols here:

  1. The functions that are part of the behaviour also become part of the public API of the implementor’s module. So Mappable.map(%Queue{}, fun) will dispatch to Queue.map(%Queue{}, fun), but the latter is also directly visible and introspectable.
  2. It is possible to create default implementations for many operations if the data types implement a higher typeclass: For instance, map can be built out of pure and apply. I’d like to specify these as defoverridable to give programmers the flexibility to still implement them themselves (for e.g. extra speed). AFAIK, using defoverridable is impossible with protocols.


On the other hand, there are also multiple data structures that could be built on top of different practical implementations. An example: A binary tree could both be built as nested maps/tuples, or instead on top of an array (which could be built on top of a numerical-indexed map). It therefore makes sense to create a unified interface for these data types as well, and separate their external interface from their internal workings. Main drawback here might be performance, however.

Sorry I don’t quite understand what it is you want. Are you after some specific types of containers or are you after examples of different ways of implementing them? Take for examples trees, why have you given various examples of how to implement them [*]. However you look at it trees are just different ways of (efficiently ?) implementing key-value stores so why not just use a map [**]? That they are trees is just an implementation detail.

Same with the other containers. A set is set irrespective of whether it is a tree or a hash or a …

[*] Of which I have implemented most of them for fun. Data structures are fun.
[**] One reason is that maps don’t contain some access function you are after. I did this for my Lua implementation because I need a first/next function pair which allows me to step through the map. I used 2-3 trees.


@rvirding I am interested in starting a coordinated effort to create multiple different types of containers that share common interfaces, ensuring that people can pick the most appropriate one for the job at hand.

The reason of listing multiple different possible implementation methods is to try the different possibilities and find out which one are more efficient (in space, time and developer efficiency) under certain conditions. A set can be built on top of multiple different ‘concreter’ data structures, but these different structures that are used as foundation for the set have a direct impact on the efficiency in certain conditions of the resulting set. Therefore I’d like to try multiple approaches in a coordinated fashion, to ensure a common API.

As for trees: It is true that trees are often used either as implementation for a binary heap (resulting in a priority queue) or a binary search tree (resulting in a key->value store). But there are also multiple other trees that just exist to contain relations of their own, such as Segment Trees, Rose Trees and R-trees.

Does that answer your questions?

If using them as a KV store sure, but I almost never do. Look at my ExSpirit library, it uses a tree to parse single characters at a time to efficiently get a ‘longest’ match. Doing that in maps would involve holding information on the ‘longest’ possible key or do a linear search or so and it just gets far more messy. ^.^

Honestly, I’d think all these structures are prime cases of using tuple modules. You could make a normal @behaviour and have each implement it, it would have things like map and fold_left and reduce and so forth perhaps with a use for overridable defaults based on a certain base required set, then you could use it like:

iex> q = Queue.new()
iex> q = q.push(1)
iex> q = q.push(2)
iex> q = q.push(3).push(4).push(5)
iex> {value, q} = q.pop()
iex> list_of_values = q.into([])

iex> g = Graph.new()
iex> g = g.addNode(:a).addNode(:b).addConnection(:a, :b, weight: 42).addNode(:c).addConnection(:a, :c)
iex> nodes = g.getNodes()
iex> path = g.path(from: :a, to: :b, solver: :shortest)

iex> inspect_each = fn c -> c.map(&IO.inspect/1) end
iex> inspect_each.(q) # Generic algorithms!
iex> inspect_each.(g) # Generic algorithms!

And so forth. It is using them like you use First-Class Modules in OCaml, it is a very powerful, functional, and immutable construct.

/me really really wishes people were not so scared of tuple-calls…


2 posts were merged into an existing topic: Tuple Calls

9 posts were split to a new topic: Tuple Calls

NOTE: Because it became quite involved and off-topic, the discussion about Tuple Calls has been split off to its own topic, here: Tuple Calls

Instead of doing tuple calls, I can think of two ways to separate the interface from the implementation(s):

  1. Protocols: As example, create a Queue protocol that can be implemented by any queue implementation in existence.
  • Non-protocol functions cannot be written in the same module, so an extra layer of indirection must be added (such as with Enumerable and Enum).
  • Implementations are restricted to be structs.
  • Protocols do not work for functions where the first argument is not the thing the protocol should dispatch on.
  • Creation cannot happen through the protocol directly, but only through the wrapper.
  • Functions that are part of the API are not part of the implementation’s API: The only way you can interact with the implementation is through the protocol. I’m not sure this is a good or a bad thing, but it definitely is a property.
  1. Behaviours that paritally mimic protocols.
  • Sort of re-inventing the wheel.
  • To the user, a #Queue<[1,2,3]> remains a #Queue<[1,2,3]> even though the implementation might be different. (Is this good or bad?)
  • Possibility to use this for functions where we do not want to dispatch based on the first argument.

Hmm… after writing these out, I think that the proper approach is to go the Protocol + wrapper way. Maybe Queue.Interface for the protocol and Queue for the public API that dispatches some of its functionality to the protocol.
I furthermore think that it would be interesting and smart to create functions like:

new (data, opts \\ []) do
   impl = Keyword.get(opts, :implementation, Application.get_env(:data_types, :queue_implementation, Queue.DefaultImplementation))

so people can alter the implementation that is used in the environment, or on a case-by-case basis, but they only need to be mindful about this if the default version is not to their liking.

I’ve been thinking more about the interfaces to containers.

Now, for all of the linear data types (lists, sets, multisets, queues, priority queues, arrays, maps, other kinds of associative arrays, etc.) it is easily possible to implement Collectable and Enumerable.

However, a closer look at Enumerable shows that it is not the inverse of Collectable at all: The caller is able to temporarily suspend the enumeration process, but it is impossible to only extract a single item, because the ‘enumerated-collection-in-progress’ is thrown away and only the accumulator is returned.
That is, Enumerable should maybe more accurately be named Foldable or Reducable. (It has two extra features on top of a pure reducable, which are the ability for the reducing function to stop before the end of the reducable is reached, and the ability to temporarily suspend an in-process enumeration if the caller asks for it. Only for things wrapping e.g. file handles or sockets the suspend-case might involve some extra clean-up code.)

However, there are many cases in which we are only interested in extracting one element from the current collection we have at hand. For lists, this should be the head, for a priority queue, the minimum(/maximum, depending on what kind of queue it is), for queues the front, for stacks the top, for unordered collections like sets and maps any element is equally fine, etc.

The signature of, let’s say: Extractable.extract is as follows:

extract(Extractable.t) :: {:ok, {item :: any, Extractable.t}} | :error

The implementations of this for the built-in types List and Tuple will be fast. Map and MapSet unfortunately not, as Erlang does not expose a way to get an arbitrary element from a Map, so we have to do a round-robin list conversion.

In other words: For Map and MapSet, the Enumerable implementation is basically an optimized version of wrapping Extractable!

How do you mean “arbitrary element” here? I have been requesting a function pair in the API first/next which would allow me efficiently to step through the keys of a map.

With “arbitrary element” I mean that since map elements (above a certain size; but that is an implementation detail) are stored in ‘some’ order and thus also returned in ‘some’ order, which might differ from invocation to invocation.

So I mean: The caller has no guarantees on which element is returned first. But of course, after that, the result map does not contain this element any more, so the next time(s) the extraction function is called, a different {key, value} will be returned.

Ah. My requirements are that I need to be able to step through all the key/value pairs in a map. The order is irrelevant but I need to be sure that I will get all pairs, as long as I don’t modify the map. I also return {key,map} in my implementation in my 2-3 trees.

I believe we are talking about the same thing, something akin to:

iex> original_map = %{a: 1, b: 2, c: 3}
iex> {:ok, {item, rest_map}} = Extractable.extract(original_map)
{:ok, {{:c, 3}, %{a: 1, b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:a, 1}, %{b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:b, 2}, %{}}}
iex> Extractable.extract(rest_map)

Not really. One problem with your example is that you are updating the map every time which is inefficient. I was thinking something along the lines of how ETS does it with first/next:

10> {Key1,Val1} = maps:first(Map).
11> {Key2,Val2| = maps:next(Key1, Map).
12> {Key3,Val3} = maps:next(Key2, Map).
13> error = maps:next(Key3, Map).

(in Erlang) Definitely possible. ETS does it like this but only returns the key, in this case returning {Key,Value} is much better. I implemented this in my 2-3 trees as I needed it for Luerl.


@rvirding do you know how ETS stores the internal state of the table-reading while exposing first/next? It seems something that is rather stateful. Is it something that is managed per-process somewhere?

This evening, I have written and published the first versions of Insertable and Extractable, as well as implementing them for the Okasaki Queues (which is not yet released on Hex).

You pass the last key to next(table, last_key) and ETS returns the next key according to the table’s specified or internal ordering. ETS doesn’t store the table-reading state, you do.


Thank you! I somehow missed that :ets.next took two arguments :slight_smile: .

1 Like

All right, I am facing another thing that is somewhat of a challenge.

Naming things is truly one of the hardest problems of a Computing Scientist.

I have created Extractable and Insertable and they work well. But now I’d like to build abstractions on top of these protocols. These abstractions being part of the original protocol-defining libraries does not make sense, because there might be more, different abstractions that could be built on top.

I’m probably going to create a library called Collection that contains common functionality on collections. Collection will probably depend on FunLand, Extractable and Insertable. The interface of the Collectionmodule itself will probably look similar to Enumerable, with the difference that any collection in the output of its functions is not converted into a list.

If there is a better approach, please tell me :slight_smile: .

Alongside of this, I am currently working more on the Okasaki queues, and also started building some simple tree implementations.

1 Like

I’ve released a preliminary version of Prioqueue, which builds a structured priority queue protocol interface and allows multiple implementations to be built on top of this.

1 Like

Prioqueue has been updated to now also contain a PairingHeap-based implementation. The library now also contains some proper tests :slight_smile: .

1 Like