I Challenge you: Implement containers!

2 posts were merged into an existing topic: Tuple Calls

9 posts were split to a new topic: Tuple Calls

NOTE: Because it became quite involved and off-topic, the discussion about Tuple Calls has been split off to its own topic, here: Tuple Calls

Instead of doing tuple calls, I can think of two ways to separate the interface from the implementation(s):

  1. Protocols: As example, create a Queue protocol that can be implemented by any queue implementation in existence.
    Drawbacks:
  • Non-protocol functions cannot be written in the same module, so an extra layer of indirection must be added (such as with Enumerable and Enum).
  • Implementations are restricted to be structs.
  • Protocols do not work for functions where the first argument is not the thing the protocol should dispatch on.
  • Creation cannot happen through the protocol directly, but only through the wrapper.
  • Functions that are part of the API are not part of the implementation’s API: The only way you can interact with the implementation is through the protocol. I’m not sure this is a good or a bad thing, but it definitely is a property.
  1. Behaviours that paritally mimic protocols.
    Drawbacks:
  • Sort of re-inventing the wheel.
  • To the user, a #Queue<[1,2,3]> remains a #Queue<[1,2,3]> even though the implementation might be different. (Is this good or bad?)
  • Possibility to use this for functions where we do not want to dispatch based on the first argument.

Hmm… after writing these out, I think that the proper approach is to go the Protocol + wrapper way. Maybe Queue.Interface for the protocol and Queue for the public API that dispatches some of its functionality to the protocol.
I furthermore think that it would be interesting and smart to create functions like:

new (data, opts \\ []) do
   impl = Keyword.get(opts, :implementation, Application.get_env(:data_types, :queue_implementation, Queue.DefaultImplementation))
   impl.new(data)
end

so people can alter the implementation that is used in the environment, or on a case-by-case basis, but they only need to be mindful about this if the default version is not to their liking.

I’ve been thinking more about the interfaces to containers.

Now, for all of the linear data types (lists, sets, multisets, queues, priority queues, arrays, maps, other kinds of associative arrays, etc.) it is easily possible to implement Collectable and Enumerable.

However, a closer look at Enumerable shows that it is not the inverse of Collectable at all: The caller is able to temporarily suspend the enumeration process, but it is impossible to only extract a single item, because the ‘enumerated-collection-in-progress’ is thrown away and only the accumulator is returned.
That is, Enumerable should maybe more accurately be named Foldable or Reducable. (It has two extra features on top of a pure reducable, which are the ability for the reducing function to stop before the end of the reducable is reached, and the ability to temporarily suspend an in-process enumeration if the caller asks for it. Only for things wrapping e.g. file handles or sockets the suspend-case might involve some extra clean-up code.)

However, there are many cases in which we are only interested in extracting one element from the current collection we have at hand. For lists, this should be the head, for a priority queue, the minimum(/maximum, depending on what kind of queue it is), for queues the front, for stacks the top, for unordered collections like sets and maps any element is equally fine, etc.

The signature of, let’s say: Extractable.extract is as follows:

extract(Extractable.t) :: {:ok, {item :: any, Extractable.t}} | :error

The implementations of this for the built-in types List and Tuple will be fast. Map and MapSet unfortunately not, as Erlang does not expose a way to get an arbitrary element from a Map, so we have to do a round-robin list conversion.

In other words: For Map and MapSet, the Enumerable implementation is basically an optimized version of wrapping Extractable!

How do you mean “arbitrary element” here? I have been requesting a function pair in the API first/next which would allow me efficiently to step through the keys of a map.

With “arbitrary element” I mean that since map elements (above a certain size; but that is an implementation detail) are stored in ‘some’ order and thus also returned in ‘some’ order, which might differ from invocation to invocation.

So I mean: The caller has no guarantees on which element is returned first. But of course, after that, the result map does not contain this element any more, so the next time(s) the extraction function is called, a different {key, value} will be returned.

Ah. My requirements are that I need to be able to step through all the key/value pairs in a map. The order is irrelevant but I need to be sure that I will get all pairs, as long as I don’t modify the map. I also return {key,map} in my implementation in my 2-3 trees.

I believe we are talking about the same thing, something akin to:

iex> original_map = %{a: 1, b: 2, c: 3}
iex> {:ok, {item, rest_map}} = Extractable.extract(original_map)
{:ok, {{:c, 3}, %{a: 1, b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:a, 1}, %{b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:b, 2}, %{}}}
iex> Extractable.extract(rest_map)
:error

Not really. One problem with your example is that you are updating the map every time which is inefficient. I was thinking something along the lines of how ETS does it with first/next:

10> {Key1,Val1} = maps:first(Map).
11> {Key2,Val2| = maps:next(Key1, Map).
12> {Key3,Val3} = maps:next(Key2, Map).
13> error = maps:next(Key3, Map).

(in Erlang) Definitely possible. ETS does it like this but only returns the key, in this case returning {Key,Value} is much better. I implemented this in my 2-3 trees as I needed it for Luerl.

3 Likes

@rvirding do you know how ETS stores the internal state of the table-reading while exposing first/next? It seems something that is rather stateful. Is it something that is managed per-process somewhere?


This evening, I have written and published the first versions of Insertable and Extractable, as well as implementing them for the Okasaki Queues (which is not yet released on Hex).

You pass the last key to next(table, last_key) and ETS returns the next key according to the table’s specified or internal ordering. ETS doesn’t store the table-reading state, you do.

2 Likes

Thank you! I somehow missed that :ets.next took two arguments :slight_smile: .

1 Like

All right, I am facing another thing that is somewhat of a challenge.

Naming things is truly one of the hardest problems of a Computing Scientist.

I have created Extractable and Insertable and they work well. But now I’d like to build abstractions on top of these protocols. These abstractions being part of the original protocol-defining libraries does not make sense, because there might be more, different abstractions that could be built on top.

I’m probably going to create a library called Collection that contains common functionality on collections. Collection will probably depend on FunLand, Extractable and Insertable. The interface of the Collectionmodule itself will probably look similar to Enumerable, with the difference that any collection in the output of its functions is not converted into a list.

If there is a better approach, please tell me :slight_smile: .

Alongside of this, I am currently working more on the Okasaki queues, and also started building some simple tree implementations.

1 Like

I’ve released a preliminary version of Prioqueue, which builds a structured priority queue protocol interface and allows multiple implementations to be built on top of this.

1 Like

Prioqueue has been updated to now also contain a PairingHeap-based implementation. The library now also contains some proper tests :slight_smile: .

1 Like

Hmm, I’m not seeing PropEr in there:

/me coughs

^.^

@OvermindDL1 It took me way too long to understand the joke… :laughing:

Prioqueue has been updated once more: It now contains a lot better documentation, as well as implementations for many of the Protocols of FunLand, as well as Extractable, Insertable and Elixir’s own Enumerable and Collectable. (See the docs)

Oh, and the Prioqueue priority queue protocol itself can of course be overridden by other priority queue implementations in the future :smiley: .

Prioqueue is more-or-less done, I think.
Feedback is greatly appreciated.


On a related note: I’ve released a prerelease version 0.8 of FunLand, which is significantly backwards incompatible with the previous prerelease version, since:

  • It adds a lot of functionality. (Implementation of Traversable)
  • It has some breaking name changes.
  • Some bells-and-whistles behaviours were changed from opt-out to opt-in (less implicit magic, more explicit power)

There still are some things I am not happy with. The most important things that will still change before 1.0:

  • Documentation. A lot of it.
  • Moving the zoo of ADT implementations living under the FunLandic namespace off to their own library.
  • Maybe some more breaking name changes, to make things more clear.
1 Like

Looks really cool, I quite like the API. :slight_smile:

1 Like

@Qqwy, slightly offtopic, but I was looking at your Tensor library because I’m quite interested in starting building a Machine Learning library for Erlang/Elixir. I know Erlang is not the fastest language for number crunching, but I have hopes due to the upcoming JIT compiler being quite fast (currently it’s as fast as HiPE), dirty schedulers and ease of orchestrate distributed computations between nodes.

2 Likes

@imetallica Nice!

Tensor should be reasonably fast if you are working with sparse data sets.

I am currently debating to write a second, terse, vector/matrix/tensor library which would outsource the work to Rust.
However, I’d like this work to happen in a proper, semi-preemptive fashion, which currently faces two hurdles:

  1. Rustler does not yet expose enif_schedule_nif() (see here) because there is no known way to make it fully compile-time safe.
  2. I have no idea how to ‘turn an iteration over a vector or matrix inside-out’. I think this would mean that iteration over the Vec would happen in a coroutine-fashion, but I have no idea how this could be done. This is necessary because enif_schedule_nif() takes a function pointer to call once the work should continue.

I will update Tensor today to interface with Extractable and Insertable :smiley: .

1 Like