2 posts were merged into an existing topic: Tuple Calls
NOTE: Because it became quite involved and off-topic, the discussion about Tuple Calls has been split off to its own topic, here: Tuple Calls
Instead of doing tuple calls, I can think of two ways to separate the interface from the implementation(s):
- Protocols: As example, create a
Queue
protocol that can be implemented by any queue implementation in existence.
Drawbacks:
- Non-protocol functions cannot be written in the same module, so an extra layer of indirection must be added (such as with
Enumerable
andEnum
). - Implementations are restricted to be structs.
- Protocols do not work for functions where the first argument is not the thing the protocol should dispatch on.
- Creation cannot happen through the protocol directly, but only through the wrapper.
- Functions that are part of the API are not part of the implementation’s API: The only way you can interact with the implementation is through the protocol. I’m not sure this is a good or a bad thing, but it definitely is a property.
- Behaviours that paritally mimic protocols.
Drawbacks:
- Sort of re-inventing the wheel.
- To the user, a
#Queue<[1,2,3]>
remains a#Queue<[1,2,3]>
even though the implementation might be different. (Is this good or bad?) - Possibility to use this for functions where we do not want to dispatch based on the first argument.
Hmm… after writing these out, I think that the proper approach is to go the Protocol + wrapper way. Maybe Queue.Interface
for the protocol and Queue
for the public API that dispatches some of its functionality to the protocol.
I furthermore think that it would be interesting and smart to create functions like:
new (data, opts \\ []) do
impl = Keyword.get(opts, :implementation, Application.get_env(:data_types, :queue_implementation, Queue.DefaultImplementation))
impl.new(data)
end
so people can alter the implementation that is used in the environment, or on a case-by-case basis, but they only need to be mindful about this if the default version is not to their liking.
I’ve been thinking more about the interfaces to containers.
Now, for all of the linear data types (lists, sets, multisets, queues, priority queues, arrays, maps, other kinds of associative arrays, etc.) it is easily possible to implement Collectable
and Enumerable
.
However, a closer look at Enumerable
shows that it is not the inverse of Collectable at all: The caller is able to temporarily suspend the enumeration process, but it is impossible to only extract a single item, because the ‘enumerated-collection-in-progress’ is thrown away and only the accumulator is returned.
That is, Enumerable
should maybe more accurately be named Foldable
or Reducable
. (It has two extra features on top of a pure reducable, which are the ability for the reducing function to stop before the end of the reducable is reached, and the ability to temporarily suspend an in-process enumeration if the caller asks for it. Only for things wrapping e.g. file handles or sockets the suspend-case might involve some extra clean-up code.)
However, there are many cases in which we are only interested in extracting one element from the current collection we have at hand. For lists, this should be the head, for a priority queue, the minimum(/maximum, depending on what kind of queue it is), for queues the front, for stacks the top, for unordered collections like sets and maps any element is equally fine, etc.
The signature of, let’s say: Extractable.extract
is as follows:
extract(Extractable.t) :: {:ok, {item :: any, Extractable.t}} | :error
The implementations of this for the built-in types List and Tuple will be fast. Map and MapSet unfortunately not, as Erlang does not expose a way to get an arbitrary element from a Map, so we have to do a round-robin list conversion.
In other words: For Map and MapSet, the Enumerable implementation is basically an optimized version of wrapping Extractable!
How do you mean “arbitrary element” here? I have been requesting a function pair in the API first
/next
which would allow me efficiently to step through the keys of a map.
With “arbitrary element” I mean that since map elements (above a certain size; but that is an implementation detail) are stored in ‘some’ order and thus also returned in ‘some’ order, which might differ from invocation to invocation.
So I mean: The caller has no guarantees on which element is returned first. But of course, after that, the result map does not contain this element any more, so the next time(s) the extraction function is called, a different {key, value}
will be returned.
Ah. My requirements are that I need to be able to step through all the key/value pairs in a map. The order is irrelevant but I need to be sure that I will get all pairs, as long as I don’t modify the map. I also return {key,map}
in my implementation in my 2-3 trees.
I believe we are talking about the same thing, something akin to:
iex> original_map = %{a: 1, b: 2, c: 3}
iex> {:ok, {item, rest_map}} = Extractable.extract(original_map)
{:ok, {{:c, 3}, %{a: 1, b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:a, 1}, %{b: 2}}}
iex> {:ok, {item, rest_map}} = Extractable.extract(rest_map)
{:ok, {{:b, 2}, %{}}}
iex> Extractable.extract(rest_map)
:error
Not really. One problem with your example is that you are updating the map every time which is inefficient. I was thinking something along the lines of how ETS does it with first
/next
:
10> {Key1,Val1} = maps:first(Map).
11> {Key2,Val2| = maps:next(Key1, Map).
12> {Key3,Val3} = maps:next(Key2, Map).
13> error = maps:next(Key3, Map).
(in Erlang) Definitely possible. ETS does it like this but only returns the key, in this case returning {Key,Value}
is much better. I implemented this in my 2-3 trees as I needed it for Luerl.
@rvirding do you know how ETS stores the internal state of the table-reading while exposing first
/next
? It seems something that is rather stateful. Is it something that is managed per-process somewhere?
This evening, I have written and published the first versions of Insertable and Extractable, as well as implementing them for the Okasaki Queues (which is not yet released on Hex).
You pass the last key to next(table, last_key)
and ETS returns the next key according to the table’s specified or internal ordering. ETS doesn’t store the table-reading state, you do.
Thank you! I somehow missed that :ets.next
took two arguments .
All right, I am facing another thing that is somewhat of a challenge.
Naming things is truly one of the hardest problems of a Computing Scientist.
I have created Extractable
and Insertable
and they work well. But now I’d like to build abstractions on top of these protocols. These abstractions being part of the original protocol-defining libraries does not make sense, because there might be more, different abstractions that could be built on top.
I’m probably going to create a library called Collection
that contains common functionality on collections. Collection
will probably depend on FunLand
, Extractable
and Insertable
. The interface of the Collection
module itself will probably look similar to Enumerable
, with the difference that any collection in the output of its functions is not converted into a list.
If there is a better approach, please tell me .
Alongside of this, I am currently working more on the Okasaki queues, and also started building some simple tree implementations.
I’ve released a preliminary version of Prioqueue, which builds a structured priority queue protocol interface and allows multiple implementations to be built on top of this.
Prioqueue has been updated to now also contain a PairingHeap-based implementation. The library now also contains some proper tests .
@OvermindDL1 It took me way too long to understand the joke…
Prioqueue has been updated once more: It now contains a lot better documentation, as well as implementations for many of the Protocols of FunLand, as well as Extractable, Insertable and Elixir’s own Enumerable and Collectable. (See the docs)
Oh, and the Prioqueue priority queue protocol itself can of course be overridden by other priority queue implementations in the future .
Prioqueue is more-or-less done, I think.
Feedback is greatly appreciated.
On a related note: I’ve released a prerelease version 0.8 of FunLand, which is significantly backwards incompatible with the previous prerelease version, since:
- It adds a lot of functionality. (Implementation of Traversable)
- It has some breaking name changes.
- Some bells-and-whistles behaviours were changed from opt-out to opt-in (less implicit magic, more explicit power)
There still are some things I am not happy with. The most important things that will still change before 1.0:
- Documentation. A lot of it.
- Moving the zoo of ADT implementations living under the
FunLandic
namespace off to their own library. - Maybe some more breaking name changes, to make things more clear.
Looks really cool, I quite like the API.
@Qqwy, slightly offtopic, but I was looking at your Tensor library because I’m quite interested in starting building a Machine Learning library for Erlang/Elixir. I know Erlang is not the fastest language for number crunching, but I have hopes due to the upcoming JIT compiler being quite fast (currently it’s as fast as HiPE), dirty schedulers and ease of orchestrate distributed computations between nodes.
@imetallica Nice!
Tensor should be reasonably fast if you are working with sparse data sets.
I am currently debating to write a second, terse, vector/matrix/tensor library which would outsource the work to Rust.
However, I’d like this work to happen in a proper, semi-preemptive fashion, which currently faces two hurdles:
- Rustler does not yet expose
enif_schedule_nif()
(see here) because there is no known way to make it fully compile-time safe. - I have no idea how to ‘turn an iteration over a vector or matrix inside-out’. I think this would mean that iteration over the
Vec
would happen in a coroutine-fashion, but I have no idea how this could be done. This is necessary becauseenif_schedule_nif()
takes a function pointer to call once the work should continue.
I will update Tensor today to interface with Extractable and Insertable .