I Challenge you: Implement containers!

Qqwy · June 28, 2017, 7:39pm

All right, I am facing another thing that is somewhat of a challenge.

Naming things is truly one of the hardest problems of a Computing Scientist.

I have created Extractable and Insertable and they work well. But now I’d like to build abstractions on top of these protocols. These abstractions being part of the original protocol-defining libraries does not make sense, because there might be more, different abstractions that could be built on top.

I’m probably going to create a library called Collection that contains common functionality on collections. Collection will probably depend on FunLand, Extractable and Insertable. The interface of the Collectionmodule itself will probably look similar to Enumerable, with the difference that any collection in the output of its functions is not converted into a list.

If there is a better approach, please tell me .

Alongside of this, I am currently working more on the Okasaki queues, and also started building some simple tree implementations.

Qqwy · June 28, 2017, 9:44pm

I’ve released a preliminary version of Prioqueue, which builds a structured priority queue protocol interface and allows multiple implementations to be built on top of this.

Qqwy · June 30, 2017, 2:07pm

Prioqueue has been updated to now also contain a PairingHeap-based implementation. The library now also contains some proper tests .

OvermindDL1 · June 30, 2017, 3:25pm

Hmm, I’m not seeing PropEr in there:

github.com

Qqwy/elixir-prioqueue/blob/master/mix.exs#L28-L32


      
          # {:dep_from_hexpm, "~> 0.3.0"},
          # {:dep_from_git, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"},
          {:extractable, "~> 0.2"},
          {:insertable, "~> 0.2"},
          {:fun_land, "~> 0.10.0"},

/me coughs

^.^

Qqwy · June 30, 2017, 6:57pm

@OvermindDL1 It took me way too long to understand the joke…

Prioqueue has been updated once more: It now contains a lot better documentation, as well as implementations for many of the Protocols of FunLand, as well as Extractable, Insertable and Elixir’s own Enumerable and Collectable. (See the docs)

Oh, and the Prioqueue priority queue protocol itself can of course be overridden by other priority queue implementations in the future .

Prioqueue is more-or-less done, I think.
Feedback is greatly appreciated.

On a related note: I’ve released a prerelease version 0.8 of FunLand, which is significantly backwards incompatible with the previous prerelease version, since:

It adds a lot of functionality. (Implementation of Traversable)
It has some breaking name changes.
Some bells-and-whistles behaviours were changed from opt-out to opt-in (less implicit magic, more explicit power)

There still are some things I am not happy with. The most important things that will still change before 1.0:

Documentation. A lot of it.
Moving the zoo of ADT implementations living under the FunLandic namespace off to their own library.
Maybe some more breaking name changes, to make things more clear.

OvermindDL1 · June 30, 2017, 7:32pm

Looks really cool, I quite like the API.

imetallica · June 30, 2017, 8:54pm

@Qqwy, slightly offtopic, but I was looking at your Tensor library because I’m quite interested in starting building a Machine Learning library for Erlang/Elixir. I know Erlang is not the fastest language for number crunching, but I have hopes due to the upcoming JIT compiler being quite fast (currently it’s as fast as HiPE), dirty schedulers and ease of orchestrate distributed computations between nodes.

Qqwy · July 1, 2017, 8:37am

@imetallica Nice!

Tensor should be reasonably fast if you are working with sparse data sets.

I am currently debating to write a second, terse, vector/matrix/tensor library which would outsource the work to Rust.
However, I’d like this work to happen in a proper, semi-preemptive fashion, which currently faces two hurdles:

Rustler does not yet expose enif_schedule_nif() (see here) because there is no known way to make it fully compile-time safe.
I have no idea how to ‘turn an iteration over a vector or matrix inside-out’. I think this would mean that iteration over the Vec would happen in a coroutine-fashion, but I have no idea how this could be done. This is necessary because enif_schedule_nif() takes a function pointer to call once the work should continue.

I will update Tensor today to interface with Extractable and Insertable .

Qqwy · July 1, 2017, 1:05pm

All right!

Tensor has been updated to version 2.0. This version is backwards incompatible because of multiple major changes:

All modules are moved under the single Tensor namespace, to follow HexPM package publishing rules. use use Tensor to alias all of Tensor, Matrix and Vector in your code and us them just as before.
Numbers has had multiple important changes between version 1.0 and 4.0, these changes have been reflected in Tensor.
Other, new features are:
FunLand.Mappable implementation, FunLand.Reducable implementation.
Extractable implementation.
Insertable implementation.
Tests for the above.

Qqwy · July 1, 2017, 1:24pm

Oh, before I forget!

Extractable and Insertable have been updated to 0.2.0 as well. This is a backwards-incompatible change. (But probably the last; I think the API is now stable and if this is indeed the case, soon a stable 1.0 version will be released).

The backwards-incompatible change is that in case of errors, instead of returning :error, the implementations should return {:error, reason}, with standardized reasons for Extractable being :empty, and for Insertable being :invalid_item_type and :full.

Prioqueue has been updated to reflect these changes as well, now being version 0.2.4 .

Qqwy · July 1, 2017, 10:52pm

And… Sets is released!

Similar to Prioqueue and Okasaki, this is a common interface to work with Sets, with multiple implementations that can be chosen between either when creating the set, or in your application’s documentation.

Implementations for MapSet, :sets, :gb_sets and :ordsets are included.

Qqwy · July 2, 2017, 2:03am

And a final one for today: Arrays.

Similar to Prioqueue, Okasaki and Sets, this is a common interface to work with arrays that have fast random element access.

It contains two implementations right now, namely a version that wraps Erlang’s :array, and MapArray that uses integer keys as indexing into a map (But in a somewhat different way than I do it in Tensor; MapArray is not a sparse but a terse representation).

Qqwy · July 2, 2017, 2:36am

And just after I said that, I decided to finish up Okasaki as well.
It still needs some more documentation, but it definitely is stable enough to release on Hex.PM as-is.

Qqwy · July 2, 2017, 10:39pm

Inspired by the earlier discussion with @rvirding, I am currently creating a library, Iter, that allows to iterate over arbitrary data structures in a possibly non-destructive way.

That is, where Extractable.extract will extract an item and return a collection with that item removed, Iter.next() will return the next item and a collection that might or might not be altered.

This means that Iterator structs wrapping things like ETS tables, :gb_sets and :gb_trees can be made as well.

There are two kinds of Iterators inside Iter:

‘Normal’ iterators which might convert the thing they iterate on to a structure more appropriate for iteration. (This is used for many datatypes that right now do not have built-in first/1, next/2-support; they are frequently changed into lists)
and ‘Persistent’ iterators which are guaranteed to be able to reconstruct the thing they iterate over at the end.

Note that the functionality of a PersistentIterator is a superset of a normal Iterator, possibly at the cost of some time and memory efficiency.

I am not yet sure what (if any) advantage a PersistentIterator has over a non-persistent iterator, as we usually still have access to the original data structure before it was turned into an iterator.

But I wanted to build it, in part because I thought of a fun way to (ab)use improper lists to reverse-concatenate two lists (restoring a zipper back to its original list): https://github.com/Qqwy/elixir-iter/blob/master/lib/iter/implementations/list.ex#L63

And in part because there very well might be a proper use for them.

Because these Iterators only iterate on-demand, they are per definition lazy.
I am going to think about what would be the best way to create higher-order iterators (‘iteratees’ or ‘iterator adaptors’), and how the efficiency of the thing could be improved.

This style of iterating is based on the ‘Iteratee’ concept used in Haskell, and also the Iterators in Rust. (Rusts’ iterators work with owned, mutable data however, and therefore are both faster and guaranteed to extract items returned by the iteration from their containers; two guarantees Iter cannot give you).

OvermindDL1 · July 3, 2017, 3:01pm

That would be nice, but if doing that I’d link to a BLAS-like library instead (think NumPy for Python, but for Erlang). That way you can built up a set of instructions (macro?) that is sent to the BLAS engine to process potentially huge amounts of data at once then return the result (probably not explicitly returned either but rather as a BEAM resource ID so you can use the API to pull out bits of it or to an erlang format but keeping the base of it as highly optimized as possible, again similar to how Python does it).

With the new async: on option by default in OTP20 this would fit that use-case so well, and I would definitely use such a library. Honestly I really would build it on top of a BLAS library (one that does not crash please if you link it as a NIF or so).

Also, you’ve been busy! ^.^

Qqwy · July 6, 2017, 11:35am

Okasaki has received an overhaul and is now released as v1.0!

Changes are:

Everything is now documented!
ErlangQueue and ErlangDeque implementations that wrap :queue.
Implementing FunLand.Mappable for the queues and deques.
More tests!

daveman1010220 · October 30, 2017, 4:31pm

If this will not be part of the standard library some day, it would be excellent to have all of these containers in one package. If you have one package that supports a large set of standard data structures then it will be much easier for someone working on a BLAS library to make use of it. Having just a couple of well-thought-out packages that contain most basic data structures and the most common math functions would be tremendously useful, in my opinion. You can then tell people, “Just use the math and containers packages”…

Qqwy · March 26, 2018, 5:11am

Yes, although there is also something to be said for keeping each of the packages small (only containing the public API and one or two default implementations), with a bunch of other implementation packages to be used when more efficiency or other functionality is required.