Would there be interest in a Persistent Vector datatype?

Qqwy · February 15, 2018, 10:34am

Just now I came across Understanding Persistent Vectors, which describes how Clojure implements a vector-type, which for many operations allows faster access to its elements than a linked list.

I think this is very interesting, and an implementation of this in Rust exists (which was how I found it in the first place).

Now I am wondering: Are there people besides me that think it would be great to have a fast, indexable ordered container? I think it would be relatively straightforward to wrap the Rust library using Rustler.

What do you think? An alternate approach would be to implement the logic directly in Elixir, but that would probably be a couple of magnitudes slower.

In any case, I did not know that it was possible to create persistent vectors before (and okay, under the hood they are implemented using special balanced binary trees, so access is O(log32 n) instead of O(1) for mutable vectors), so I thought this might be interesting for not just me but you all as well .

NobbZ · February 15, 2018, 12:04pm

I haven’t followed your links. But when I read the last time about clojures vectors they where described as a linked list of chunks. And as of such had a read of about O(n/chucnk size) (which is theory equal to O(n) but feels faster).

But you say balanced tree, and then :array comes to my mind. Its not binary, but finger/rose tree, but still kind of balanced.

OvermindDL1 · February 15, 2018, 5:00pm

Yeah this was my thought. They based it on a balance of 10 as they found it to be the most efficient on the BEAM VM at the time the tests were done, but it is a tree, though not entirely balanced, however it is sparse-safe and fast (though depending on usage a map might be faster).

EDIT: Also :array is old, it is very likely that the BEAM VM would be optimized for better settings than what it does now too. That’d be an interesting test.

Qqwy · February 15, 2018, 7:35pm

And I believe :array is implemented completely in Erlang (rather than in C) (OTP source), which means that it is somewhat slower, and that the data structure is emulated using nested Erlang tuples, rather than using a thight internal tree representation.

mischov · February 15, 2018, 7:59pm

I have missed Clojure’s persistent vectors (in Clojure they’re generally the go-to ordered collection instead of lists), but I would be hesitant to use them very widely if they’re not a core (BEAM) type.

I also don’t know how I’d feel about a widely used datatype in my application being at the mercy of Rustler not having ERTS-related compilation errors.

orestis · February 16, 2018, 11:27am

Having persistent vectors would be very cool indeed — but I wonder how this would work with the “shared nothing” memory of the BEAM?

Since this is all immutable data, you could have all of it in shared memory, and just send references to processes. But if you have to copy the entire tree every time you send it, it will quickly become very slow.

Interestingly enough, according to this, the Erlang Map implementation is based on Rich Hickey’s work. So you could probably just use the maps to simulate a persistent vector without going to Rust at all.

Qqwy · February 17, 2018, 8:39am

An implementation of vectors/arrays built on top of maps is already available in the arrays package that is available on Hex.

I haven’t benchmarked them yet, but I’d expect them to usually be faster than Erlang’s :array implementation (which above library also has a wrapper for), because that one was built before maps were a thing.

As for tree copying: Copying a whole tree only adds log(n) elements to be copied to a process that would take n elements in the case of a sequential container structure like a list, so this is not ‘quickly becoming slow’ because the copying still happens in O(n) time complexity.

But we could benchmark this stuff at some point, I guess