Does Elixir have persistent/immutable vectors?

zeroexcuses · April 27, 2021, 10:35am

On hex.pm, I found persistent_vector | Hex , but it does not seem very widely used.

Does Elixir have persistent/immutable vectors? I’m looking for a data structure where (1) all ops are either amortized O(1) or O(log n) and (2) it’s persistent/immutable in that we can keep all old versions.

hauleth · April 27, 2021, 11:59am

There is array module that provides similar functionality.

benwilson512 · April 27, 2021, 12:17pm

Has that been benchmarked lately against a simple map where the keys are indices?

zeroexcuses · April 27, 2021, 12:20pm

@hauleth : Arrays look interesting. Thanks! According to memory - Arrays implementation in erlang - Stack Overflow they appear to be balanced trees w/ split of 10 per node, making most ops worst case O(log_10 n).

However, is there a way to concat two arrays in O(log n) time? Could not find at Erlang -- array

@benwilson512 : The problem with using map for this is: that if we have keys 0, 1, 2, …, n
and insert an item between 2 & 3, we suddenly have to shift the keys 3, 4, 5, …, n which is O(n)

OvermindDL1 · April 27, 2021, 6:09pm

I have in the past, map is the same at 1 element (index => value mapping), significantly slower at up to 32 elements, then slower but not ‘as’ bad above that, averaging 8 times slower from 1000 elements on up (on my server at least). :array is better if you need array-like functionality.

For note, merging is not array-like functionality, however you can still do that with :array’s via something like :array.sparse_foldl, just pass the array you are merging in “to” as the initial value then for each element in the array you are folding over you just insert it into that array. There is also an :array.foldl but that will iterate over all values in the array, not just ‘set’ values, so it can be a lot slower for no reason.

Inserting is DEFINITELY not recommended array functionality in any language as that causes array’s to shift around everything after the point of which you are inserting. I would think that using :array’s would be faster on that as well but it’s still not a recommended operation.

And yes, the old erlang :array module might not have a lot of helper functions, but with map/foldl/foldr and the sparse variants you can build any helpers on those, it gives you what you need to implement anything on it.

And don’t discount using tuples. Changing the length of a tuple is a full reallocation instead of being cheap like in :array or maps, but you just can’t get faster read access if you need it, though at the cost of writing, lol.

Overall, I would use :array’s, or some elixir library of the same (I know I’ve seen a couple, I think I even made one years ago that I never published).

Map’s are almost as fast, they get to about 8x slower than :array on my system for almost all operations but that’s a linear increase over :array, it doesn’t grow over time than that. Another bonus is that :array’s will use less memory as well, about a third of maps.

Now if you really do want faster insertions then it wouldn’t be too hard to make a custom thing that combines the internal storage of array’s with a tree instead, it would be slower than :array on most things, but it would make insertions a lot faster.

And if you don’t need indexed access but rather just ordered then don’t discount just using normal ordered trees!

zeroexcuses · April 27, 2021, 6:28pm

I think my use of the term ‘persistent vector’ may be causing unnecessary confusion.

I am looking for something similar to Clojure’s persistent vectors: Understanding Clojure's Persistent Vectors, pt. 1

which does have O(log n) concat/insert/update/delete, as well as (through having ‘buffers’ at the front/back) have amortized
O(1) insert/update/delete at front/back.

OvermindDL1 · April 27, 2021, 6:30pm

Clojure’s persistent vectors aren’t vectors at all, they are indexed tree’s, hence my suggestion of making an indexed tree (or some may already exist as libraries).

sasajuric · April 27, 2021, 7:04pm

You can find some benches in this blog post.

sasajuric · April 27, 2021, 7:11pm

I haven’t used it myself, but this looks promising.

OvermindDL1 · April 27, 2021, 7:11pm

Also, if you don’t need indexed access but just ordered iterative access, then a simple zipper (a pattern, not a library) allows for O(1) insertions at the location of the cursor.

zeroexcuses · April 27, 2021, 8:14pm

@sasajuric : Thanks for sharing the detailed analysis / blog post.

One thing that really surprises me about the lack of RRB Vectors is that under “premature optimization = evil”, I’ve basically defaulted to RRB Vector for everything until benchmarks prove I should use something else.