Elixir array-like data structure?

Hi guys. I find myself in need of array-like data structure like the ones you find in Ruby, Python or Javascript.

I’m writing a genetic algorithm and when you deal with this kind of algorithms, you are constantly randomly selecting a position in it, slicing, appending, shuffling and combining. And using Lists for it just add time complexity given that for all of this operation the cost is almost always O(n).
I’ve tried using erlang’s :array but this is missing some of the feature above. Right now I’m workingaround the issue by using Maps with numerical keys, but again there are time when even this doesn’t cut it.

Is there a library I could use or maybe I need to create my own. Maybe wrapping Erlang’s :array or Elixir’s map to support all of my use cases?

There are 3 approaches you can take:

  • write your own module that will use tuples to store data
  • write functions you need for the :array
  • use NIF or port

It depends on your use case which one of these will be most efficient. In general there is no simple and efficient way to shuffle any structure in Elixir.

I’m reluctant to use Tuples because of this from the official documentation:

updating or adding elements to tuples is expensive because it requires creating a new tuple in memory:

2 Likes

:array module is built upon tuples. And the same shortcomings are when you want to use maps. So if you are doing many updates to the array then you need to use:

  • Process dictionary, but this will limit you to single process
  • ETS, but that requires careful editing and managing of the state, also you need to have “parent” process for the table
  • NIF
1 Like

There is also :atomics.
http://erlang.org/doc/man/atomics.html

If 64 bit integers are suitable for you. You can also use them as a bit array. (I have a lib to help with that: Abit)

5 Likes

I almost forgot:
When I was playing with genetic algorithms, I successfully sped up things with the Matrex library.

9 Likes

This has been my biggest gripe with elixir for quite awhile.

I think adding built in support for a first class array type or (a well integrated NIF at least) would go a long way.

2 Likes

Keep in mind that the data structure you are proposing is ill suited for distributed systems, which is why it’s not a first class citizen of the BEAM. I’m a math major working at a deep learning company, and maybe it’s Stockholm syndrome but I’m ok with this separation of concerns and having arrays be second class citizens of the ecosystem. Distribution is hard, and the trade-off of making arrays second class is, in my mind “totally worth it” to make distribution easy.

8 Likes

Minor nitpick - Javascript implements “normal” arrays (like you’d get from parsing the JSON string [1,2,3]) as maps under the hood. For instance, you can call [1,2,3].keys()

IMO this sounds like you need a more-specialized data structure than arrays - even Ruby arrays are going to have poor performance characteristics with things like inserting in the middle that can’t share structure.

For instance, a common solution you’ll find in a lot of text editors is the rope. It costs more complexity up-front, but it comes in very handy when (for instance) a user types one character at the start of a 10MB text file.

In general, immutability seems to require a little more thought when picking structures - for instance, a performant double-ended queue is more complicated without mutation so there’s the :queue module in stdlib. The comments in that module recommend reading “Purely Functional Data Structures” by C. Okasaki and I’ll +1 that recommendation.

4 Likes

wow, that paper is great and I really wish I had the time to implement them all in elixir.

I think that a build-in (to Erlang) version of a persistent data structure with properties similar Clojure’s vector (access to items by index in log32N hops, optimized for append rather than list’s prepend) would be very beneficial while not being any less suited to Erlang than a map.

2 Likes

You basically already have this with a map where the keys are indices and the values are the value. %{0 => "first item", 1 => "second item"} is O(log32(n)). I swear I’ve seen a hex package that implements an array like UI around this.

1 Like

A map with keys as indices isn’t ordered (which is to say, you don’t “basically already have this”, there’s still a lot to do).

RE Hex packages, take your pick: persistent_vector, steady_vector, one of the options in arrays.

But I think it’s important it’s one of the build-in types, with literal notation, pattern matching, etc.

Clojure’s vector is literally the same thing, they’re both HAMTs. “ordered” is entirely a function of the data structure API, not the memory layout. If you have to hop around in memory at all, then they aren’t “arrays” in the C sense. I’ll grant that the API of a Map is missing some common array operations but in terms of the underlying mechanic, map[i] does the same fundamental thing as (get [1,2,3] 1).

1 Like

I think I may have sent you down the wrong track by listing a couple of the properties of Clojure’s vectors I think are useful, but I really meant “properties similar to Clojure’s vector in more than just access to items by index in log32N hops and optimized append.”

I appreciate what you’re saying about about a map providing log32 access, and maybe you’re trying to tell me something about implementation of Erlang’s map that I am not understanding, but the implementation of Clojure’s vectors isn’t just an interface around a map.

Of the libraries I linked persistent_vector and steady_vector are built with tuples internally. Only the map array in arrays is implemented using a map. Which is to say, you can build an interface around a map to get something similar to Clojure’s persistent vector, but it’s likely to have different characteristics.

All of which is ignoring my primary point, which is that I think it’s important such a structure be built into the platform (Erlang) so it’s widely available, with literal and pattern matching support, etc. This, like the addition of maps, would give the whole platform an important, universal tool to use in situations where it makes sense, rather than needing to work around the limitations in the provided collections.

1 Like

I agree with everything you’re saying even though I have 0 expectation that we’ll ever see vectors added to the language.

Also saying you can use a map with integer keys to replace vectors is equivalent to saying you don’t need map and filter because you have for loops.

2 Likes

I’m getting on a long flight in a few hours - I may draft up a quick NIFS library with bindings to a mutable vector.

Certainly not “elixir-ey” but I have personally come across a few cases where I needed O(1) access and O(1) insert in my elixir programs

Eh, but :array doesn’t make big tuples, for example it will make a tree of tuples of size 10 each as I recall, it’s actually very efficient.

I’m curious what was the issue with :array missing a feature? What feature?

Matrex is an awesome BLAS library, if you need speed it’s well worth using.

The first class array type essentially is the Tuple, remember this is an immutable language because it is massively distributed.

The JIT makes them real array’s though, but until the JIT runs they are quite slow (and if certain access patterns are hit they ‘fall back’ to maps).

Yeah that absolutely does not sound like you should be using an array at all…

O(1) really? For both access and insert in an immutable language? Doesn’t sound possible as the very act of ‘inserting’ would mean duplicating the data (O(N)) or pointing to the old data (O(>1), like via a rope structure, which them makes access not O(1) either). I’m unsure how you’d implement such a structure in pure immutability?

1 Like

And @mischov this isn’t even easy in mutable languages. O(1) append and prepend in mutable languages works by amortizing resize operation costs over N append or prepend operations. O(1) random inserts (as distinct from overwriting the value at position i) usually requires a more specialized data structure. You can’t do the same amortization trick with arbitrary inserts because you don’t know where you can put buffer space. Traditional arrays are actually O(n) for random inserts or deletions, so using the array’s module or a map is actually an improvement in that sense.

3 Likes

Did you mean to reply to @LukeWood? I haven’t suggested O(1) append and prepend.

Anyhow, if one were trying to build a persistent vector a good approach would probably be Bagwell’s RRB-Tree vector.

this structure allows immutable vector concatenation, insert-at and splits in O(logN) time while maintaining the index, update and iteration speeds of the original vector data structure.

2 Likes