Hi guys. I find myself in need of array-like data structure like the ones you find in Ruby, Python or Javascript.
I’m writing a genetic algorithm and when you deal with this kind of algorithms, you are constantly randomly selecting a position in it, slicing, appending, shuffling and combining. And using Lists for it just add time complexity given that for all of this operation the cost is almost always O(n).
I’ve tried using erlang’s :array but this is missing some of the feature above. Right now I’m workingaround the issue by using Maps with numerical keys, but again there are time when even this doesn’t cut it.
Is there a library I could use or maybe I need to create my own. Maybe wrapping Erlang’s :array or Elixir’s map to support all of my use cases?
write your own module that will use tuples to store data
write functions you need for the :array
use NIF or port
It depends on your use case which one of these will be most efficient. In general there is no simple and efficient way to shuffle any structure in Elixir.
:array module is built upon tuples. And the same shortcomings are when you want to use maps. So if you are doing many updates to the array then you need to use:
Process dictionary, but this will limit you to single process
ETS, but that requires careful editing and managing of the state, also you need to have “parent” process for the table
Keep in mind that the data structure you are proposing is ill suited for distributed systems, which is why it’s not a first class citizen of the BEAM. I’m a math major working at a deep learning company, and maybe it’s Stockholm syndrome but I’m ok with this separation of concerns and having arrays be second class citizens of the ecosystem. Distribution is hard, and the trade-off of making arrays second class is, in my mind “totally worth it” to make distribution easy.
Minor nitpick - Javascript implements “normal” arrays (like you’d get from parsing the JSON string [1,2,3]) as maps under the hood. For instance, you can call [1,2,3].keys()…
IMO this sounds like you need a more-specialized data structure than arrays - even Ruby arrays are going to have poor performance characteristics with things like inserting in the middle that can’t share structure.
For instance, a common solution you’ll find in a lot of text editors is the rope. It costs more complexity up-front, but it comes in very handy when (for instance) a user types one character at the start of a 10MB text file.
In general, immutability seems to require a little more thought when picking structures - for instance, a performant double-ended queue is more complicated without mutation so there’s the :queue module in stdlib. The comments in that module recommend reading “Purely Functional Data Structures” by C. Okasaki and I’ll +1 that recommendation.
I think that a build-in (to Erlang) version of a persistent data structure with properties similar Clojure’s vector (access to items by index in log32N hops, optimized for append rather than list’s prepend) would be very beneficial while not being any less suited to Erlang than a map.
You basically already have this with a map where the keys are indices and the values are the value. %{0 => "first item", 1 => "second item"} is O(log32(n)). I swear I’ve seen a hex package that implements an array like UI around this.
Clojure’s vector is literally the same thing, they’re both HAMTs. “ordered” is entirely a function of the data structure API, not the memory layout. If you have to hop around in memory at all, then they aren’t “arrays” in the C sense. I’ll grant that the API of a Map is missing some common array operations but in terms of the underlying mechanic, map[i] does the same fundamental thing as (get [1,2,3] 1).
I think I may have sent you down the wrong track by listing a couple of the properties of Clojure’s vectors I think are useful, but I really meant “properties similar to Clojure’s vector in more than just access to items by index in log32N hops and optimized append.”
I appreciate what you’re saying about about a map providing log32 access, and maybe you’re trying to tell me something about implementation of Erlang’s map that I am not understanding, but the implementation of Clojure’s vectors isn’t just an interface around a map.
Of the libraries I linked persistent_vector and steady_vector are built with tuples internally. Only the map array in arrays is implemented using a map. Which is to say, you can build an interface around a map to get something similar to Clojure’s persistent vector, but it’s likely to have different characteristics.
All of which is ignoring my primary point, which is that I think it’s important such a structure be built into the platform (Erlang) so it’s widely available, with literal and pattern matching support, etc. This, like the addition of maps, would give the whole platform an important, universal tool to use in situations where it makes sense, rather than needing to work around the limitations in the provided collections.
Eh, but :array doesn’t make big tuples, for example it will make a tree of tuples of size 10 each as I recall, it’s actually very efficient.
I’m curious what was the issue with :array missing a feature? What feature?
Matrex is an awesome BLAS library, if you need speed it’s well worth using.
The first class array type essentially is the Tuple, remember this is an immutable language because it is massively distributed.
The JIT makes them real array’s though, but until the JIT runs they are quite slow (and if certain access patterns are hit they ‘fall back’ to maps).
Yeah that absolutely does not sound like you should be using an array at all…
O(1) really? For both access and insert in an immutable language? Doesn’t sound possible as the very act of ‘inserting’ would mean duplicating the data (O(N)) or pointing to the old data (O(>1), like via a rope structure, which them makes access not O(1) either). I’m unsure how you’d implement such a structure in pure immutability?
And @mischov this isn’t even easy in mutable languages. O(1) append and prepend in mutable languages works by amortizing resize operation costs over N append or prepend operations. O(1) random inserts (as distinct from overwriting the value at position i) usually requires a more specialized data structure. You can’t do the same amortization trick with arbitrary inserts because you don’t know where you can put buffer space. Traditional arrays are actually O(n) for random inserts or deletions, so using the array’s module or a map is actually an improvement in that sense.
Did you mean to reply to @LukeWood? I haven’t suggested O(1) append and prepend.
Anyhow, if one were trying to build a persistent vector a good approach would probably be Bagwell’s RRB-Tree vector.
this structure allows immutable vector concatenation, insert-at and splits in O(logN) time while maintaining the index, update and iteration speeds of the original vector data structure.