What is a good general purpose way to get O(1) access and set operations in Elixir?
Is there a way to specify that a Map is in a “build state” then transition it to an access state so that a flat hash map structure could be used to allow O(1) access/O(n) building?
I understand the reason that the raw elixir maps cannot be both O(1) access and insert while being immutable, but in this sort of structure it seems something like this would be possible:
builder = FastMap.Builder.new()
builder = FastMap.Builder.put(:key, :val)
map = FastMap.build(builder)
# now after building we can access
:val = FastMap.get(map, :key)
# match successful
Does anything like this exist? If not then maybe I should take a stab at implementing it!
It seems like at some point in the development of Elixir as a language a general purpose O(1) set/access data structure would be essential!
If anything similar exists I’d love to be pointed that direction, if this is a common gap in what already exists in Elixir seems like something I should implement!
I assume this is a question of “this might be useful” rather than “I need this”, because the answers you get will differ a lot based on which one it is.
The straightforward way to get a key-value O(1) read and write “data structure” in Erlang/Elixir is ETS. It does what you want. You could alternatively do a NIF for your FastMap, but you’d basically just re-implement ETS.
Discord did write an article about implementing their own data structure for Elixir in Rust, because nothing that existed fit their use case (the article doesn’t mention ETS, but the author later said that was a mistake because they did try it but it doesn’t have the right operations for what they wanted).
What you describe seems to be similar to Clojure’s “transient” maps: immutable maps can be transiently mutable while they are built, and then turned into immutable ones in constant time.
I implemented something similar for the Crystal language: https://github.com/lucaong/immutable
Transient maps are NOT standard hash-maps though: hash-maps would be slow to turn into an immutable map, as you would have to traverse all entries, essentially defeating the purpose. They are still implemented as bit-array map tries, but they change the tree in a mutable way, so they are faster than immutable maps, mostly because they do not stress the garbage collector as much when many values are inserted in bulk, but they are not O(1).
While transient collections would be nice, I honestly dispute the utility of “true O(1)” collections:
ETS is a good solution for most use-cases, as @jola noted. In the limited cases when even more fine-tuned performance is needed, I personally doubt that Elixir is the right choice.
Immutability has profound beneficial implications for the language, and generic mutable data structures would invalidate most of them
I honestly don’t know the actual implementation of large maps on the Erlang VM, but bit-array tries like in Clojure have usually a fan-out of 32, so access and update is O(log 32). That’s practically O(1), as in practical cases the number of steps is bounded by a small constant (with 1 billion elements you need no more than 7 steps, and at that point memory is a bigger problem than CPU).
O(1) hash maps would have amortized O(1) updates, but they would still need re-hashing, so they have different trade-offs and aren’t unequivocally better
That said, I would still welcome attempts to improve that and I like the enthusiasm: it’s always exciting to discover new approaches, and occasionally some breakthrough data structures I just don’t think this is an “essential” need for Elixir/Erlang.
Is ETS O(1) access/set? Is there a reference for that in the docs that I could get a link to?
It is mentioned in the documentation that I linked
These provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data.
Insert and lookup times in tables of type set, bag, and duplicate_bag are constant, regardless of the table size.
Ah awesome, I didn’t realize ETS had those attributes.
Thanks for the resources referenced: closure’s transient data structures are super cool! Big fan!
I think ETS is probably good for the tasks that sparked this line of thought, but I also think it could be cool to have something that can be used in a more “elixiry” way with:
- O(1) insert during building
- O(1) access after build
So I’m going to take a stab at it!
Is O(1) that much better than O(log32)? Probably not but it could be a cool exercise
Thanks again for the resources referenced I’ve done a bit of reading and it’s pretty cool!
They are the same. Not nearly or similar. They are strictly the same.
Ah I just did some reading on this, log32(1trillion)=8, good point. That’s a very irrelevant factor. Thanks for pointing this out.
O(log32(n)) then, while I understood the original as log of 32