Big-O Time Complexities of Elixir Data Structures

PJUllrich · November 15, 2021, 11:11am

I am currently trying to compile a Big-O Time Complexity table for the Elixir data structures, but can’t find information for all complexities and am also not 100% sure that I’m getting them right. My goal is to have a comprehensive table like the Big-O Algorithm Complexity Table and maybe put it into a nice cheatsheet poster for future reference. I was able to find bits and pieces in the docs, Elixirforum posts and the Elixir mailing list, but I failed to find a comprehensive overview of all the complexities.

Would you mind helping me to complete and review the following table, please?

# Big-O Time Complexities for Elixir data structures

# Map [1]
Access:    O(log n)
Search:    O(log n)
Insertion: O(n) for < 32 elements, O(log n) for >= 32 elements [2]
Deletion:  O(n) for < 32 elements, O(log n) for >= 32 elements [2]
Caveats:
- 'With fewer than 32 elements, a Map is a list of tuples sorted by keys'
- 'With 32 and more elements, a Map is a Hash Array Mapped Trie (HAMT)'
- 'Maps are unordered, allow any key type, but disallow duplicate keys'

# List [3]
Access:    O(n)
Search:    O(n)
Insertion: O(1) for prepending, otherwise O(n)
Deletion:  O(1) if first element, otherwise O(n)
Caveats:
- 'Lists are not arrays as in other languages, but single-linked lists'

# Keyword [4]
Access:    O(n)
Search:    O(n)
Insertion: O(n)
Deletion:  O(n)

# Tuple [5]
Access:    O(1)
Search:    O(n)
Insertion: O(n)
Deletion:  O(n)
Caveats: 
- 'Tuples are better for reading, whereas Lists are better for traversals'
- 'Avoid using tuples as a collection'
  '(i.e. avoid Tuple.append/2, Tuple.insert_at/2, or Tuple.delete_at/2)'

# (erlang) array [6]
Access:    O(log n) [7]
Search:    O(log n)
Insertion: O(log n)
Deletion:  O(log n)
Caveats:
- 'An array is a trie of small tuples, with a smaller memory overhead to Lists'

# MapSet [8]
Same complexities as 'Map'

Discussion points

The time complexities are a mix of average, best, and worst complexities. Should we separate them into average, worst and best columns maybe?
Did I forget a data structure? I excluded the Arrays library, because it’s a library that uses different data structures under the hood.
Should we include ets as well? It has O(1) for access and insertion for set, bag, and duplicate_bag tables, but isn’t technically a data-structure

References

Other resources
Partial overview of complexities
Discussion of :array
Does Elixir have persistent data structures?
Way to get O(1) access/set
Sequences by @sasajuric

sabiwara · November 15, 2021, 12:06pm

Oh this is a nice initiative
Although it doesn’t mention the complexities explicitly, this article comparing sequences might be an interesting resource to mention as well.

This one would actually be O(log n), since erlang’s :arrays are tries (like maps), not actual arrays in the traditional imperative programming sense.

hauleth · November 15, 2021, 12:28pm

Maps:

Access is O(log n) amortised - always. If there is less than 32 keys, then these are stored as sorted tuples, which mean, than you can use binary search.

Insertion/deletion is O(n) or O(log n) (amortised) respectively for less than 32 and more than 32 keys.

List:

In the middle is still O(n)

Deletion O(1) if first element (just do tl(list) which is constant) O(n) otherwise

Keyword list:

The same as regular list (as these are the same things) unless you use Keyword module, which will do deduplication in most cases. Then indeed it will be O(n).

Array:

It is trie of tuples, so your Big-Os are wrong there.

PJUllrich · November 15, 2021, 1:12pm

Thank you very much for your input!

I updated the Maps and List complexities with your info.
I also found this very interesting email which lists the complexities of the erlang data structures. I added the O(log n) complexity to the :array section accordingly.
I also removed the List in Keyword List to clarify that I mean Elixir’s Keyword structure.

sabiwara · November 15, 2021, 1:51pm

I’m not so sure about deletion for :array, since there isn’t really such an operation.

You could either use reset/2 to set the value back to the default value (e.g. :undefined), which would be O(log n)… but you would end up with a sparse array on which you cannot really use random index anymore, e.g. [1, 2, :undefined, 4] instead of [1, 2, 4].

Or you could create a new array with everything except the deleted element, and this would be O(n).

PJUllrich · November 15, 2021, 5:52pm

That’s a very good point! Unfortunately, I don’t fully understand what you mean with you cannot use random index anymore (on a sparse array). Would that hurt the time complexity of subsequent actions? Or is this something one has to watch out for during implementation? In that case, I’d still note down O(log n) and put this under caveats. What do you think?

sabiwara · November 15, 2021, 11:37pm

Oh sorry, I wasn’t really clear. Strictly speaking you can still use random access in a sparse index, but you would need to account for the “holes”, since the positions and size won’t be shifted. This will impact the rest of your algorithm if you cannot assume a dense array anymore.

In pseudo-code:

> dense_array = delete_at([1, 2, 3, 4], 2)
[1, 2, 4]
> sparse_array = reset_at([1, 2, 3, 4], 2)
[1, 2, :undefined, 4]
> dense_array[2]
4
> sparse_array[2]
:undefined
> sparse_array[3]
4

The sparse version is just really a replace_at, not an actual deletion, so it is a bit like comparing apples and oranges. But depending on your algorithm, it is probably the best option in practice when possible, and maybe convert it back to a dense array once you’re done deleting using sparse_to_list/1 (O(n)).

ityonemo · November 16, 2021, 12:30am

I highly recommend adding ets. Digraph is another batteries-included data structure, as is queue. Also counters.

PJUllrich · November 16, 2021, 8:24am

Thank you very much for this clarification. I agree with you that this is a caveat and should be mentioned in the table. Unfortunately, it seems that you can only edit your posts 24h after publishing them So, I’m unable to adjust the table any more.

@LostKobrakai sorry for the ping. You’re the only moderator that I know. Would it be possible to let me edit my original post a bit longer maybe? I would like to edit the table with the complexities with all recommendations so that we have a single comprehensive source of truth eventually.

hauleth · November 16, 2021, 8:56am

Maybe move it to Gist or another externally managed resource, where you will not be limited by the forum editing constraints.

PJUllrich · November 16, 2021, 2:08pm

Thank you for the suggestion @hauleth. I moved the table to a GitHub Gist. The gist deviates only slightly from above table (I added a caveat to array as @sabiwara suggested). I’m happy that above table was quite comprehensive already. I hope it will serve as a good documentation for future reference

GitHub gist: Big-O Time Complexities for Elixir and erlang Data Structures · GitHub