Performance of tuples over maps

mjadczak · April 12, 2017, 3:17pm

TL;DR: are tuples much faster to construct and access than maps?

I am working on a project where I am passing elements through GenStage and processing them in each stage. I am implementing a data processing system (the Dataflow / Beam model of processing) on top of GenStage. Therefore, as well as actual data/values, I have a fair amount of metadata riding along in these events.

Currently, I am using a 4-tuple to represent these events. This came by as initially it was a two-tuple, then three, then four, etc. I probably would have chosen maps initially had I known, but now I ask the question—will maps slow me down in the case where I’m pattern matching / constructing tens of thousands of them?

I don’t actually know where I get this impression from, but I thought that since maps are relatively new in Erlang, they may be slower than tuples. Now I realised that perhaps it shouldn’t be a concern.

Does anyone have any experience with this / are there any benchmarks or results out there to support or refute this?

Eiji · April 12, 2017, 3:40pm

@mjadczak: If you can’t say how many items your tuple will have in future then I suggest you to combine struct and tuple like:

defmodule MyApp.Event do
  defstruct [:your, :fields, :here]
end
defmodule MyApp.Example do
  alias MyApp.Event

  def sample do
    {:ok, %Event{your: "", fields: 0, here: true)
  end
end

I’m not sure about performance, but when you are building API and changing it from 2 tuple items (initial version) to 4 and more tuple items (in next versions) then it’s easier for someone that uses your API to have always 2 tuple items with map or struct as a second tuple item.

In your case:

alias MyApp.Example

{:ok, your, fields, _here} = Example.sample
# then in next versions:
{:ok, _here, your, fields, _something, _more} = Example.sample

do_something_with(your, fields)

As I suggest:

alias MyApp.Example

{:ok, event} = Example.sample
# then in next versions:
{:ok, event} = Example.sample

do_something_with(event.your, event.fields)

mjadczak · April 12, 2017, 3:42pm

Thanks, just to clarify, these tuples aren’t actually exposed to users, they’re just to contain all of the information internally. The purpose of moving to maps would just be to aid further development for myself, and make it easier to refactor in the future.

xlphs · April 12, 2017, 3:43pm

They are equally difficult to construct, and tuple should be faster to access than map. I refer to how BEAM implements tuple:

tuple is very easy to access at arbitrary index, and very hard to grow

from http://beam-wisdoms.clau.se/en/latest/indepth-memory-layout.html#tuple-arityval-0

The reason map is slower to access is because of arbitrary key type.

Also, I would argue that unless your map is going to contain thousands of elements there is no point in worrying about performance.

bbense · April 12, 2017, 3:53pm

FWIW, Maps of less than X elements ( currently X is 32 ) are just sorted Lists of Key Value tuples, beyond that they are Hash Mapped Trie’s. https://en.wikipedia.org/wiki/Hash_array_mapped_trie

Tuples are implemented under the covers as a C array of pointers to Terms, thus tuples are fast for access if you know the element number, but are slow if you need to modify them.

If you always prepend to a List, they are fast and memory efficient to modify, but access to specific elements is slow.

Maps are somewhere in between, access to a specific element is much faster than a List, but modification is more expensive than a List, but less than a Tuple.

Without knowing the exact access patterns and transformations, it’s hard to know what is best for your case. For “small” values it’s extremely hard to make any general statements about performance of various data types. You really need to benchmark the code to know.

This is what I try to do with Elixir code:

“Make it right, Make it pretty, Make it fast ( if you need to)”

I’d pick the data type that makes working on the code as easy as possible.

OvermindDL1 · April 12, 2017, 5:08pm

For note, if you want a struct-like thing that is built for reading speed, that is precisely what Records are, Records are just tuples with named slots.

Altering a map can be faster than a tuples if you have more that a couple elements though (entire tuple has to be recreated, map can be altered faster), but for few elements even a tuple will always be faster.

But yeah in general do not worry about performance unless you find a specific need.