TL;DR: are tuples much faster to construct and access than maps?
I am working on a project where I am passing elements through GenStage and processing them in each stage. I am implementing a data processing system (the Dataflow / Beam model of processing) on top of GenStage. Therefore, as well as actual data/values, I have a fair amount of metadata riding along in these events.
Currently, I am using a 4-tuple to represent these events. This came by as initially it was a two-tuple, then three, then four, etc. I probably would have chosen maps initially had I known, but now I ask the question—will maps slow me down in the case where I’m pattern matching / constructing tens of thousands of them?
I don’t actually know where I get this impression from, but I thought that since maps are relatively new in Erlang, they may be slower than tuples. Now I realised that perhaps it shouldn’t be a concern.
Does anyone have any experience with this / are there any benchmarks or results out there to support or refute this?
@mjadczak: If you can’t say how many items your tuple will have in future then I suggest you to combine struct and tuple like:
defmodule MyApp.Event do
defstruct [:your, :fields, :here]
end
defmodule MyApp.Example do
alias MyApp.Event
def sample do
{:ok, %Event{your: "", fields: 0, here: true)
end
end
I’m not sure about performance, but when you are building API and changing it from 2 tuple items (initial version) to 4 and more tuple items (in next versions) then it’s easier for someone that uses your API to have always 2 tuple items with map or struct as a second tuple item.
In your case:
alias MyApp.Example
{:ok, your, fields, _here} = Example.sample
# then in next versions:
{:ok, _here, your, fields, _something, _more} = Example.sample
do_something_with(your, fields)
As I suggest:
alias MyApp.Example
{:ok, event} = Example.sample
# then in next versions:
{:ok, event} = Example.sample
do_something_with(event.your, event.fields)
Thanks, just to clarify, these tuples aren’t actually exposed to users, they’re just to contain all of the information internally. The purpose of moving to maps would just be to aid further development for myself, and make it easier to refactor in the future.
Tuples are implemented under the covers as a C array of pointers to Terms, thus tuples are fast for access if you know the element number, but are slow if you need to modify them.
If you always prepend to a List, they are fast and memory efficient to modify, but access to specific elements is slow.
Maps are somewhere in between, access to a specific element is much faster than a List, but modification is more expensive than a List, but less than a Tuple.
Without knowing the exact access patterns and transformations, it’s hard to know what is best for your case. For “small” values it’s extremely hard to make any general statements about performance of various data types. You really need to benchmark the code to know.
This is what I try to do with Elixir code:
“Make it right, Make it pretty, Make it fast ( if you need to)”
I’d pick the data type that makes working on the code as easy as possible.
For note, if you want a struct-like thing that is built for reading speed, that is precisely what Records are, Records are just tuples with named slots.
Altering a map can be faster than a tuples if you have more that a couple elements though (entire tuple has to be recreated, map can be altered faster), but for few elements even a tuple will always be faster.
But yeah in general do not worry about performance unless you find a specific need.