Tuple vs List


#1

Hi,

there’s some info out there about when to use lists vs tuples, and I get that, but nothing I’ve found talks about the efficiency of one vs the other in relation pattern matching.

I’m using card games (specifically bridge) for my explorations of Elixir, and at one point I had a hand as a list of lists, and then I thought, well, there are never going to be more than four suits, so perhaps that should be a tuple of lists, so I changed to that.

I now get to the point where I want the hand as a flat list, and thought, maybe I should have stayed with list of lists, so I created a version doing that.

The core of the difference is in this :-

Tuple version
def split(card = {0,}, {cl,di,he,sp}), do: {slot(card, cl),di,he,sp}
def split(card = {1,
}, {cl,di,he,sp}), do: {cl,slot(card, di),he,sp}
def split(card = {2,}, {cl,di,he,sp}), do: {cl,di,slot(card, he),sp}
def split(card = {3,
}, {cl,di,he,sp}), do: {cl,di,he,slot(card, sp)}

List Version
def split(card = {0,}, [cl,di,he,sp]), do: [slot(card, cl),di,he,sp]
def split(card = {1,
}, [cl,di,he,sp]), do: [cl,slot(card, di),he,sp]
def split(card = {2,}, [cl,di,he,sp]), do: [cl,di,slot(card, he),sp]
def split(card = {3,
}, [cl,di,he,sp]), do: [cl,di,he,slot(card, sp)]

From my testing, the tuple version is about 2% more efficient.
Average of 4 runs of generating 1_000_000 hands
Tuple = 47.851
List = 48.906

t sort of makes sense without knowing why … but, can anyone explain specifically why the tuple version is quicker?

Jonathan.

PS
I do remember thinking it was pretty cool to be able to pattern match INTO a list beyond head|tail when I discovered you could.


#2

I think the simple reason is that tuples and lists are different data structures generally used for different purposes. Tuples are typically used for fixed sized data or when you want fast lookups, lists for dynamically sized structure or if you want to iterate over the elements in a natural way.

Tuples have O(1) lookup whereas lists are O(N)

I haven’t look into the exact implementation of the data types so there may be other factors in terms of the actual pattern matching.

In terms of how to model card games? I’d probably lean towards using a list of {suit, value} tuples. You want to be able to remove cards from the deck, usually from the bigging.

To shuffle a deck of cards as as simple as:
Enum.take_random(deck_of_cards, 52)


#3

Tuples are arrays, while lists are linked lists (so there’s more pointers to follow). In your case the list is tiny, and the elements probably end up contiguous in memory, so the performance difference is negligible.


#4

What @cmkarlsson says is correct. Not only do Tuples have O(1) lookup and lists O(N), replacing an element of a tuple means that the whole tuple needs to replaced, whereas replacing the head of a list means that the tail can be kept as-is.

This basically means:

  • Looking for a container with a fixed amount of data elements? Use tuples (or structs!).
  • Looking for a container with a varying or large amount of data elements? Use lists.

In your case, I would represent a card as a struct, and the deck as a whole (as well as the hands of the various players) as a list.

Finding if a player has a certain pattern in their hand then means: Sorting the list (which you cannot do with a tuple), and then pattern matching with known card patterns.


#5

You might be interested in watching Data efficiency on the BEAM by Dmytro Lytovchenko/Erlang Solutions and reading through The BEAM Book and BEAM Wisdoms to understand how data is represented inside beam.

I think for two elements a and b, [a | b] is more efficient than {a, b}, in any other case (more elemetns, but not too many) tuples are probably more efficient.


#6

I think it is also important to mention this:

Premature optimization is the root of all evil

~Donald Knuth

Using tuples (or structs) for a specific card, and lists for a deck of cards are the idiomatic choices of data structures for these problems. Do not care about the 2% speed increase, because writing non-idomatic code for efficiency’s sake will severely hamper the development velocity of your project as time goes on, which is far more important.


#7

I agree with you, but to find those 3% we should know where to look.


#8

Idiomatically, if the size is static use tuple, if the size is dynamic use list. In your example size is static so tuple would be proper (or even just as plain 5-arity).


#9

While this should be obvious I’m just going state it anyway:

  • In a list there usually is the expectation that all elements are of the same type or at least a sufficiently similar type. Though sometimes lists contain ofter lists that are expected to be flattened. And if you are storing different types in the same list, the “type” contained by the list is implicitly a type union.
  • Tuples can have elements of different types. Tuples where the same types appear in the same position can be considered to be a member of the same product type.

#10

I’m with Idiot. Oh. What? :grinning:

I’ll have a look at those erlang links, thanks.

I get the tuple/list usage bit … it just surprised me that the pattern matching a tuple of four lists was 2% more efficient than a list of four lists.

As to the early optimization thing … if you don’t optimize your data structures early, you probably never will.

Thanks


#11

Less pointers to follow internally. ^.^


#12

And in most cases that will be the correct thing to (not) do.


#13

I believe the premature optimization quote is misinterpreted and often used to justify badly written code in critical parts of your software.

If it is in the critical part of your application, of course you should pick the best enough algorithm, data-structure for your problem. This is not “optimization”, this is sound computer engineering.

Just don’t spend “too much time worrying about efficiency in the wrong places and at the wrong times” [https://en.wikiquote.org/wiki/Donald_Knuth]. The key here is spending time optimizing non-critical parts of your application.

In general I believe things in the large should be more optimised. As we the rest of modern society computer engineers are really bad at including externalities. Hence it is easy to optimize for “developer speed” when you don’t have to care about your users resources in the same way it is cheap to do manufacturing if you leave the clean-up job to the next generation. (Yes, I am talking about electron, or modern web with all the tracking and javascript and what not). People justify it with: “if it wasn’t for [insert wasteful front-end framework] the customer wouldn’t get a product at all!”. Apply this thought to the earth’s resources and you see how absurd it is.

On the other hand, premature optimization probably works the other way around as well. Hence it doesn’t mean to much if you use 20W lamps instead of 30W when the paper mill down the road doesn’t care.

Well. That took us quite far from the tuple vs list discussion and I don’t think this post got us closer to an answer :smiley:


#14

This will sound cynical/bitter, but people spend all this time worrying about their datatype and their sizes, then they happily send/receive them JSON encoded for every request (of which they get/send tons). Sorry for continuing the off topic discussion.


#15

Which sort of emphasizes the quote by Donald. We worry about efficiency in the wrong places and the wrong times :smiley:


#16

Tuples are like structs in C, whereas lists are like singly-linked lists.