Is operating on [{1}, {2}, ...] meaningfully less performant than [{:a, :b, ...}, ...]

Knowing that Elixir lists are linked lists, and that updating them has some performance cost for traversal, etc, does the data in those lists impact performance?

Eg, given a “simple” list of [{1}, {2}, ...] where the elements are “light” in byte count and [%Struct{b: "abcd"}, %Struct{}, ...] where the elements are “heavy”, is the performance between these lists when traversing, reversing, inserting, etc different enough to matter (over n elements where n is “big”)?

Does elixir copy the values for everything (so more bytes == more work), or is it all pointer shuffling and its just the size of the list that matters?

Pretty much this. Terms within the VM are represented as tagged memory words (say, 64-bit unsigned integers), with some of the bits assigned to a tag and the remaining assigned to content. The tag contains type-related information. There are two kinds of terms:

  • “immediate” terms where both the tag and value can fit within the memory word: small integers, atoms, that kind of thing
  • “indirect” terms, where the content of the memory word is a pointer to where the actual value resides: larger integers, floats, binaries, composite terms other than empty lists

Composite terms like lists, maps and tuples are made up of other terms: the terms they contain are handled according to their nature as tagged memory words, which are all of the same size - therefore, you’re shuffling either small values or pointers, and thus it doesn’t make a difference.

(I may have simplified a bit / butchered some of the terminology)

4 Likes

I should mention, however, that copying terms to other processes, ETS tables, etc usually implies deep copying (with rare exceptions), and in that case it will absolutely make a difference.

3 Likes