Why does programmer have to choose between lists/tuples and keyword lists/maps?

frantic · August 4, 2019, 5:48am

Loving Elixir so far!

Came across the docs on basic types: list and tuples, keyword lists and maps. I understand the performance differences, some operations are O(n), others are O(1).

But why does the programmer have to make this choice? Conceptually, I’d think the program logic should care about whenever something is a list of things, or a mapping from things to other things.

The language could statically (or at runtime) look at what operations are most commonly invoked and pick the most optimal implementation. Values in Elixir are immutable, so seems like it shouldn’t be impossible to do so.

I’m sorry if the question sounds dumb I understand that Erlang interop is a big deal, but Elixir got lots of cool modern language features, why don’t use this opportunity to simplify core data structures list a little bit?

thojanssens1 · August 4, 2019, 7:25am

I’m just a beginner but it seems you only look at the performance aspect. What about readability? If you start using lists everywhere without using tuples, because let’s say the compiler would transform stuff as you said for you, I will have a hard time understanding your code/the purpose of your structures? Because each structure has its own usage.

E.g. list is used for undetermined amount of elements, tuples for determined amount. Keyword lists for options, configuration, … Map for structured data. Etc

kokolegorille · August 4, 2019, 7:50am

Those are typical data structure You may find in many functional languages… They are not the same, they apply best in different situation.

Because You don’t have object, You compose complex structure from primitive. You can create new type.

For example, You often use a tuple like {:ok, whatever} for response.
or list for recursion
or map, for structured data
or tuple, for fixed size record
Usually You can use map, as they are the most used in Elixir.

Also, I think in terms of types much more often than with dynamic language, like Ruby.

If You are coming from dynamic language, that might be strange.

jola · August 4, 2019, 8:58am

Ignoring the data structure part, what you’re suggesting is very similar to a JIT (just in time) compiler, like say V8, which will optimize your code as it learns about it at runtime. JIT is something that’s being discussed as a future option for the BEAM, I don’t know if any project towards it has been started though.

But your question is specifically for data structures. As an example of a data structure that does change as it’s being used is the Elixir/Erlang map, which for smaller sizes uses a more efficient underlying representation (tuple of lists, not the same but similar to a keyword list), but as it grows larger than a certain size (32, although that can change) it is automatically converted to a HAMT.

But part of the value in letting the developer choose their own data structures is that you’re conveying information to the runtime (I want this behavior) and you’re in turn giving guarantees to the developer (your code will behave the way you expect). The fact that Elixir maps change their behavior at runtime actually causes confusion and even bugs! If you don’t pay attention to the documentation and just experiment with small maps, maps seem to have sorted keys. Calling Map.keys gives you the keys in sorted order. But if you grow it beyond 32 items that stops being true, they instead get a different order. If your code relied on the keys being sorted and you at some point have a map larger than 32 items, you’ve got a bug.

iex(1)> m = for key <- 1..32, into: %{}, do: {key, :val}
iex(2)> Map.keys(m)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]

iex(3)> m = for key <- 1..33, into: %{}, do: {key, :val}
iex(4)> Map.keys(m)                                 
[11, 26, 15, 20, 17, 25, 13, 8, 7, 1, 32, 3, 6, 2, 33, 10, 9, 19, 14, 5, 18, 31,
 22, 29, 21, 27, 24, 30, 23, 28, 16, 4, 12]

To clarify, I’m not saying that it’s a bad thing that Maps are optimized for smaller sizes. I’m saying this type of optimization does have tradeoffs.

yurko · August 4, 2019, 9:28am

I’d like to add that that Elixir actually simplified it’s data structure “park” already, as an example see Dict — Elixir v1.16.0 and HashDict — Elixir v1.16.0, it also does not provide API for some further Erlang data structures like Erlang -- queue or Erlang -- digraph.

So:

why don’t use this opportunity to simplify core data structures list a little bit

This is exactly what it does Here when you have to think in terms of “language level”, having a generic list for everything is the highest level of abstraction possible, it would simplify your code and probably give you reasonable performance for the most cases, but then for some more specific uses you will still have to learn about what it does under the hood and maybe adjust your code in a less explicit way to trick it to behave in a way you want it to.

With few more data structures this behavior is much more predictable (with a grain of salt as @jola mentioned) and it will not take much for you to learn their strengths and weaknesses, this knowledge will also translate to many other languages, so no matter if you keep using Elixir, it is a useful knowledge.

kip · August 4, 2019, 10:17am

In addition to the other comments, its also important to understand that these different structures have different characteristics:

A Keyword list is just a list where each list entry obeys a certain convention - that that each entry is a 2-tuple with the first element being an atom. But in every other sense it is just a List. So really there are just lists, tuples and maps
Its common to think that a Keyword List and a Map are very similar but:
- A Keyword list can have duplicate entries, a Map cannot
- A Map can have an arbitrary term as its key, a Keyword list is always an atom key
- Pattern matching on a Keyword list the matching is order dependent (as it is for any list); for a Map matching is order independent

I think these alone would make it difficult if not impossible to change underlying data structures “on the fly” since the semantics of each structure are not the same at a fundamental level.

Qqwy · August 13, 2019, 3:23pm

On top of what @kip has said, here is a snippet of information about Tuples vs Lists, from the Tuple module documentation:

So just as with Keyword Lists vs Maps, Tuples and Lists can do some of the same things (in which case the choice between them would boil down to performance, or would not matter at all), but many things are only supported by one of them. And this is why both of them exist, and why the language cannot ‘automatically’ decide which one to use for you.