When to use which data-structure?

Is there some good blog post or video from some influential Elixirist (like Jose Valim) about when to use Lists over Tuples and vice versa, and when to use Keyword Lists or Maps or Structs? Better if the blog post is new and is based on Elixir 1.2 or above, so that the Elixir Maps implantation is based on the new Erlang Maps implementaion with the deprecation of HashDict.

Thank You!

Edit
@josevalim Please read this thread from top to bottom, and comment!

3 Likes

I’m not aware of any blog posts, but I’m wondering why you are asking for lists vs tuples. Usually a list is a collection of similar things, you can have between 0 and many items. Tuples though are grouping different things to form a new thing.

4 Likes

… but I’m wondering why you are asking for lists vs tuples. Usually a list is a collection of similar things

And yet the very first example of a list here:

List – Elixir v1.5.2

is a set of dissimilar items. So the question seems reasonable :slight_smile:

1 Like

List versus tuples is easy though; generally your data has a fixed number of elements (tuple) or it doesn’t (list).

There isn’t a sound-byte on this page, but the elixir-lang tutorial does give good guidance on Keywords versus Maps. Keyword lists are ordered, and can have duplicates, but have linear performance. Maps are not ordered, cannot have duplicates and perform well with large numbers of elements. Keyword lists are used commonly for optional arguments and have a constructor syntax sugar for this reason, but one flaw in this purpose is that pattern matching the keys is impractical as you must match them in the correct order. An interesting workaround for this is employed by the Phoenix framework; when you call render in a controller the last argument is a keyword list so you can easily pass assigns. When you define render in a view, the keyword list is converted to a map so you can pattern match the keys.

Structs vs. maps I think is a little more nuanced. Structs have some requirements: fields are known up front, keys can be atoms, sensible defaults exist. But even if all those hold true you may still use a map if the scope / lifetime of the structure is very small, such as a map that is only passed between 2-3 functions. When the scope or lifetime of a structure is larger (used in multiple modules) it may make more sense to define a struct.

3 Likes

Learning FP with Elixir, I was also surprised with data structures. It turns out they are very familiar in all FP languages.

As a practical example, this is how I use them

  • List are linked list, and optimized for TCO. That is why You see often [head | tail]. They are not so good for asking the 5th element for example.

  • Tuples are used when position matters… And You often see them as response {:ok, blah} | {:error, reason}. They are connected with Erlang records #{}, or Mnesia. But in Elixir, Struct are replacing records most of the time. They should not be too big, and they are perfect for storing fixed shape data.

  • Keywords are almost everywhere, but under the hood, they are just List of 2 elements tuples. [a: 1, b: 2] == [{:a, 1}, {:b, 2}]. You see this when using def my_func, do: blah.
    They can have duplicate keys. Mostly, I use them when passing options to function, because it’s handy to manage multiple parameters. Like this … my_func(options \\ []) I could then call my_func(blah: 1, blih: 2) and use Keyword module on options

  • Maps are like key value. They are the go to structure in Elixir. They cannot have duplicate keys, nor the key order is preserved… I mostly use them as GenServer state

  • Structs are just like super maps, they only accept atom keys, so you can call it with dot notation. It would be like an object, but without methods… after all, it’s FP here. It’s also what You use when persisting with ecto.

I did not mention MapSet, and maybe I am missing other structs… but You get the idea, each data structure is special in a way.

15 Likes

List versus tuples is easy though; generally your data has a fixed number of elements (tuple) or it doesn’t (list).

And yet, while the Tuple doc also says

“… (are) mostly meant to be used as a fixed-size container”

3 of the 5 Tuple functions change the size of the tuple. Which
seems a bit contradictory :slight_smile:

1 Like

There is no contradiction. When you pattern match a tuple, you will only match on tuples for that size. Transforming a tuple to a different size is the equivalent of changing it to a different type. This would be like mapping your data from one struct to a different struct. The fact that the functions are there, doesn’t mean you use them to treat tuples as lists.

1 Like

In my 2.5 years of professionally writing elixir daily, I never used any function from the Tuple module besides Tuple.to_list/1 (and that mostly for some meta-programming or some nasty processing in the depths of ecto). You generally only pattern match on tuples and handle them that way.

10 Likes

There is no contradiction.

I disagree, given the context of the original question.

For someone new to Elixir looking for guidance on when to use what
data structure, saying “use this for fixed size collections” and “here’s
how you change the size” is – not contradictory? not even slightly
confusing?

And whether those functions are frequently used in practice by an
experienced developer is irrelevant; I’m talking about how the docs
help or hinder informing the OP’s question.

1 Like

That may depend on whether or not they have experience with other functional languages, or at least reading an introductory book for Elixir (all of which make this very clear). I can tell you with certainty that this is not surprising or iconsistent for someone experienced in Haskell or Scala (nor, I don’t think, in OCaml, F#, Clojure or Lisp). Tuples of different sizes (and with different member types) are different types. In Erlang and Elixir, they are also different types but perhaps its more useful to think of them as like being functions of different aritys. Sure, you can transform one type to another type, and Tuple gives you some functions for doing so. That doesn’t mean they are equivalent to or readily confused with linked lists.

1 Like

Elixir Best Practices: When to Use Structs, String-keyed Maps, and Atom-keyed Maps

1 Like

Link doesn’t load, certificate is invalid.

1 Like

It is not, indeed, but you can bypass the check, then the link loads, for reading some content I feel OK with that, I’d not submit anything via a form or so to them, also I deactivated JS for in-secure (meaning invalid, untrusted or expired certificates) HTTPS resources.

1 Like

A (slow) web archive link for the squeamish

4 Likes

Actually I cannot, my work has the policies setup so websites must be https and invalid certificates cannot be overridden. ^.^;

2 Likes

Oh, yeah, company networks can get in the way sometimes.

1 Like

Sorry about that, although it’s not my site. It’s worth bypassing for the read, maybe when you’re at home or using your phone or something.

1 Like

The use of correct Data Structure is depends upon requirement.

1 Like

Copy/pasted here for reference. I hope it doesn’t break any rule of ToS or copyright. If it does, feel free to remove the post or ask me to do it.

Elixir Best Practices: When to Use Structs, String-keyed Maps, and Atom-keyed Maps

Feb 2, 2016 • Pete Gamache

Elixir, much like its foundation language Erlang and its syntactic muse Ruby, makes a distinction between character strings , like "hi_mom" , and atoms , like :hi_mom .

Elixir also offers several ways to pass around key-value data. There’s Map, whose keys can be either keys or atoms; Keyword lists, which are association lists (or alists ) consisting of {atom, any_value} tuples; and there are association lists of {string, any_value} tuples, such as the HTTP headers returned by Plug.Conn.req_headers.

Elixir structs are represented as Maps with atom keys, additionally restricting the user to a predefined set of keys, and allowing a shorter syntax for field access than non-struct Maps.

Maps (and structs) and keyword lists implement the Elixir Dict behaviour, whereas the string-keyed alist does not.

Given that these different data structures are not compatible, the programmer must choose which to use at any given place in an application’s code. But how? This is the source of frustration and confusion in noobs and greybeards alike.

In this post, we explore the right times to use each.

About Atoms and Strings

Atoms (called “symbols” in Ruby and Lisp, among others) are much like integers; they are constants where their name is their own value. Importantly, atoms are not garbage-collected , so allocating them at runtime can lead to RAM exhaustion over time.

Atoms are also the “preferred” keys for maps and dicts; this is to say, the Elixir language provides syntactic sugar to encourage their use. Writing %{a: 1} is shorter than the equivalent %{:a => 1} and shorter still than the string-keyed version %{"a" => 1} .

Strings are the same character strings you find in most languages: the same string can be allocated multiple times, and strings are GCed alongside other data. There is no penalty to using strings as map or dict keys, but there is a slight friction to doing so, so many programmers tend to use symbols as map keys as a default instead.

Rule #1: Always Use String-Keyed Maps for External Data

Data which comes from outside your application, broadly speaking, can’t be trusted. Given that atom allocation can lead to memory exhaustion in a long-running Erlang system, using atoms for external data opens up your app for a potential denial of service (DoS) attack.

This fact is reflected in many Elixir libraries, for example the popular JSON parser Poison. Well-behaved libraries will generally use strings as map keys when converting external data into an internal data structure. We recommend the same.

Rule #2: Convert External Data to Structs ASAP

Elixir structs have some attractive properties. They enforce a (weak) schema on your application data, so that code may rely on the structure that the model struct provides. Like other atom-keyed maps, they allow access to the member values using dot notation; e.g., some_struct.foo rather than the equivalent some_struct[:foo] .

Structs enforce that you aren’t cramming in non-schematic data, discouraging you from reverting to the “freeform bags of data” mindset. And they are easy to specify in pattern matches in function definitions or case statements, to add a bit of runtime type safety to code.

The basic approach to this procedure is to define a model module with a struct inside it, then write a function to create and populate a struct with a map of externally-sourced data. This is a better idea than scattering code to populate the structs around your code; it’s more DRY and allows you to make changes in one place and test across the board.

Appcues has released a Hex project called ExConstructor to aid in this process. ExConstructor automatically builds constructor functions for structs, handling map-vs-dict, string-vs-atom, and even camelCase-vs-under_score key format automatically. Using ExConstructor means that your code may never have to use string-keyed maps directly.

Rule #3: Use Structs in All Other Code

If code in your app uses data which originated from the outside world, and it’s not the one piece of code dedicated to converting that type of data into a struct, it should be explicitly written against the model structs.

Write function definitions which make it clear that struct input is required:

def some_func(input) do ...               # no
def some_func(%{}=input) do ...           # no
def some_func(%MyStruct{}=input) do ...   # yes

Structs provide guarantees. Guarantees strengthen your code by removing the need for pervasive error/anomaly checking. This leads to better code in less time.

Note: Phoenix/Ecto models are simply structs with some nice features around changesets. Using them across your app is encouraged, though it’s wise to decouple the database access from the rest of your app, perhaps using a data access object, if there is any chance you may move to a different database backend.

Rule #4: Use Structs for Output Data

Using structs for output data is a good idea for the same reasons that using it internally is a good idea: guarantees, DRY, and, well, structure.

Structs convert to JSON maps quite easily, through either Map.from_struct/1 , or custom solutions like @derive [Poison.Encoder] .

Rule #5: Avoid Using Atom-keyed Maps That Aren’t Structs

Using a plain map for passing data around your app may be convenient to write the first time, but over the lifecycle of an app, it leads to decreased expressivity and more confusion about the exact shape of the data at any given point. Creating structs to represent state, even state completely internal to a module, reduces this confusion by plainly advertising (and enforcing) the structure of the underlying data.

There will be times to use plain maps, of course, but if your code has more bare maps than model structs, this may be an indication that your app’s state objects should be better-defined.

When you do use them, don’t fight the language; use atoms as keys.

Rule #6: Use Keyword Lists Only for Function Arguments

Keyword lists are the idiomatic Elixir way to provide keyword arguments to a function. Since only the programmer should be writing these arguments – sanitize your inputs! – atom-related DOS is not a problem.

Summary

The choice between maps, structs, and keyword lists, or between string keys and atom keys, is not always clear, and improper decisions can lead to headaches.

This article promotes simple guidelines to keep the Elixir programmer out of trouble:

  • Use string-keyed maps for data which has just arrived from an external source.
    • Convert external data to structs as soon as possible.
    • Use ExConstructor or something similar to abstract away the handling of string-keyed maps.
  • Use structs almost everywhere else in your program.
  • Use other atom-keyed maps sparingly.
  • Use keyword lists only to provide arguments to a function.

These rules of thumb have improved the quality of our code here at Appcues, and we hope they can help you out too.

Thanks to Ian Asaff for reading drafts of this post.

6 Likes

In support of Rule #2: Convert External Data to Structs ASAP, here are two libraries to try:

  1. Nestru - serializes between nested structs and maps, relating like Book - Order - LineItem. It provides a hook for mapping string keys of any syntax to struct’s fields. Furthermore, it supports serialization of type dependant fields when the map’s key points to a value of one or another struct kind.

  2. Domo - not only adds a constructor function to a struct module but also makes it validate the data to match the type spec of the struct. These are new!/1, new/1, esure_type!/1, ensure_type/1 functions.

2 Likes