Is actor/process model at odds with CPU cache architecture?

Hello Friends,
I have been dabbling with Elixir for a while. I come from JVM (Scala) background. Lately I have been reading up on writing CPU cache-aware/cache-friendly code so that my code can leverage the modern CPU’s cache architecture (L1, L2 etc) and hence enjoy a lot of speed up. This topic is also referred to as Mechanical Sympathy in some circles.

There is decent amount of information on writing CPU cache friendly code in C, C++ and Java but I could not find any information for Erlang. Consequently I was not able to find information that clarifies whether message passing model of Erlang will end up causing a lot of CPU cache misses hence lower performance compared to other languages or programming styles ?

From what I have read linked lists are generally going to be more expensive than contiguously allocated arrays as consecutive items of the linked list might not be in the same CPU cache line versus arrays where there is much higher chance of consecutive elements being in the same cache line.

From what I understand the process mail boxes are implemented as linked lists and the messages are copied from one process’s heap to another one (unless its a binary), the result will be something that is suboptimal for CPU cache utilization.

I am a noob to the language and BEAM VM. Appreciate any pointers or tips in this regard.


Let me first put a little disclaimer here: I only have second-hand information about how the BEAM internally works. Maybe for instance @rvirding can provide some first-hand information.

So, let me try to answer your questions in a structured fashion:

CPU caches and BEAM Processes

BEAM processes are executed on as many threads as your computer has, concurrently, by using a Scheduler (one per thread; these of course coordinate). This scheduler executes 100 reductions (i.e. instructions) of Process A (unless A decides to wait for something before that), then 100 reductions of Process B, etc.

A Process itself is allocated as a continuous section of data in memory. It starts with 2 kilobyte of memory (where the Process’ stack and heap grow towards each other). It also has some extra metadata (which I believe is stored alongside this main memory).

When the Process’s memory is full (even after garbage collecting), double its old memory is allocated, and then its information is copied there.

As a single process uses a continous block of memory, and for small processes these will fit inside the cache lines of most CPUs, I think the BEAM will be able to utilize caches quite effectively. Of course there is some extra bookkeeping that is going on between instructions. I am not sure how this might affect cache-effectiveness.

Linked-Lists vs. unboxed Arrays

Linked lists are slower than unboxed (all values having the same type) arrays. The main advantage of a linked list is that, in an immutable language, its tail can be re-used. It will also always only allocate one new element when adding something to the head, while most array implementations (such as C++'s std::vector) will double their available memory if it gets full. The amortized running time of doubling is better, but also less predictable (sometimes no allocation is needed, and then a lot of memory at once).

The same is true if elements are consumed from the start of a linked list vs. the start of an array.

So: Linked lists are more predictable and reusable, but arrays take less memory and are more efficient.

For linked lists inside a process, these will still be inside the same block of Process memory and thus probably within the same cache line.


I am fairly certain that Erlang’s mailboxes are some kind of queue rather than a list, as it works in a FIFO manner. How this queue is implemented, I do not know; but as this abstraction is hidden from the Erlang/Elixir programmer, this might very well use some fast low-to-metal mutable techniques underwater.


Maybe the most important point to note, though: don’t worry about it. The speedup you gain from working with the BEAM VM is that you can properly write concurrent applications that utilize all your CPU’s cores to their fullest extent (and it is also possible to fully transparently scale up by connecting other computers with their own CPUs to this system as well).

Performance is only a problem if you are doing a lot of back-to-back calculations on things of the same type, such as matrix multiplications. In these situations:

  • a bytecode-interpreted language will always be slower than a compiled system-language.
  • an immutable language will always be slower than a similar mutable language.
  • boxed containers will always be slower than unboxed containers.

So if you are doing these kinds of (back-to-back!) heavy calculations, writing this critical section in languages like Haskell, Rust, C++ or Fortran is better.


Hey @Qqwy thanks for the detailed response. You have given me some pointers to follow up and read on to understand better how the BEAM datastructures are implemented. As you said there is very high possibility that the BEAM implementers have already done most optimizations needed and we as developers don’t have to worry too much.

Thanks again!

Another thing to consider is how BEAM manages memory inside the process.

A process has one continues chunk of memory that it uses for everything. On the one end is the process stack, on the other end is process heap. They grow towards each other. This means that allocations are very simple - you bump just one pointer, this also means that things allocated at the same time, are close to each other. So some data structures that generally aren’t cache friendly, in practice, are much less problematic.

Here’s an interesting article that goes in depth on the GC and some parts of the memory model:


Oooo, an update to read, nice. :slight_smile:

@michalmuskala thanks for pointing to that article, its very informative

1 Like

Yes, it’s a very good article by someone who really knows what he is talking about