Let me first put a little disclaimer here: I only have second-hand information about how the BEAM works internally. Maybe @rvirding, for instance, can provide some first-hand information.
So, let me try to answer your questions in a structured fashion:
CPU caches and BEAM Processes
BEAM processes are executed concurrently on as many scheduler threads as your machine has cores (one scheduler per thread; the schedulers of course coordinate among themselves). A scheduler runs Process A for a budget of reductions, where a reduction is roughly one function call (the budget is on the order of a few thousand; the exact constant, CONTEXT_REDS in the VM source, has changed between OTP releases), then runs Process B for its budget, and so on. A process gives up its time slice earlier if it blocks, for instance while waiting for a message.
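As a quick illustration, `Process.info/2` lets a process peek at its own reduction counter (the exact numbers you see will vary between OTP releases and between runs):

```elixir
# Each process carries a counter of the reductions it has executed so far.
{:reductions, before} = Process.info(self(), :reductions)

# Do some busywork: every function call costs at least one reduction.
Enum.each(1..10_000, fn _ -> :ok end)

{:reductions, later} = Process.info(self(), :reductions)
IO.puts("Reductions consumed: #{later - before}")
```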
A Process itself is allocated as a contiguous block of memory. It starts with roughly 2 kilobytes (233 machine words by default), in which the Process's stack and heap grow towards each other from opposite ends. It also has some extra metadata, the process control block (which I believe is stored alongside this main memory).
When the Process's memory is full even after garbage collection, a larger block (roughly double the old size) is allocated, and the live data is copied there.
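A small sketch of this, again using `Process.info/2` (the byte counts you see will differ per machine and OTP version): spawn a process, let it build a large list on its heap, and watch its reported memory jump.

```elixir
parent = self()

pid =
  spawn(fn ->
    receive do
      :grow ->
        # Build a large list on this process's heap, forcing several
        # rounds of heap growth.
        list = Enum.to_list(1..100_000)
        send(parent, {:grown, length(list)})

        receive do
          # Referencing `list` here keeps it live (not garbage-collected)
          # while the parent takes the second measurement.
          :stop -> length(list)
        end
    end
  end)

{:memory, bytes_before} = Process.info(pid, :memory)
send(pid, :grow)

receive do
  {:grown, 100_000} -> :ok
end

{:memory, bytes_after} = Process.info(pid, :memory)
IO.puts("#{bytes_before} -> #{bytes_after} bytes")
send(pid, :stop)
```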
As a single process uses a contiguous block of memory, and small processes fit comfortably inside the caches of most CPUs, I think the BEAM will be able to utilize caches quite effectively. Of course, some extra bookkeeping goes on between instructions; I am not sure how this affects cache-effectiveness.
Linked-Lists vs. unboxed Arrays
Linked lists are slower to traverse than unboxed arrays (arrays in which all values have the same type and are stored inline). The main advantage of a linked list is that, in an immutable language, its tail can be re-used. Prepending to the head also always allocates exactly one new cell, while most array implementations (such as C++'s
std::vector) double their capacity when they fill up. The amortized running time of doubling is better, but also less predictable: most of the time no allocation is needed, and then occasionally a lot of memory is allocated at once.
The same trade-off applies when elements are consumed from the front: popping the head of a linked list is O(1) and leaves the tail shareable, whereas removing from the front of an array means shifting the remaining elements or keeping extra index bookkeeping.
So: Linked lists are more predictable and reusable, but arrays take less memory and are more efficient.
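In Elixir terms (this is just standard list syntax, nothing hypothetical): prepending allocates a single cons cell and shares the existing list as the tail.

```elixir
a = [2, 3, 4]
# O(1): one new cell whose tail *is* `a` — shared on the heap, not copied.
b = [1 | a]

[head | tail] = b
IO.inspect(head)        # 1
IO.inspect(tail == a)   # true
```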
For linked lists inside a process, the cells will still live inside the same contiguous block of Process memory, so traversing them should be reasonably cache-friendly (though a long list will span many cache lines, not one).
I am fairly certain that Erlang's mailboxes are some kind of queue rather than a plain list, as they work in a FIFO manner. How this queue is implemented, I do not know; but as the abstraction is hidden from the Erlang/Elixir programmer, it might very well use some fast, close-to-the-metal mutable techniques under the hood.
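Whatever the internal representation, the FIFO behaviour itself is easy to observe (assuming no selective receive reorders things — messages between the same pair of processes arrive in send order):

```elixir
parent = self()

spawn(fn ->
  # Send three messages; they are guaranteed to arrive in this order.
  for n <- 1..3, do: send(parent, {:msg, n})
end)

received =
  for _ <- 1..3 do
    receive do
      {:msg, n} -> n
    end
  end

IO.inspect(received)  # [1, 2, 3]
```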
Maybe the most important point to note, though: don’t worry about it. The speedup you gain from working with the BEAM VM is that you can properly write concurrent applications that utilize all your CPU’s cores to their fullest extent (and it is also possible to fully transparently scale up by connecting other computers with their own CPUs to this system as well).
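For example, `Task.async_stream/2` from Elixir's standard library runs a function in separate BEAM processes, by default with as many concurrent workers as there are online schedulers (i.e. cores):

```elixir
# Each element is processed in its own process; results come back
# in input order by default (ordered: true).
results =
  1..8
  |> Task.async_stream(fn n -> n * n end)
  |> Enum.map(fn {:ok, v} -> v end)

IO.inspect(results)                     # [1, 4, 9, 16, 25, 36, 49, 64]
IO.inspect(System.schedulers_online())  # number of scheduler threads in use
```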
Performance is only a problem if you are doing a lot of back-to-back calculations on things of the same type, such as matrix multiplications. In these situations:
- a bytecode-interpreted language will always be slower than a compiled systems language.
- an immutable language will always be slower than a similar mutable language.
- boxed containers will always be slower than unboxed containers.
So if you are doing these kinds of (back-to-back!) heavy calculations, writing this critical section in languages like Haskell, Rust, C++ or Fortran is better.