The BEAM and Parallelism

NobbZ · April 25, 2019, 6:23am

Moved from another Discussion

There is no parallelism on the BEAM, only concurrency, but let’s not argue about the theoretical difference between the two words, as their definitions seem to differ even between universities…

Anyway, if you spawn a lot of processes, your code is highly concurrent; if not, it isn’t.

But whatever number you put on it, it doesn’t matter whether that number is high or low; what matters is whether the program solves your problem in a manner that is efficient enough for you in terms of CPU, memory consumption and especially wall-clock time.

benwilson512 · April 25, 2019, 1:02pm

If you have more than one scheduler and more than one CPU you get actual parallelism.

NobbZ · April 25, 2019, 1:07pm

By the definition I learned, only if they do the same operation on different data. You can never do that on the BEAM.

hubertlepicki · April 25, 2019, 1:14pm

Why not? Spawn the same function twice with different data; if you have 2 CPUs, the two processes are likely to run in parallel on different cores.

peerreynders · April 25, 2019, 1:21pm

That’s SIMD (single instruction, multiple data), a particular class of parallelism. MIMD (multiple instruction, multiple data) also counts as parallelism.

Erlang was initially developed for concurrency on the then-common SISD (single instruction, single data) CPUs. Once multi-core CPUs became common, an MIMD version was developed (resulting in R11B).

I think it is accurate to say that there is no guaranteed parallelism. If you are running 4 schedulers on 4 separate cores, you can run no more than 4 things in parallel, but there will be times when fewer than 4 things are running in parallel.

Even under the best of circumstances, one should expect no more than a 0.75 * n speedup on n > 1 cores.

rvirding · April 26, 2019, 1:13am

I don’t think you can get away from discussing the meanings of concurrency and parallelism and how they relate to each other. Unfortunately. In my view they are two different things, though not necessarily unrelated.

I see concurrency as a property of the problem or of your solution to the problem. Splitting your system into multiple processes which communicate with each other by messages can simply be a very nice and practical way of describing the problem, and hence the solution. For example, if you are writing a server which has to handle multiple connections, then structuring the system so that each connection has its own set of independent processes can be a very nice way of building it.

Parallelism, however, I see as a property of the underlying hardware. It is what determines how many things I can actually do at the same time. Then it is up to me to design my system so that it can actually use this parallelism.

That is my view anyway.

So Erlang/Elixir, the language, gives me a base and a set of primitives for building concurrent systems. How I use it is up to me. The BEAM implements this base and provides the concurrency which I can use when I design my system. It provides all the processes, communication, error handling, etc. It also uses the parallelism provided by the underlying system to run my concurrent processes in parallel where possible. So if it has access to six cores, it can do six things at the same time, provided your system design allows it.

Now things start getting tricky, and we can show that concurrency and parallelism are different things. At least from my point of view.

Here is a simple example. Suppose we build a ring of processes around which we can send messages: if I send a message to the first process, it is passed from process to process around the ring and finally comes back to the original sender. Now we build a ring with 1000000 processes around which I will send 1000000 messages. That is a lot of concurrency!

Now I build my system so that it sends messages around the ring one at a time: it sends a message, waits for it to go around the ring, then sends the next message, and so on for all 1000000 messages. How much parallelism do we actually have? Very little, in fact, because at any moment only one process has a message to work on. We have a highly concurrent system which is basically sequential, so any underlying parallelism can’t be used.

So the language provides the concurrency, and the BEAM can provide the parallelism, but it is up to you to design your system in such a way as to use this parallelism.

Sorry this became much longer than I had planned.

mythicalprogrammer · April 26, 2019, 10:29am

I think concurrency is the precursor to parallelism. Erlang doesn’t enforce strict parallelism in its syntax, but… if similar jobs run concurrently on different schedulers, that counts as parallel, right? The only problem is that you can’t dictate which particular jobs should run in parallel.

From my understanding, I do agree with your definition of parallel: “…only if they do the same operation on different data. You can never do that on the BEAM.”

hauleth · April 26, 2019, 11:23am

Concurrency is a feature of the algorithm.
Parallelism is a feature of the runtime.

For me, that’s all there is to it.