Help understand concurrency and concurrency issues

foggy · October 19, 2018, 5:12pm

I personally picked up Elixir by converting a high-throughput Rails API (that was having scaling issues) to a Phoenix API. I had previous experience scaling the Puma webserver on Rails for an API that backed a mobile application, and while the right settings led to success in that instance, this particular API was just high-burst throughput at set intervals and I had no idea how to get the right configuration w/ Puma without resource-scaling through the nose.

A new webserver for Ruby called Falcon got announced recently. It is built on a more asynchronous model and can scale how it handles requests more gracefully.

How?

When I first started making web apps, everybody told me not to worry about scaling. I think it’s good advice for diving in and building something that works. But at the core of every real-world project I’ve done has been a scaling issue, and most of them have been related to dealing with throughput. I would love to better understand why running on Erlang automatically means concurrency is handled better, and why this new Ruby webserver handles these things better in spite of the fact that Ruby is weak in this area.

What are some good resources for better understanding the practical science behind this stuff?

OvermindDL1 · October 19, 2018, 5:29pm

Because every actor/process on the BEAM is a shared-nothing, message-passing system, there are no locks. Processes can be run transparently on any number of cores, or even across multiple distinct systems and hardware. They are easy to reason about, since the data inside each process is fully immutable and you only need to reason about messages. Each actor can run concurrently and in parallel, etc… i.e. things on the BEAM scale ‘almost’ for free (as long as they are not waiting on messages, which is not an issue with web-based systems, as distinct sockets have a tendency not to talk to each other).
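To make the shared-nothing idea a bit more concrete, here is a rough sketch in Python. It is only an analogy, not how the BEAM works: a Python thread is far heavier than a BEAM process, and `queue.Queue` uses a lock internally. The names `counter_actor` and `run_demo` are made up for illustration. The point is just that the count lives inside one "actor", and everyone else can only send it messages.

```python
import queue
import threading


def counter_actor(mailbox: queue.Queue) -> None:
    """Owns a private count; other threads can only reach it via messages."""
    count = 0  # private state, never handed out to other threads
    while True:
        message = mailbox.get()
        if message[0] == "add":
            count += message[1]
        elif message[0] == "get":
            message[1].put(count)  # reply on the caller's own queue
        elif message[0] == "stop":
            return


def send_adds(mailbox: queue.Queue, times: int) -> None:
    """A sender never touches the count, it only enqueues messages."""
    for _ in range(times):
        mailbox.put(("add", 1))


def run_demo() -> int:
    mailbox = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox,))
    actor.start()

    # Four concurrent senders, no explicit locking in this code.
    senders = [
        threading.Thread(target=send_adds, args=(mailbox, 1000))
        for _ in range(4)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()

    # Ask for the total; the mailbox is FIFO, so every "add" is handled first.
    reply = queue.Queue()
    mailbox.put(("get", reply))
    total = reply.get()

    mailbox.put(("stop",))
    actor.join()
    return total
```

Calling `run_demo()` returns 4000 regardless of how the sender threads interleave, because only the actor ever mutates the count.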

No clue about the Ruby bit, I don’t use it.

easco · October 19, 2018, 7:46pm

Let’s turn to the description of Falcon from their web page:

Falcon is a multi-process, multi-fiber rack-compatible HTTP server built on top of async, async-io, async-container and async-http. Each request is run within a lightweight fiber and can block on up-stream requests without stalling the entire server process.

In particular, this last sentence states that each request that comes into the server gets a dedicated “computing unit” (in this case a “lightweight fiber”) to handle that one request. This means that one request that is stalled out while communicating with a back-end system will not prevent the other requests from running.

Cowboy (a popular HTTP server for Erlang/Elixir) does something similar. Each incoming request is handed off to an Erlang Process (the “computing unit” of the system). Each of the processes runs independently, so one of them stalling on an I/O operation, like the back-end system request, will not prevent the others from running.

It’s the same idea in both cases: take a single UNIX process and handle multiple concurrent requests in separate “computing units”, where a blocking operation in one cannot prevent the others from running.
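Here is a small sketch of that idea, using Python’s `asyncio` tasks as a stand-in for fibers or Erlang processes. This is not Falcon’s or Cowboy’s actual code; `handle_request`, `serve` and the delays are invented for illustration, and `asyncio.sleep` plays the part of a non-blocking call to a back-end system.

```python
import asyncio


async def handle_request(name: str, upstream_delay: float, finished: list) -> None:
    """One 'computing unit' per request."""
    # Stand-in for waiting on a back-end system; while this request waits,
    # the event loop is free to run the other requests.
    await asyncio.sleep(upstream_delay)
    finished.append(name)


async def serve() -> list:
    finished = []
    # The slow request is started first, but it does not hold up the others.
    await asyncio.gather(
        handle_request("slow", 0.3, finished),
        handle_request("fast-1", 0.01, finished),
        handle_request("fast-2", 0.05, finished),
    )
    return finished


def run_demo() -> list:
    return asyncio.run(serve())
```

`run_demo()` returns the requests in completion order, with both fast requests finishing before the slow one even though the slow one was accepted first. If `handle_request` used a blocking `time.sleep` instead, the requests would run one after another, which is the stall described above.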

Where Erlang “means concurrency is handled better” is in the set of tools that are built into Erlang that Ruby does not provide. The first thing that comes to mind is that Erlang Processes have isolated memory spaces. In Ruby, more than one “lightweight fiber” might share the same object. A bug in one fiber could put that object into an inconsistent state, leading to corruption in all the fibers.
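A minimal sketch of that failure mode, again using Python `asyncio` tasks as a stand-in for Ruby fibers (the `accounts` dict and the function names are hypothetical). One task has a bug that raises halfway through updating a shared object, and a second, perfectly healthy task then reads the half-updated data.

```python
import asyncio


async def buggy_transfer(accounts: dict, amount: int) -> None:
    """Debits one account, then crashes before crediting the other."""
    accounts["a"] -= amount
    raise RuntimeError("bug: crashed before crediting account b")


async def audit(accounts: dict) -> int:
    """A well-behaved task that trusts the shared object."""
    return accounts["a"] + accounts["b"]


async def main() -> list:
    accounts = {"a": 100, "b": 100}  # one mutable object shared by both tasks
    # return_exceptions=True keeps the crash from cancelling the audit task.
    return await asyncio.gather(
        buggy_transfer(accounts, 30),
        audit(accounts),
        return_exceptions=True,
    )


def run_demo() -> list:
    return asyncio.run(main())
```

The first result is the `RuntimeError`, and the second is 170 rather than 200: the crash was contained, but the damage to the shared object was not. With isolated memory, the audit would only ever see data that was explicitly sent to it.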

This kind of viral corruption can happen in Erlang/Elixir as well, but only if one process sends a message to another and the receiving process doesn’t validate the incoming message before accepting it.

Another set of tools that Erlang provides is monitoring and linking between “computing units” (allowing for the OTP abstraction, Supervisors). In Ruby, if one of your “lightweight fibers” goes away, it’s just gone. The system doesn’t provide any built-in way for another fiber to know that the failing one has gone away and do something about it. So long as the developers of Falcon have done their work properly, a failure - say an unhandled exception - in one fiber shouldn’t take down other fibers or the whole system. But that has to be handled at the framework level. In Erlang, that kind of isolation is built into the base system.
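To give a feel for what “handled at the framework level” means, here is a hand-rolled, very simplified supervisor in Python `asyncio`. It is an illustration only: `flaky_worker`, `supervise` and the restart limit are made up, and real OTP Supervisors do far more (restart strategies, restart intensity, supervision trees). The thing to notice is that every piece of the crash detection and restart logic has to be written by hand here.

```python
import asyncio


async def flaky_worker(attempt: int) -> str:
    """Hypothetical worker that crashes on its first two runs."""
    if attempt < 3:
        raise RuntimeError(f"crash on attempt {attempt}")
    return f"ok on attempt {attempt}"


async def supervise(max_restarts: int) -> tuple:
    """Runs the worker and restarts it when it dies, up to max_restarts times."""
    crashes = []
    for attempt in range(1, max_restarts + 2):
        task = asyncio.create_task(flaky_worker(attempt))
        try:
            result = await task  # awaiting is how we learn the worker died
        except RuntimeError as exc:
            crashes.append(str(exc))  # note the crash, then loop to restart
            continue
        return result, crashes
    raise RuntimeError("restart limit reached")


def run_demo() -> tuple:
    return asyncio.run(supervise(max_restarts=5))
```

`run_demo()` returns `("ok on attempt 3", ["crash on attempt 1", "crash on attempt 2"])`. In Erlang/Elixir the equivalent behaviour comes from linking or monitoring the process and letting a Supervisor restart it, without the worker or its caller writing any of this.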