What is the difference between preemptive scheduling in Java and Elixir?

I also agree there are tradeoffs and the JVM has had a lot of investment to get where it has with performance.

The BEAM will never have the same straight-line performance, but it also hasn’t had anywhere near the same investment (hopefully one day I get to eat these words). It’s how everything combines in the end that makes BEAM languages compelling and suited to a range of problems where soft realtime, low latency and reliability are desired.

Whilst low latency is a compromise vs straight-line performance, the BEAM opcode model is pretty good in that there are no backward jumps within a function, and loops are always done with tail calls, so there is no way to hog the CPU and no need to cooperate by manually yielding. Yes, checking reductions has a slight overhead in fast paths / tight loops, but this is dwarfed by the nontrivial, frequent scheduling overhead that is an unavoidable cost in any language where you want low latency. Fortunately scheduling is much easier in the BEAM because of the lack of shared memory, and therefore of locks, between processes.
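A minimal Elixir sketch of that loop model (assuming the usual reduction budget of roughly 4,000 per time slice): every “iteration” is really a tail call, and every call costs a reduction, so the scheduler always gets its chance to preempt.

```elixir
defmodule Loop do
  # Loops on the BEAM are tail calls: each iteration passes through
  # a function call that costs the process a reduction, giving the
  # scheduler a chance to preempt. No backward jump to get stuck in.
  def sum(list), do: sum(list, 0)

  defp sum([], acc), do: acc
  defp sum([head | tail], acc), do: sum(tail, acc + head)
end

# Runs as a normal process; once its reduction budget is spent,
# the scheduler suspends it and lets other processes run.
Loop.sum(Enum.to_list(1..1_000_000)) |> IO.puts()
```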

In other systems the scheduler has to deal with priority inversion: some low-priority thread holds a lock that a higher-priority thread needs in order to proceed. This kind of complexity is avoided in our programs because everything is shared-nothing, and the schedulers and runtime are carefully designed in how they queue messages and share immutable buffers to avoid contention. Think of lots of processes each wanting to append a message to the end of some process’s message queue: each one gets its message added at or near the end safely, without all of them having to block and wait for each other in turn. It’s clever stuff and ensures more work gets done concurrently.
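A small sketch of that in Elixir (an illustration from the outside, not the runtime internals): a thousand processes enqueue onto one mailbox concurrently with no explicit locking in sight, and every message arrives.

```elixir
parent = self()

# 1_000 concurrent senders; the runtime serialises the mailbox
# inserts for us, without the senders blocking on a shared lock.
for i <- 1..1_000, do: spawn(fn -> send(parent, {:msg, i}) end)

# Drain the mailbox: all 1_000 messages arrive safely.
received =
  for _ <- 1..1_000 do
    receive do
      {:msg, i} -> i
    end
  end

IO.puts("received #{length(received)} messages")
```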

Some of the tradeoffs the BEAM makes improve performance. GC is per process, as it’s shared nothing, and often a process is short-lived (it serves a request and dies) and therefore needs no GC at all. The GC is also much simpler because of the inherent process design.
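A hedged illustration of the short-lived-process point: all the intermediate garbage below lives on the Task process’s own heap, and when that process exits, the whole heap is reclaimed in one go with no collection pass needed.

```elixir
# The intermediate lists built here belong to the Task process's
# private heap; when the process dies, the heap is simply freed.
result =
  Task.async(fn ->
    1..100_000
    |> Enum.map(&(&1 * &1))
    |> Enum.sum()
  end)
  |> Task.await()

IO.puts(result)
```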

Other benefits include no lock contention and no critical sections in the code that stop other threads and create bus locks across cores. The shared-nothing model is better for cache utilisation too, as you don’t have lots of cores invalidating shared memory regions in each other’s caches. Organising memory per owning process pays dividends.

Another cool thing the BEAM does is work stealing, to ensure full utilisation of all cores with no waiting about and much more fairness in the overall scheduling. This again works towards lower latency and improved core utilisation: each scheduler (and therefore CPU core) ends up doing useful work without needing a complex load-balancing algorithm.

The BEAM also taxes extra reductions on the calling process when the receiving process is under load with a long message queue, so as to create back pressure up the line.

Other things we don’t often think about: a bad message sent to a process, one which would result in a receive matching error, actually becomes an error on the caller (so the calling process crashes), not the receiver. These details of how reliability and errors are handled and recovered in the BEAM really do matter.
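One place this surfaces in everyday Elixir is GenServer.call/2 (a hedged sketch; CallDemo and :bad_request are made up for illustration): when a request can’t be serviced, the error arrives as an exit in the calling process rather than being silently swallowed.

```elixir
defmodule CallDemo do
  use GenServer

  def init(state), do: {:ok, state}

  # Only :ping is handled; an unmatched call crashes the request,
  # and GenServer.call turns that into an exit in the *caller*.
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

{:ok, pid} = GenServer.start(CallDemo, nil)
:pong = GenServer.call(pid, :ping)

try do
  GenServer.call(pid, :bad_request)
catch
  :exit, reason -> IO.inspect(reason, label: "caller got the exit")
end
```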

The runtime also ensures that socket buffers from the network (i.e. port drivers running on port schedulers) are delivered into receiving processes with very low latency; again the BEAM’s soft-realtime and telco underpinnings shine through.
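For example, with :gen_tcp in active mode (a minimal sketch over the loopback interface), incoming data is handed by the port driver straight into the owning process’s mailbox as an ordinary message:

```elixir
{:ok, listen} = :gen_tcp.listen(0, [:binary, active: true])
{:ok, port} = :inet.port(listen)

# A throwaway client process that connects and sends one packet.
spawn(fn ->
  {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary])
  :gen_tcp.send(client, "hello")
end)

{:ok, socket} = :gen_tcp.accept(listen)

# No polling loop: the data simply arrives in our mailbox.
receive do
  {:tcp, ^socket, data} -> IO.puts("got #{data}")
end
```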

There is a lot of cool engineering in the BEAM which, when all put together with OTP, makes a very compelling runtime for building reliable, low-latency solutions, and for achieving it in most cases with far less complexity, less code (mostly just the happy path), and fewer bugs. The language choices of a functional paradigm, immutability, and shared-nothing process isolation, with supervision/recovery built on the assumption that programs are faulty, make BEAM languages like Elixir an appropriate foundation for building modern web services with carrier-grade reliability, scalability and rock-solid SLAs at the 99th percentile.

6 Likes

Yep, Golang does pretty much the same and it’s still blazing fast in many scenarios. It is well understood at this point that programmers cannot be 100% trusted not to screw things up, so the runtimes have to compensate in places. IMO a good tradeoff.

If even that’s not good enough then you have C/C++ and Rust. :person_shrugging: Good luck with async Rust though, I’ll forever mention some of the white hairs in my beard because of it.

Amusingly, people have underestimated both Elixir and Golang very often (during my career; I don’t claim it as a global trend) and have been blown away by the performance of either, a number of times. I guess when they hear “Elixir is 100x slower than Rust” they somehow imagine their software will crawl slower than PHP on shared hosting 10+ years ago.

3 Likes

I guess it also depends on what you do; once you start to use things like IO and more complex concurrency models, the difference becomes smaller and smaller.

1 Like

Oh, absolutely. That’s also the curse of async Rust: I’ve met a good number of programmers who think that reaching for it will eliminate their latency, and sadly they never did any measurements, because otherwise they would find out that at least 80% of their performance woes come from waiting on I/O – in which case you could very easily get away with Elixir or even async JS.

Reaching for the biggest guns (Rust) is a mistake in many of these cases because you are increasing difficulty by 10x and you will gain something like +20% efficiency compared to your original code, however negatively that original code is often viewed (and we programmers love rewriting).

…Whereas in this same hypothetical scenario you can reach for Elixir or JS, increase the difficulty only by 2x maximum, and still get the +10% or +15% efficiency.

Tradeoffs as usual.

1 Like

Not to mention that, since async is a very low-level construct, you can easily mess it up and lose a lot of performance just through an inefficient implementation. This is a phenomenon I’ve seen many times in the Java world, but once again there is not much of a choice if you use Java rather than Kotlin or another JVM language that has a higher-level concurrency construct.

1 Like

Yeah, having guard rails / training wheels is something that is very valuable – and I don’t want to put programmers down as “being low quality for needing guard rails / training wheels”. As much as I liked thinking of myself as a hardcore backender, async Rust eventually made me give up because I didn’t see the hugely increased cognitive load pay off in a very meaningful way. So it’s much more about being productive and not about being “a lower quality dev”. I hate devs that look down on other devs in that area in particular!

I rewrote a few programs that I use for myself from Rust to Golang and it took me at least 5x less time to develop them (one of them literally took me 2 hours to code in Golang whereas the Rust variant took me a full week), and I hadn’t written Golang in years at that point. I had only 1 bug across the 4 programs over the next several months; I went and looked up the problem, realized I had made a wrong assumption about Golang’s runtime, introduced a 5-line diff that fixed the bug, and everything has been well ever since. Six months and counting now.

4 Likes

I’d argue that they’re actually necessary: often a bad moment is all it takes to introduce a subtle yet critical bug (e.g. silent data corruption), and who can say they never have a bad day?

I don’t consider this a question of competence, over time it’s just not humanly possible to maintain the vigilance that many languages require. Even if you accept lower productivity from working “closer to the metal,” you’ll slip up eventually.

5 Likes

Yep, agreed on all points. That’s why for my work in particular I deemed async Rust to be an investment that has too high a cost for too small a return.

Elixir and Golang give me 80-90% of async Rust’s value for only 10-20% of the cognitive and productivity cost.

This is especially true nowadays, where you have dynamic requirements, always work with dynamic teams, and each person has a different level of knowledge and background.

I think using these abstractions is not that different from using other tools. A simple example:

I want to prepare some land for planting; I can use a hoe or a tractor. The tractor is much more efficient in terms of speed, however you need to have one in the first place and you need fuel, while on the other hand the hoe is much more efficient in terms of resource usage and is pretty much free, but it is very slow and arguably not fun :joy: .

We cannot dwell at these low levels of abstraction and build complex software; sooner or later we have to abstract, for the same reason we don’t write in assembly nowadays: it is impossible to reason about a more complex application there.

Here is a 2021 HackerNews discussion explaining why they didn’t want to adopt a more preemptive scheduling policy at the time: I was under the impression that Loom was implementing preemptable lightweight th... | Hacker News.

2 Likes

Thanks for the link. I’ve read through the sub-thread.

Ultimately though, the Loom devs’ answers are defensive; they basically say “we’d like to see the use-cases for hundreds of thousands of green threads” instead of doing their own research, which they could have done in 5 minutes to quickly arrive at the super obvious answer that web / API backends sometimes have to deal with 5–6 figures of concurrent clients.

Disappointing. But also quite telling. Most runtime devs give up and start to rationalize their giving up with “we have not seen proof that this is actually needed” as if the BEAM does not exist. :person_facepalming:

Show me a person who has never used the Erlang VM who understands what fault tolerance means on the first try :joy: .

I find their take disappointing too. “Accidentally misbehaving subsets” of threads are inevitable in any non-trivial concurrent system by virtue of having been written by humans.

Without preemption, all it takes to lock up an entire service is for a benign-looking bug to send a few threads spinning long enough for nothing useful to get done (e.g. catastrophic backtracking in a regex).

With preemption you get degraded service instead, and with BEAM languages you also get the ability to gracefully kill the offending threads and fix the bug without taking a single unaffected thread down.
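A small sketch of that degraded-but-alive behaviour (the spinning process below is a stand-in for the regex bug above): the runaway process keeps getting preempted, other work continues, and the offender can be killed surgically.

```elixir
# A benign-looking bug: an infinite tail-recursive spin. Every
# call still burns reductions, so it keeps getting preempted.
bad =
  spawn(fn ->
    loop = fn loop -> loop.(loop) end
    loop.(loop)
  end)

# Unaffected work still gets CPU time...
for i <- 1..3 do
  IO.puts("still responsive: #{i}")
  Process.sleep(100)
end

# ...and the offender is removed without taking anything else down.
Process.exit(bad, :kill)
```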

Like I said earlier in the thread, it’s not humanly possible to maintain the vigilance that many languages require: virtual threads make it easy to spawn millions of concurrent flows, but don’t help you manage the consequences of that.

9 Likes

Do you need to understand this, even if you are not doing a lot of low-level stuff?

1 Like

You need to understand why it’s good and important. If you don’t know what problems OTP solves, then you will think it’s insignificant. The objective truth is that parallelization is extremely important these days, especially on web / API backends.
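As a hedged sketch of why that matters on a backend (the work items and fetch function here are made up), fanning one request out across processes is a one-liner with Task.async_stream/3:

```elixir
# Stand-ins for three backend calls made on behalf of one request.
work = [:users, :orders, :inventory]

fetch = fn name ->
  Process.sleep(100)  # imagine an HTTP or DB call here
  {name, :ok}
end

# The three calls run concurrently: ~100ms total instead of ~300ms.
results =
  work
  |> Task.async_stream(fetch, max_concurrency: 3, timeout: 5_000)
  |> Enum.map(fn {:ok, res} -> res end)

IO.inspect(results)
```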

3 Likes

That’s what I’m trying to argue: with preemptive scheduling you don’t really need to understand it because the worst case scenario is degraded performance. With Java virtual threads (or similar), you will sooner or later find yourself being taught an unexpected lesson on the vagaries of cooperative M:N scheduling, even if you’re just sticking to high-level stuff.

5 Likes