I’m writing about the advantages of preemptive scheduling in Elixir (inspired by this post).
Upon research, I found out that Java (the JVM) also supports preemptive scheduling based on priority, and the newly introduced virtual threads seem to allow creating many lightweight threads, like Elixir processes, with preemptive scheduling.
I’m wondering if I can still mention preemptive scheduling in Elixir as an advantage in the future.
Virtual threads are coming to the JVM. Charles Nutter from the JRuby project just presented on it at the Carolina Code Conference a couple of weeks ago. He didn’t talk about preemptive scheduling but he did demo the virtual threads.
Essentially, Ruby fibers won’t work in JRuby until that’s ready but it seems to work great. Something to keep an eye on for sure.
Here’s the video for anybody interested.
EDIT: The virtual threads explanation is at the 45-minute mark. Based on the explanation, it doesn’t look like preemptive scheduling. It’s closer to the async IO approach you’d see with JavaScript.
To my understanding, Loom/JVM virtual threads are not preemptive: they yield only when they hit IO. So they are close to async/await in C#, but without needing to add keywords anywhere.
This should be a good read about them https://blogs.oracle.com/javamagazine/post/going-inside-javas-project-loom-and-virtual-threads
It seems they are still counted as preemptive according to this article. But the current implementation works much like C#'s async/await without the keywords, rather than like the much fairer scheduling in the BEAM. Maybe someone more knowledgeable about BEAM internals can explain it better.
This is still stone-age concurrency. If they threw in channels for these green threads to communicate through, instead of the shared memory they always use, then it would mean something.
As it stands I see no difference between this project and things like coroutines in Kotlin.
It’s hardly preemptive when a loop or bad code never yields.
We don’t have loops in BEAM languages, only tail-call recursion. The BEAM has yielding integrated into every call and return, and the scheduler gives each process a precise budget (a number of “reductions”). This is known as “reduction scheduling”.
The process state is always safe to preempt at a call or return. Native functions are also guaranteed to complete within budget or yield: for example, a long-running regex NIF has yielding baked in. There are also dedicated “dirty schedulers” for potentially unsafe native code libraries that do not have yielding guarantees.
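To make the reduction budget idea concrete, here is a toy model in Java. It is only a sketch under my own assumptions: `ReductionSketch`, `Proc`, `finite`, `runaway` and `run` are hypothetical names, one `step()` stands in for one reduction, and none of this reflects how the BEAM is actually implemented. It just shows that when the scheduler counts the work, a process that never finishes still cannot starve the others.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of reduction-budget scheduling. All names are hypothetical;
// this is not how the BEAM is implemented.
final class ReductionSketch {

    // A "process" performs one reduction per step() and returns true once finished.
    interface Proc {
        String name();
        boolean step();
    }

    // A process that finishes after the given number of reductions (steps >= 1).
    static Proc finite(String name, int steps) {
        return new Proc() {
            private int left = steps;
            public String name() { return name; }
            public boolean step() { left--; return left <= 0; }
        };
    }

    // A faulty process that would run forever if nobody stopped it.
    static Proc runaway(String name) {
        return new Proc() {
            public String name() { return name; }
            public boolean step() { return false; }
        };
    }

    // Round-robin: on each turn a process gets at most `budget` reductions and
    // is then moved to the back of the run queue. maxTurns only bounds the demo.
    static List<String> run(List<Proc> procs, int budget, int maxTurns) {
        Queue<Proc> runQueue = new ArrayDeque<>(procs);
        List<String> trace = new ArrayList<>();
        for (int turn = 0; turn < maxTurns && !runQueue.isEmpty(); turn++) {
            Proc p = runQueue.poll();
            boolean done = false;
            int used = 0;
            while (!done && used < budget) {
                done = p.step();
                used++;
            }
            trace.add(p.name() + (done ? " finished" : " preempted"));
            if (!done) {
                runQueue.add(p);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        List<Proc> procs = List.of(runaway("loop"), finite("a", 3), finite("b", 5));
        // prints [loop preempted, a finished, b preempted, loop preempted, b finished, loop preempted]
        System.out.println(run(procs, 4, 6));
    }
}
```

With a budget of 4, both "a" and "b" finish even though "loop" never does. The key design point is that the scheduler, not the process, decides when a turn ends.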
The BEAM also provides work stealing in its schedulers and back pressure on processes messaging a process that is overloaded with a long message queue.
The other part of the BEAM that goes hand in glove with preemption is the shared-nothing process model. The BEAM is always able to preempt and kill a process AND guarantee the cleanup. This is something you can’t do in other languages with shared state, which can only set a flag saying ‘please clean up and die if you ever get to a yield point’. Hopefully the thread completes the long-running regex or the loop (a no-faulty-code assumption), a yield point is eventually reached, the shared data structures / objects are meticulously repaired to a sane state by your code, and all locks are released so the VM can finally retire the thread. Kiss goodbye to the soft realtime fair scheduling that the BEAM provides even in the face of faulty code.
The assumptions in the two models are very different. Almost all other runtimes assume correct programs and still may not be able to preempt, clean up and kill the process/thread, leaving a global graph of damaged object state to impact other threads. The BEAM assumes incorrect programs from the outset and provides guaranteed preemption, process kill and state cleanup.
So the problem gets pushed out to ops: the complexity of monitoring, restarting, killing and orchestrating “processes”, aka collections of containers. Hence a container ops team is required for a much slower and more expensive recovery, but in essence that’s the layer that guarantees the cleanup, shared nothing and low latency at the 99th percentile. It’s essentially the same model: one just needs a team vs one just needs a beam.
Well, it depends on the implementation, but in most common implementations and OSes, ordinary Java threads (as opposed to the new virtual threads) have always been preemptive. Java threads (as I’m sure you know) are mapped directly to OS threads, and modern OSes schedule their threads preemptively.
With 1:1 OS threads you will never get millions of lightweight processes cheaply like on the BEAM. Whilst the BEAM does use OS threads for its schedulers, the overhead of BEAM processes is extremely light vs OS threads:
An Erlang process is lightweight compared to threads and processes in operating systems.
A newly spawned Erlang process uses 326 words of memory.
The size includes 233 words for the heap area (which includes the stack). The garbage collector increases the heap as needed.
So the actual process overhead excluding the stack and heap is 93 words. This is why spawning millions of them is a non-event.
Java requires 16 KB of base physical RAM overhead for a thread that does nothing but sleep, as explored here, but it also reserves a minimum of 1 MB of virtual memory.
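For a rough sense of scale, the figures quoted above can be multiplied out. This is only back-of-the-envelope arithmetic under my own assumptions: an 8-byte word (64-bit system), "KB" and "MB" read as 1024-based units, and one million processes or threads. `SpawnCostSketch` and its method names are made up for illustration.

```java
// Back-of-the-envelope arithmetic using the figures quoted in this thread.
// Assumptions: 8-byte words (64-bit system), 1 KB = 1024 bytes, 1 MB = 1024 KB.
final class SpawnCostSketch {
    static final long WORD_BYTES = 8;             // assumption: 64-bit words
    static final long BEAM_WORDS_PER_PROCESS = 326; // newly spawned Erlang process
    static final long JAVA_RESIDENT_PER_THREAD = 16L * 1024;     // 16 KB physical
    static final long JAVA_RESERVED_PER_THREAD = 1024L * 1024;   // 1 MB virtual

    // Initial memory for `processes` freshly spawned BEAM processes, in bytes.
    static long beamBytes(long processes) {
        return processes * BEAM_WORDS_PER_PROCESS * WORD_BYTES;
    }

    // Physical memory for `threads` sleeping Java OS threads, in bytes.
    static long javaResidentBytes(long threads) {
        return threads * JAVA_RESIDENT_PER_THREAD;
    }

    // Virtual address space reserved for `threads` Java OS threads, in bytes.
    static long javaReservedBytes(long threads) {
        return threads * JAVA_RESERVED_PER_THREAD;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("BEAM processes:          " + beamBytes(n) + " bytes");
        System.out.println("Java threads (resident): " + javaResidentBytes(n) + " bytes");
        System.out.println("Java threads (reserved): " + javaReservedBytes(n) + " bytes");
    }
}
```

Under those assumptions a million fresh BEAM processes start at about 2.6 GB, a million sleeping Java threads at about 16.4 GB of physical RAM, and the virtual reservation runs to roughly a terabyte. Real numbers will vary with the OS, JVM flags and what each process actually does.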
Even with Java using OS threads the assumption is the code must be correct and it must cooperate and check for interruption. From the Java docs:
What if a thread goes a long time without invoking a method that throws InterruptedException? Then it must periodically invoke Thread.interrupted, which returns true if an interrupt has been received. For example:
for (int i = 0; i < inputs.length; i++) {
    heavyCrunch(inputs[i]);
    if (Thread.interrupted()) {
        // We've been interrupted: no more crunching.
        return;
    }
}
The Interrupt Status Flag
The interrupt mechanism is implemented using an internal flag known as the interrupt status. Invoking Thread.interrupt sets this flag. When a thread checks for an interrupt by invoking the static method Thread.interrupted, interrupt status is cleared. The non-static isInterrupted method, which is used by one thread to query the interrupt status of another, does not change the interrupt status flag.
By convention, any method that exits by throwing an InterruptedException clears interrupt status when it does so. However, it’s always possible that interrupt status will immediately be set again, by another thread invoking interrupt.
Of course everyone’s Java code does this right?
So there is the stake through the heart of Java “preemption”. It is basically a stone-age cooperative multitasking system: set a flag and pray that your code is correct everywhere, yields to check the interrupt flag, and extricates itself cleanly from its global object graph mess through exception handling. At that point your Java reliability is a patent FAIL at a fundamental level.
In contrast, the BEAM made deliberate design choices based on programs being faulty, and it still provides guaranteed preemption, process kill and cleanup.
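For anyone who wants to check the interrupt status rules quoted from the Java docs above, here is a small sketch. `InterruptFlagSketch` and `observe` are names I made up; the `Thread` calls are the standard ones the docs describe.

```java
import java.util.Arrays;

// Checks the interrupt-status rules quoted from the Java docs.
final class InterruptFlagSketch {

    // Returns { isInterrupted() after interrupt(), first interrupted(), second interrupted() }.
    static boolean[] observe() {
        Thread self = Thread.currentThread();
        self.interrupt();                            // sets the interrupt status flag
        boolean seen = self.isInterrupted();         // reads the flag without clearing it
        boolean first = Thread.interrupted();        // reads the flag and clears it
        boolean second = Thread.interrupted();       // flag was cleared by the previous call
        return new boolean[] { seen, first, second };
    }

    public static void main(String[] args) {
        // prints [true, true, false]
        System.out.println(Arrays.toString(observe()));
    }
}
```

Nothing in this sequence stops or unwinds the thread. The flag is only information that the thread's own code has to go and read.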
Where did I say that with 1:1 OS threads you can get to millions of lightweight processes like the BEAM? Not sure what you’re actually responding to here.
So there is the stake through the heart of Java “preemption”.
Java doesn’t do any preemption in most implementations, that was my point. Java threads are OS threads and the OS preempts its threads when and how it likes.
Even with Java using OS threads the assumption is the code must be correct and it must cooperate and check for interruption
This is a Java-specific mechanism for Java threads to coordinate/communicate with one another and has nothing to do with the scheduling done by the OS. The OS can still interrupt a thread whenever it likes and give control of the CPU to another thread, in the same process or in another one. That’s what OS schedulers do and they don’t care if your thread is executing Java code, C++, Rust or what not.
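The two mechanisms being argued about here can be seen side by side in a small sketch. `IgnoredInterruptSketch` and `demo` are hypothetical names, and the 100 ms pause is an arbitrary choice. The spinner never checks its interrupt status, so `interrupt()` sets the flag and nothing else happens; the thread only ends when it sees the `stop` field it does cooperate with.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// interrupt() is a message between Java threads, not a way to stop one.
final class IgnoredInterruptSketch {
    private static volatile boolean stop;

    // Returns { alive 100 ms after interrupt(), interrupt flag still set, alive after stop + join }.
    static boolean[] demo() {
        stop = false;
        CountDownLatch started = new CountDownLatch(1);
        Thread spinner = new Thread(() -> {
            started.countDown();
            while (!stop) {
                // busy work that never checks the interrupt status
            }
        });
        spinner.setDaemon(true);
        try {
            spinner.start();
            started.await();                 // make sure the spinner is really running
            spinner.interrupt();             // only sets the flag on the spinner
            Thread.sleep(100);               // give it time to (not) react
            boolean aliveAfterInterrupt = spinner.isAlive();
            boolean flagStillSet = spinner.isInterrupted();
            stop = true;                     // the only signal the spinner looks at
            spinner.join();
            return new boolean[] { aliveAfterInterrupt, flagStillSet, spinner.isAlive() };
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo thread was interrupted", e);
        }
    }

    public static void main(String[] args) {
        // prints [true, true, false]
        System.out.println(Arrays.toString(demo()));
    }
}
```

On a single core, the main thread getting to run at all while the spinner is busy is the OS scheduler's preemption at work. On a multi-core machine the two threads may simply run on different cores, so this sketch does not prove that part by itself.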
He’s responding to the OP, who was asking about the difference. JVM threads are preempted by the OS, which makes it hard to have a ton of them; it’s part of the trade-off.
He was quoting and responding to me, not the OP. IMO, it’s important that we respond to what people are actually saying and not what we think they are saying.
This feels like cooperative scheduling pretending to be pre-emptive by hiding the details from the user. Calling it pre-emptive doesn’t make it so, and as the article happily points out it’s easy to starve virtual threads by not doing IO very often or at all.
I must confess I’m a bit disappointed as I thought the big deal with Loom was that the virtual threads acted like regular ones in almost all respects, and that the user didn’t have to think about yielding at all, but alas.
(To be fair BEAM also uses cooperative scheduling internally, but it places yield points all over the place so you can’t accidentally hog a scheduler forever, and we consider anything that leads to it to be a serious bug)
Yeah, same. They periodically make a lot of noise and I really thought they’re getting close to the BEAM, or at least Go’s goroutines runtime. But it seems they’re still at the baby steps phase and have been for years. Disappointing.
Yeah didn’t feel right to me either calling it that.
One good thing about virtual threads, even with the current implementation, is that they avoid the viral effect of async/await. With async/await, if you need async behaviour at a lower level you have to carry it all the way up to the top of the call chain, and you often need two versions of a function, one synchronous and one asynchronous, as is the case with many .NET libraries. Virtual threads need neither.
It would be interesting to know whether this is a limitation of vision or an actual technological limitation in the ecosystem that doesn’t allow them to do this. The fact that they delegate Java threads to the OS seems to point to a total lack of concurrency expertise in the VM itself.
The same approach is used with suspend functions in Kotlin, and IMO it is a good feature, because it makes an explicit contract with the developer: this function is expected to work asynchronously and you have to handle that explicitly in your code.
From my 9-ish years spent with Java a long time ago, I believe it’s legacy. They are too afraid of making breaking changes; they have to move slowly and make sure everything keeps working every step of the way.
Even though that approach works for their enterprise (read: hugely lucrative) customers, it also dooms them to be forever the old and legacy language.
Not an easy predicament for them, I admit. But if I were them I’d just go all the way into making something like Golang’s runtime as a separate library and/or something you can enable via a CLI flag, so it must be strictly opt-in so as for it not to mess with existing code (and in the case of Java we’re likely easily talking about tens of billions of lines of code).
But it seems they don’t have the resources or indeed the expertise (as you said) to do something like this. Or it’s much more complex than we think, which is surely true, but again, I don’t understand why they don’t just work on this in more isolation. Oh well. I never intended to get back to Java anyway.
Well, now that you ask: you referred to preemption of OS threads, which rules out the old Java green threads that offer no parallelism (i.e. all threads running cooperatively on a single OS thread scheduled by the Java VM), and you specifically excluded the new M:N virtual threads. So you are in fact referring to the Java 1:1 OS threading model, where threads are real OS threads scheduled by the operating system.
Java OS threads are not lightweight, whereas BEAM processes are cheap, so spawning a million Java threads is a much bigger ask.
While it is the operating system that preempts and schedules the Java thread, the Java runtime is inadequate: it assumes correct code and cooperative checking at yield points by the developer, so that the thread can throw an exception, the developer can trap the signal, undo whatever they were doing and return the global object graph to a sane state before the runtime terminates the thread.
Yes, the OS can preempt any process or thread. When it comes to the Java runtime, however, it is a flawed design because it is based on the assumption that all code is correct, cooperatively yields to allow signals/exceptions to occur, and can clean up correctly without race conditions, deadlocks and logic bugs.
It is also not soft realtime fair scheduling, and it leads to higher latencies because the amount of work is not strictly controlled, so a thread can hog the CPU for much longer. The BEAM does offer soft realtime behaviour through strict reduction-based scheduling: even if the code is stuck in a death loop it will be preempted and other processes will be scheduled, and that errant process can be killed and cleaned up correctly without its cooperation or the code being correct.
Additionally, switching OS threads is higher overhead.
So you have higher overheads, higher memory requirements and stop-the-world GC requirements, because memory is shared all over the place and not owned by a thread (it’s not a shared-nothing process model), and Java can’t safely terminate badly behaving threads unless they cooperate.
So in summary it is a reliability fail at a fundamental level.
I agree with you that this (and the monitor/kill system) plus distribution are the main differentiators of the BEAM. OTOH, as always, it’s a matter of trade-offs. Performance-wise, it does not come cheap - sharing immutable data structures by sending over a pointer is pretty handy, and having so many yield points evaluated has a definite cost. Plus, the JVM itself is a marvel of engineering - you get C++ execution speeds with BEAM-level runtime inspection.