Preemption of processes: reductions vs. timer (a question that's been asked before, I'm sure)

brandonpollack23 · January 31, 2024, 10:02am

I was looking into Lunatic, which is a BEAM-inspired framework for Rust/Wasm applications.

Seems pretty cool and it raised a concern I’ve had in the past about BEAM.

If I understand correctly (big if), their wasm runtime utilizes a timer to pre-empt their executing processes and switch between them.

This makes sense: their compilers don’t necessarily insert reduction counting of any sort, and WASM runtimes don’t do that either.

My understanding is this:

A timer has some downsides. One is that it requires a thread to monitor it and pre-empt other running OS threads. That means you have to set some state that causes them to stop executing, or send a message to whatever manages them so it stops their execution (or use a re-entrant lock), and then save the state, do the bookkeeping, and so on. This is how setTimeout etc. work, and it can be done with a data structure of timers and a sleeping (interruptible) thread.

I’ve also heard vaguely that timers are unreliable or slow, but I don’t quite understand how this is true on a single machine (since this timer would be on one thread).

Reductions are simpler and allow the VM to pre-empt itself somewhat instead of relying on a mechanism to do so.
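To make the idea of reduction counting concrete, here is a minimal sketch in Python. It is a toy model, not how the BEAM is implemented: the names `worker`, `run`, and `REDUCTION_BUDGET` are made up for illustration, the budget value is arbitrary, and each `yield` merely stands in for the point where a VM would decrement its reduction counter.

```python
from collections import deque

REDUCTION_BUDGET = 4  # hypothetical budget, chosen only for this example


def worker(name, steps, log):
    """A toy 'process': each yield stands in for one counted reduction."""
    for i in range(steps):
        log.append((name, i))
        yield  # the point where a VM would decrement its reduction counter


def run(processes, budget=REDUCTION_BUDGET):
    """Round-robin scheduler that switches once a process has used its budget."""
    run_queue = deque(processes)
    switches = 0
    while run_queue:
        proc = run_queue.popleft()
        try:
            for _ in range(budget):
                next(proc)
        except StopIteration:
            continue  # the process finished, so it is not requeued
        run_queue.append(proc)  # budget spent: schedule out, requeue at the back
        switches += 1
    return switches


log = []
switches = run([worker("a", 6, log), worker("b", 3, log)])
```

No timer or second thread is involved: the running code itself reaches a counting point and hands control back to the scheduler.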

Basically this is all floating around in my head, and I’d love to start a discussion and get it out there, so that when others wonder about this there is an answer: why not timer-based pre-emption, and why reductions, for the Erlang/Elixir use case, along with a fair account of how a timer might work out for others (if it would). Has anyone ever investigated these alternatives? I am certain this has been thought about and tried, but I can’t quite find info on it.

jhogberg · January 31, 2024, 10:50am

I experimented with this a long time ago, using a fixed-frequency timer that set a flag to schedule processes out whenever two ticks passed without them being scheduled out by other means (e.g. receive). It worked, but it wasn’t faster than reduction counting, and back then there was no clear road to make that happen. The JIT may help us remove some overhead, but I’m still not sure it would lead to any concrete gains.
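As a rough illustration of the general shape of such a scheme (not the experiment described above, whose details are not given here), the following Python sketch has a background thread count fixed-length ticks while the running code voluntarily stops once two ticks have passed. `TickTimer`, `run_slice`, and `TICK_SECONDS` are hypothetical names and values.

```python
import threading

TICK_SECONDS = 0.005  # hypothetical tick length


class TickTimer:
    """Background thread that increments a counter at a fixed frequency."""

    def __init__(self, tick=TICK_SECONDS):
        self.ticks = 0
        self._tick = tick
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._tick):
            self.ticks += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def run_slice(timer, max_ticks=2):
    """Do work until max_ticks ticks have passed, then return to the scheduler."""
    start = timer.ticks
    iterations = 0
    while timer.ticks - start < max_ticks:
        iterations += 1  # stand-in for real work
    return iterations


timer = TickTimer()
timer.start()
iterations = run_slice(timer)
timer.stop()
```

Note that the work loop still has to read the tick counter on every iteration, so the cost of a check at a well-defined point does not go away; the timer only replaces the counter decrement.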

dimitarvp · January 31, 2024, 11:38am

I am not close to the implementation details at all but I’d think that timers are still more reliable than reductions; imagine a bad actor just spawning a number of processes (bigger than the schedulers count) that all just do stuff without recursing or calling any other function. Come to think of it, I never tried it – does the BEAM actually truly stall then? Probably not.

Though IMO a good compromise (if one doesn’t want to use timers) is that compilers should just insert the right code at the right place, kind of like Golang does.

From my experience with Rust so far, work-stealing schedulers are also an amazing way to do things.

jhogberg · January 31, 2024, 12:01pm

> I am not close to the implementation details at all but I’d think that timers are still more reliable than reductions; imagine a bad actor just spawning a number of processes (bigger than the schedulers count) that all just do stuff without recursing or calling any other function. Come to think of it, I never tried it – does the BEAM actually truly stall then? Probably not.

If you purposefully call things that don’t count reductions or yield appropriately, e.g. badly written NIFs, then yes, it would stall.

There are ways to cause issues in Erlang code, but you would have to write some truly contrived code for a bad actor to exploit it. Should anyone run into such an issue with real code we’ll fix it then and there.

However, these things would misbehave just as badly if implemented with timers, as there’s no (edit: sane, documented, and cross-platform) way for user-space to preempt something that isn’t prepared to be preempted. The best that a user-space implementation can do is to set a flag (or similar) that is checked later, and if that check never happens, the target will never be scheduled out (cf. POSIX thread cancellation: nothing happens until a cancellation point is reached).
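A small Python sketch of that point, under the assumption that "preemption" means setting a flag the target must check itself. The function names `cooperative` and `uncooperative` are invented for the example; the second one plays the role of code that never reaches a check, such as a long-running NIF that does not yield.

```python
import threading


def cooperative(stop, limit):
    """Checks the flag on every iteration, so it can be scheduled out early."""
    done = 0
    for _ in range(limit):
        if stop.is_set():
            break  # honor the request at this check point
        done += 1
    return done


def uncooperative(stop, limit):
    """Never looks at the flag, so setting it has no effect on this code."""
    done = 0
    for _ in range(limit):
        done += 1
    return done


stop = threading.Event()
stop.set()  # pretend the timer has already fired

polite = cooperative(stop, 1000)   # stops before doing any work
rude = uncooperative(stop, 1000)   # runs all 1000 iterations regardless
```

Whether the flag is set by a timer thread or a counter reaching zero makes no difference to the second function: it runs to completion either way.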

> Though IMO a good compromise (if one doesn’t want to use timers) is that compilers should just insert the right code at the right place, kind of like Golang does.

This is more or less how our implementation works, too.

> From my experience with Rust so far, work-stealing schedulers are also an amazing way to do things.

Our schedulers steal work from each other when appropriate.
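For anyone unfamiliar with the term, here is a deliberately simplified Python sketch of work stealing between run queues. It does not reflect the BEAM's actual balancing logic; `steal_if_idle` and its "take from the longest queue" policy are assumptions made only for illustration.

```python
from collections import deque


def steal_if_idle(queues, idle):
    """If queue `idle` is empty, move one task from the longest queue to it."""
    if queues[idle]:
        return False  # still has work of its own, nothing to steal
    victim = max(range(len(queues)), key=lambda i: len(queues[i]))
    if not queues[victim]:
        return False  # every queue is empty
    # Owners pop from the left; the thief takes from the right end.
    queues[idle].append(queues[victim].pop())
    return True


queues = [deque(["p1", "p2", "p3"]), deque()]
stole = steal_if_idle(queues, idle=1)
```

After the call, the idle scheduler's queue holds `p3` and the busy one keeps `p1` and `p2`.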

dimitarvp · January 31, 2024, 12:05pm

Yep, I suspected. It’s one of the things that makes the BEAM so good. Keep at it.