Most efficient way for a process to voluntarily yield scheduling priority?

Background

I’m working on a Discord bot in Elixir that runs as two separate applications: the bot itself, and the backend logic and data handling that needs to be done on behalf of the bot. The backend application has to do a lot of data processing on startup, which it currently handles asynchronously (each component in the backend is a GenServer, and the init callback immediately returns a continue instruction that triggers the actual data processing needed for initialization). This is working rather well overall, except for one specific component which has a long and computationally intensive initialization sequence after a change in either the code or the initialization data for that component.

The component in question needs to process a very large list (roughly 2900 items) by computing an SQL transaction for each item and then running that against a database. The amount of processing here is time-prohibitive if it needs to be done serially (each item in the list takes about 50-100ms to process and then run the SQL transaction, so the full list takes almost 5 minutes if run one-by-one), but there are a handful of computations that can be shared across all the items, so my current code uses Stream.chunk_every/2 and Task.async_stream/3 to run the initialization in a number of parallel chunks equal to the number of online schedulers, like so:

items
|> Stream.chunk_every(div(length(items), System.schedulers_online()) + 1)
|> Task.async_stream(&process_chunk/1, ordered: false)
|> Enum.to_list()
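
For anyone who wants to try the chunking in isolation, here is a minimal self-contained sketch of the same pipeline. The module name ChunkSketch and the body of process_chunk/1 are hypothetical stand-ins, not the real code. One thing worth keeping in mind: Task.async_stream defaults to a 5000 ms timeout per task and, by default, exits the calling process when a task exceeds it, so the sketch assumes long-running chunks and passes timeout: :infinity explicitly.

```elixir
defmodule ChunkSketch do
  # Hypothetical stand-in for the real per-chunk work: compute something
  # once per chunk, then reuse it for every item in that chunk.
  def process_chunk(chunk) do
    shared = length(chunk)
    Enum.map(chunk, fn item -> {item, shared} end)
  end

  def run(items, schedulers \\ System.schedulers_online()) do
    # div(n, s) + 1 guarantees at most `schedulers` chunks.
    chunk_size = div(length(items), schedulers) + 1

    items
    |> Stream.chunk_every(chunk_size)
    |> Task.async_stream(&process_chunk/1,
      ordered: false,
      max_concurrency: schedulers,
      # Assumption: a chunk may take longer than the 5000 ms default.
      timeout: :infinity
    )
    |> Enum.flat_map(fn {:ok, results} -> results end)
  end
end

results = ChunkSketch.run(Enum.to_list(1..2900), 4)
IO.inspect(length(results), label: "processed items")
```

Lowering max_concurrency below the scheduler count is one way to try the "fewer processes than online schedulers" variant without changing anything else.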

This processes everything correctly and makes the initialization fast enough to be useful, but it causes a completely different issue: it blocks scheduling of other processes for a long time, which causes the bot itself to fail initialization because it can’t finish starting up before this starts running.

The question

My first instinct here based on experience elsewhere is to have the process_chunk/1 function voluntarily yield scheduling priority (I suppose this translates to voluntarily moving to the end of the run-queue for the scheduler in BEAM terms) before it processes each individual item. Right now, I’m doing this by running :timer.sleep(1) at the beginning of each iteration within process_chunk/1, which seems to be working to ensure that other things can run but feels like a bit of a hack TBH and also adds to the overall initialization time for this component (it’s only ~91ms of extra time on my development box, but translates to ~734ms on the production system it will be running on due to a much lower scheduler count).

Is there some more efficient way to voluntarily yield scheduling priority in Elixir or Erlang? Or is there perhaps some other approach I could take here that still lets other things run without significantly impacting the initialization times for the component in question?

What exactly do you mean when you say it is blocking scheduling of other processes?

I’m not an expert at this, but the BEAM has preemptive scheduling, which means that nothing should be able to block the scheduler for a long time.

The scheduler’s designed to equitably share CPU between all the runnable processes; if it’s blocking that means something else is going wrong.

How many database connections are in Ecto’s pool? If there aren’t more than System.schedulers_online, that could cause the Tasks to hold all of them and then block forward progress from other processes.

Based on my understanding of how scheduling works on BEAM and barring prioritization of processes (which are all equal for the relevant tasks here unless the third-party library that constitutes most of the code for the bot application is messing with them), it’s actually possible for processes to block for a theoretically arbitrary amount of time in BEAM because it uses a combined count of function and BIF calls to determine when to switch processes (and I think it also automatically moves new processes to the beginning of the run queue, but I’m not sure about that). Given that it waits for a total of 2k calls before forcing a switch, it’s possible for code that does not call a lot of functions, or code that receives a lot of messages (because receiving a message gives the recipient process a higher priority to run) to edge out other processes from running until it does something that causes other code to run (either messages another process which then triggers the required conditions for rescheduling, or hits the 2k call limit, or does something that causes it to sleep or hibernate).
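
One way to check how quickly a given piece of code uses up its share is to read the process’s reduction counter before and after running it. This is only a rough sketch: ReductionSketch is a hypothetical helper, and the two workloads are arbitrary examples rather than anything from the real initialization code.

```elixir
defmodule ReductionSketch do
  # Returns how many reductions the calling process consumed while running fun.
  def reductions_used(fun) do
    {:reductions, start_reds} = Process.info(self(), :reductions)
    fun.()
    {:reductions, end_reds} = Process.info(self(), :reductions)
    end_reds - start_reds
  end
end

# Arbitrary example workloads: summing a short list and a long list.
small = ReductionSketch.reductions_used(fn -> Enum.sum(Enum.to_list(1..10)) end)
large = ReductionSketch.reductions_used(fn -> Enum.sum(Enum.to_list(1..100_000)) end)
IO.inspect({small, large}, label: "reductions (small, large)")
```

If the per-item work shows a very low count relative to its wall-clock time, that would point at time being spent somewhere the reduction counter does not see, such as native code.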

In this particular case though, I’m 99% certain it’s a scheduling issue. Stubbing things out so that the GenServer that has this complicated initialization code does not run makes everything else work correctly, as does adding :timer.sleep(1) to each iteration of the initialization code (which obviously causes the process to sleep) or reducing the initialization code to use a number of processes less than the total number of online schedulers.

I’m not necessarily certain the relevant code is actually blocking other things from running per se, but it’s screwing with the scheduling of other code on the same node somehow, and using :timer.sleep(1) to voluntarily reschedule makes the problem go away, which leads me to believe it’s somehow blocking other code from running. However, for all I know it could be a side effect of the internal behavior of the scheduler and the order stuff is getting started in.

Ecto isn’t involved here, I’m just poking at a local SQLite3 database using Sqlitex, serialized through a single GenServer instance (it’s largely a case of using SQL here because it’s moderately easier and rather surprisingly significantly faster than trying to do the same thing with ETS or mnesia), and there’s no concurrent access from any other processes except those that are trying to initialize the database (because all other access would be through the GenServer that’s waiting for the initialization code to finish).

Looking a bit deeper, I think what’s going on here is that something in the third-party code being used where things are failing to initialize is expecting some sort of ordering constraint in the scheduling that is not holding true when the scheduling code has more runnable tasks than it has online schedulers.

You shouldn’t have to worry about doing such scheduling because of the BEAM. Either a process has work to do, and then it will do it, or it doesn’t have work to do, which means another process gets to run instead.

A scenario where a process doesn’t have many functions to run just means it doesn’t have work, so it will be scheduled out. It won’t block or wait until work becomes available.

Maybe there is a shared resource or something else they are all trying to reach out, but from a glance it doesn’t look like an issue with the VM scheduling.

{sql} serialized through a single GenServer instance

I suspect this is your problem. GenServers are effectively “single threaded”. My guess is that one or more of your initializations is performing a GenServer.call, which times out after 5s and causes the caller to crash; after a bunch of failures this travels up the supervision tree and eventually the application supervisor gives up the ghost. It would be tricky to truly debug your system without seeing exactly how you are architecting your system startup.
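
To make the failure mode concrete, here is a small sketch of a call that outlives its timeout. SlowServer and SafeCaller are hypothetical names, and the 200 ms delay simply stands in for a GenServer that is busy with other work. By default GenServer.call/3 exits the caller on timeout; the wrapper turns that exit into a return value.

```elixir
defmodule SlowServer do
  use GenServer

  def init(delay_ms), do: {:ok, delay_ms}

  # Simulates a server that is busy for delay_ms before it can reply.
  def handle_call(:work, _from, delay_ms) do
    Process.sleep(delay_ms)
    {:reply, :done, delay_ms}
  end
end

defmodule SafeCaller do
  # Wraps GenServer.call/3 so that a timeout becomes a return value
  # instead of an exit in the calling process.
  def call(server, request, timeout_ms) do
    try do
      {:ok, GenServer.call(server, request, timeout_ms)}
    catch
      :exit, {:timeout, _} -> {:error, :timeout}
    end
  end
end

{:ok, pid} = GenServer.start_link(SlowServer, 200)
IO.inspect(SafeCaller.call(pid, :work, 50), label: "short timeout")
IO.inspect(SafeCaller.call(pid, :work, 5_000), label: "long timeout")
```

Without the try/catch, the first call would crash the calling process, which is the kind of failure that can then propagate up a supervision tree.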

I’d be slightly worried about overuse of handle_continue; that seems likely to indicate an antipattern. Have you considered a pubsub model instead?

If it is a scheduling issue, the only way it could be one is if the sqlitex NIFs are dirty and you’re creating more connections than there are dirty schedulers. I don’t know if that’s the case. Otherwise it really cannot be a scheduler problem.
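
To see how many schedulers of each kind a node actually has, something like the following can be run in iex. SchedulerInfo is a hypothetical helper; whether Sqlitex’s NIFs run on dirty schedulers at all is not something this snippet can tell you.

```elixir
defmodule SchedulerInfo do
  # Collects the scheduler counts of the running node.
  def summary do
    %{
      schedulers_online: System.schedulers_online(),
      dirty_cpu_schedulers: :erlang.system_info(:dirty_cpu_schedulers),
      dirty_io_schedulers: :erlang.system_info(:dirty_io_schedulers)
    }
  end
end

IO.inspect(SchedulerInfo.summary(), label: "schedulers")
```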

What do you mean exactly by causing the bot itself to fail initialization because it can’t finish starting up before this starts running?

If the bot needs all the data to be processed, then it should take 5 minutes to start up and that’s it. If that is OK, then you should not use :continue/handle_continue but do the work in the init callbacks, so that if the bot is “below” in the supervision tree it will wait until its dependencies are initialized.

If otherwise you want the bot to be initialized before “this” starts running then have the bot “above” in the supervision tree, so the bot will have its init callback completed before your heavy work will start. Again, this would be without using :continue.
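
Here is a minimal sketch of that ordering guarantee, assuming both pieces live under one supervisor (which may not match a setup with two separate applications). SketchCore, SketchBot and OrderSketch are hypothetical names, and the 100 ms sleep stands in for the heavy initialization. A supervisor starts its children in list order and waits for each init/1 to return before starting the next one.

```elixir
defmodule SketchCore do
  use GenServer

  def start_link(parent), do: GenServer.start_link(__MODULE__, parent)

  # Blocking init/1: the supervisor does not start the next child
  # until this function returns.
  def init(parent) do
    Process.sleep(100)
    send(parent, {:ready, :core})
    {:ok, parent}
  end
end

defmodule SketchBot do
  use GenServer

  def start_link(parent), do: GenServer.start_link(__MODULE__, parent)

  def init(parent) do
    send(parent, {:ready, :bot})
    {:ok, parent}
  end
end

defmodule OrderSketch do
  # Starts both children and returns the order in which they became ready.
  def start_order do
    children = [{SketchCore, self()}, {SketchBot, self()}]
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    order =
      for _ <- children do
        receive do
          {:ready, name} -> name
        after
          1_000 -> :timeout
        end
      end

    Supervisor.stop(sup)
    order
  end
end

IO.inspect(OrderSketch.start_order(), label: "ready order")
```

Swapping the two entries in the children list gives the other ordering described above, where the bot finishes its init before the heavy work begins.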

So maybe you could give us more info about your whole startup process.

Anyway, an optimized version of sleep(1) is available, and it is indeed called yield (https://erlang.org/doc/man/erlang.html#yield-0). It comes with this warning:

There is seldom or never any need to use this BIF as other processes have a chance to run in another scheduler thread anyway. Using this BIF without a thorough grasp of how the scheduler works can cause performance degradation.
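
As a rough sketch of what swapping :timer.sleep(1) for the BIF could look like inside process_chunk/1 (YieldSketch and process_item/1 are hypothetical stand-ins for the real per-item work):

```elixir
defmodule YieldSketch do
  # Hypothetical stand-in for the real per-item work.
  defp process_item(item), do: item * 2

  def process_chunk(chunk) do
    Enum.map(chunk, fn item ->
      # Voluntarily give other runnable processes a chance to run first.
      # Unlike :timer.sleep(1), this does not wait on a timer.
      :erlang.yield()
      process_item(item)
    end)
  end
end

IO.inspect(YieldSketch.process_chunk([1, 2, 3]), label: "processed")
```

Given the warning above, it is probably worth measuring startup time with and without the yield before keeping it.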

This is a curious problem :)

First, I’ll echo the sentiment of others that a GenServer shouldn’t be blocking for a long time, because that might cause the rest of the system to block. This can be handled in a couple of ways:

  1. Process chunks synchronously during the app or server boot (e.g. in the init callback).

  2. Start a separate task which will start the async_stream, await the results, and then do something with them (e.g. send them to other processes).

  3. Instead of waiting for all the tasks to finish in GenServer, handle task results as they arrive in handle_info.
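
A minimal sketch of the third option might look like the following. AsyncInit, its status/1 function and the summing process_chunk/1 are hypothetical; the tasks are started with Task.async, so they are linked to the GenServer and a crashing task would take the server down with it.

```elixir
defmodule AsyncInit do
  use GenServer

  def start_link(chunks), do: GenServer.start_link(__MODULE__, chunks)
  def status(pid), do: GenServer.call(pid, :status)

  def init(chunks) do
    {:ok, %{pending: %{}, results: []}, {:continue, {:start, chunks}}}
  end

  # Start one task per chunk and return immediately; the server stays
  # free to answer calls while the tasks run.
  def handle_continue({:start, chunks}, state) do
    pending =
      Map.new(chunks, fn chunk ->
        task = Task.async(fn -> process_chunk(chunk) end)
        {task.ref, chunk}
      end)

    {:noreply, %{state | pending: pending}}
  end

  # A finished task replies with {ref, result}.
  def handle_info({ref, result}, %{pending: pending} = state)
      when is_map_key(pending, ref) do
    # Drop the monitor and flush the :DOWN message that would follow.
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | pending: Map.delete(pending, ref), results: [result | state.results]}}
  end

  def handle_info(_other, state), do: {:noreply, state}

  def handle_call(:status, _from, state) do
    reply = if map_size(state.pending) == 0, do: {:ready, state.results}, else: :initializing
    {:reply, reply, state}
  end

  # Hypothetical stand-in for the real per-chunk work.
  defp process_chunk(chunk), do: Enum.sum(chunk)
end

{:ok, pid} = AsyncInit.start_link([[1, 2], [3, 4]])
Process.sleep(200)
IO.inspect(AsyncInit.status(pid), label: "status")
```

Callers can then be told :initializing right away instead of waiting for a call timeout.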

However, given your description I’m not sure that this would solve the issue. It’s interesting that including :timer.sleep(1) in process_chunk removes the problem. This could indeed mean that schedulers are blocked, or alternatively that some locking takes place at the SQL level.

If the schedulers are blocked, a likely reason would be custom native code (a NIF). As mentioned by others, the scheduler does frequent preemptive context switching. Due to the functional nature of BEAM languages, functions are frequently invoked, while a single longer-running BIF will bump the reduction count by more than 1. Furthermore, in recent OTP versions BIFs also yield. E.g. since OTP 22, length/1 will yield when called with long lists.

To check this I’d try to reproduce the problem using a single scheduler thread. I’d write a test function, e.g. process_big_chunk/0 which processes a larger amount of data sequentially (i.e. no tasks). I’d also comment out the startup processing code, i.e. I’d make the app start the required processes without doing anything else (like starting some activity).

Then I’d manually start a single-scheduler-threaded BEAM with ELIXIR_ERL_OPTIONS="+S 1" iex -S mix. From the iex session I’d first start the observer (:observer.start), and then spawn an infinite processing loop as:


spawn(fn -> Stream.repeatedly(&process_big_chunk/0) |> Stream.run() end)

If the observer is responsive (you can click on it and it refreshes data), it means that the scheduler is not blocked. OTOH if the observer is blocked, or very laggy, it would be an indication that something is indeed blocking the scheduler. You could then proceed by sprinkling IO.inspects to see where the blocking takes place (i.e. which operations take long to finish). Alternatively you could start the system with more schedulers, and use observer or Erlang tracing to deduce the same thing.
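
If clicking around in the observer feels too subjective, a rough numeric alternative is to measure how late a short sleep wakes up while the busy loop runs. This is only a sketch: LatencyProbe is a hypothetical helper, and the busy loop below is an arbitrary CPU-bound stand-in for process_big_chunk/0.

```elixir
defmodule LatencyProbe do
  # Measures how late a short sleep wakes up, in microseconds.
  def wakeup_delay_us(sleep_ms \\ 10) do
    t0 = System.monotonic_time(:microsecond)
    Process.sleep(sleep_ms)
    System.monotonic_time(:microsecond) - t0 - sleep_ms * 1_000
  end

  # Takes n samples so that occasional spikes become visible.
  def sample(n \\ 20), do: for(_ <- 1..n, do: wakeup_delay_us())
end

# Arbitrary CPU-bound stand-in for the real processing loop.
busy =
  spawn(fn ->
    Stream.repeatedly(fn -> Enum.sum(Enum.to_list(1..10_000)) end) |> Stream.run()
  end)

delays = LatencyProbe.sample()
Process.exit(busy, :kill)
IO.inspect(Enum.max(delays), label: "worst wakeup delay (µs)")
```

With pure Elixir code in the loop the delays should stay small even with +S 1; large or growing delays would suggest the loop is holding the scheduler.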

If the single thread processing doesn’t block the scheduler, the problem could be in how the library and/or SQLite handle concurrent operations. You could try the same experiment using two scheduler threads and two infinite processing loops. Again, sprinkling some IO.inspect for debugging purposes might help discover where the process is blocking.

In any case I feel that the sleep hack is not a reliable fix, and that the issue might still occasionally resurface, so I’d personally spend some time trying to understand the issue. It’s hard to tell exactly where the problem is, but based on your description, it might be caused by the NIF implementation of the 3rd party library, or by the concurrent behaviour of SQLite. Of course it’s also possible that you stumbled upon some bug/deficiency in Erlang, but I don’t think this is likely.

Either way, some further analysis & debugging is required to properly understand this. Best of luck and keep us posted :)

It’s a bit more complicated than either case. The bot is running as a separate application from the core logic that this initialization is in and is configured to properly report errors back to users if it gets a timeout calling into the core (which, because of how the core is written, is what happens if the initialization is not yet done).

The bot itself is using discord_alchemy to do most of the work, and as a result the startup sequence for the bot application ends up looking roughly like this:

  def start(_type, _args) do
    run = Client.start(System.fetch_env!("DISCORD_TOKEN"))
    Cogs.set_prefix("/")
    use Roll35Bot.Help
    use Roll35Bot.Ping
    ....
    run
  end

The stuff after the Client.start/1 call all expects the actual client instance to be both started and finished initializing, but instead it throws an error about the bot itself not being alive (without anything from Elixir/Erlang complaining about any processes exiting abnormally, and with no log messages indicating that anything got restarted, even with the log level set to :debug).


However, I’m starting to suspect this may have been a result of hardware problems on my development system, as I can’t seem to reproduce it on other systems, and after a reboot of the development system I can’t reproduce it there either despite nothing else changing.

Do you have synchronous work-performing setup function calls in your application start? Generally for the most part application start should only perform fast function calls (like read x from a file or command line) and launch processes into the supervisor or tasks into a dynamic supervisor.

For the core application it’s all GenServers that run their initialization asynchronously by immediately returning a continue instruction from their init/1 callback (and all the other code that’s potentially calling out to them is set up to properly handle timeouts resulting from this). Based on logging messages, the application start for the core happens within a few hundred milliseconds.

For the bot application, it’s similar in terms of timing so I assume it’s behaving sanely, though I don’t have anywhere near the knowledge of what’s going there that I do for the core application because I haven’t read much of the source code for discord_alchemy.

It’s strange.

We wrote all our backend in ETS and Mnesia. Also, ETS is even faster than Redis, and Mnesia is blazing fast when using the :mnesia module directly, without any wrapper around it.