How does the BEAM handle disk operations?

In Go or Rust with Tokio, blocking I/O is moved onto a separate thread so that it does not block the scheduler.

But how does the BEAM handle I/O operations?

Do it within a process, or with the Task module. You can do the I/O synchronously in a process because the preemptive scheduler keeps switching between processes, i.e. you won’t block other processes while you load a file or the like.
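
A minimal sketch of the Task approach, in case it helps. The file name and its contents are made up for the example, and the summing is just a stand-in for "other work"; only the task's process waits on the disk.

```elixir
# Hypothetical file, created here only so the sketch is self-contained.
path = Path.join(System.tmp_dir!(), "beam_io_demo.txt")
File.write!(path, String.duplicate("hello\n", 100_000))

# Read the file in a separate process. The call inside the task is
# synchronous, but only that process waits for the result.
task = Task.async(fn -> File.read!(path) end)

# The calling process keeps running in the meantime.
sum = Enum.sum(1..1_000_000)

# Collect the result once it is needed (the default timeout is 5 seconds).
contents = Task.await(task)

IO.puts("sum=#{sum}, bytes=#{byte_size(contents)}")
File.rm!(path)
```

For a one-off read you could just as well call File.read!/1 directly in the process that needs the data; the Task only matters when the caller has something else to do while waiting.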

Unless you use raw IO, your “file descriptor” or “socket” is actually a GenServer that wraps access to that file. Even if you use IO in raw mode, the virtual machine performs all of the operations in specially designed preemptible internal machinery, so you don’t have to worry about blocking the rest of the VM.

As @ityonemo said, there are two ways of doing this on the BEAM:

  1. The normal way (example: don’t supply special flags when opening a file), which serializes I/O access through a single GenServer. That’s obviously not good for speed if you really want to ingest data from the filesystem at the speed of your underlying HDD or SSD, but it gives you safety: if something goes wrong at the syscall level, at least you won’t lose the entire BEAM OS process.

  2. The “raw I/O” way (check the “raw” flag here: Erlang -- file), which gives you a classic way of doing I/O: every Elixir process (which is NOT the same as an OS process; I think you already know that) does its own I/O without going through a central GenServer, but carries the risk of bringing down the entire BEAM OS process / node if something goes wrong (whatever that might be).
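
To make the difference between the two modes visible, here is a rough sketch. The file path is made up for the example. It only shows what each call hands back and how you read from it; it says nothing about performance or failure behaviour.

```elixir
# Hypothetical file, created here only so the sketch is self-contained.
path = Path.join(System.tmp_dir!(), "beam_io_modes.txt")
File.write!(path, "some data\n")

# Default mode: the returned "device" is the PID of a process that owns the
# file, and reads are requests sent to that process.
{:ok, device} = File.open(path, [:read])
IO.inspect(is_pid(device), label: "default mode returns a pid")
line = IO.binread(device, :line)
File.close(device)

# Raw mode: no intermediary process. A file descriptor term is returned and
# has to be used through the :file functions, from the process that opened it.
{:ok, raw} = File.open(path, [:read, :raw])
IO.inspect(is_pid(raw), label: "raw mode returns a pid")
{:ok, raw_data} = :file.read(raw, 1024)
:file.close(raw)

File.rm!(path)
```

Note that the functions in the IO module do not work on a raw file, which is why the second half goes through :file.read/2.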

I personally go for #2 because to me failing to read from disk is a critical condition so I don’t have to optimize around catching the problem. Usually the underlying Linux distro logs the actual error so I just inspect system logs and see what I can do to fix the problem. Only happened once ever – because a library needed to create temporary files and there was no space left on the virtual SSD.

File operations don’t block normal schedulers because they’re implemented as dirty NIFs, which run on dedicated “dirty” scheduler threads. The thread pool size is configurable: Erlang -- erl
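
If you want to see what your own node is running with, something like the following should do. It only queries :erlang.system_info/1; the variable names are mine.

```elixir
# Number of normal scheduler threads currently online.
normal = :erlang.system_info(:schedulers_online)

# Number of dirty I/O scheduler threads, the pool referred to above.
dirty_io = :erlang.system_info(:dirty_io_schedulers)

# Number of dirty CPU scheduler threads, used for CPU-heavy native code.
dirty_cpu = :erlang.system_info(:dirty_cpu_schedulers)

IO.puts("normal=#{normal} dirty_io=#{dirty_io} dirty_cpu=#{dirty_cpu}")
```

The dirty I/O pool size is set at VM start with the +SDio flag described in the erl documentation, for example `elixir --erl "+SDio 20" script.exs` (the value 20 and the script name are just placeholders).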

Note that it is one server process per file and NOT one process for all files. And, as @dom mentioned, file operations are automatically passed by the system to special “dirty” schedulers. Your Erlang scheduler will not block while waiting for the result of the file operation; it will suspend that Erlang process and go and run other Erlang processes in the meantime. When the file operation has completed, that Erlang process will be rescheduled and can continue.

An enormous amount of work has been done in the BEAM internals to make sure it will not block by default. You have to deliberately go out of your way to get it to block, and simple, basic file operations won’t do it.

Oops, I realize I haven’t worded my reply well. Thanks for the elaboration.