Exile - NIF based alternative to ports for running external programs. Provides back-pressure using non-blocking io

akash-akya · May 19, 2020, 3:02pm

Exile is an alternative to beam ports for running external programs. It provides back-pressure using non-blocking io, and tries to fix all issues associated with ports.

Github: https://github.com/akash-akya/exile

Rationale and issues associated with existing approaches are mentioned here

This is another stab at solving the issues associated with running an external command. My other approach, ExCmd, uses named pipes for solving these issues. But since :file functions are blocking operations in the beam, this can cause issues. As of now, there are no non-blocking file operations available in the :file module, so the only way to do non-blocking io is to use a NIF or a port driver.

Exile.stream! is a stream interface for interacting with the external program. It works with both Enumerable and Collectable. Apart from being composable, the stream handles closing stdin and terminating the external process.

Exile.stream!(~w(ffmpeg -i pipe:0 -f mp3 pipe:1), input: File.stream!("music_video.mkv", [], 65536))
|> Stream.into(File.stream!("music.mp3"))
|> Stream.run()

Key differences when compared to other middleware-based libraries and ports:

- fixes issues associated with ports
  - no more zombie processes
  - can selectively close stdin
  - it is demand-driven (back-pressure)
- it uses enif_select for asynchronous io, so it utilizes resources efficiently
- uses non-blocking read/write system calls, so it can never block the scheduler
- it does not use any middleware
  - no additional OS process, so no performance/resource cost
  - no need to install any external command
- can run many external programs in parallel without adversely affecting schedulers (when compared to ExCmd)
- stream abstraction for interacting with the external program
- should be portable across POSIX-compliant operating systems (not tested)
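
To illustrate what "demand-driven" means for a stream consumer, here is a minimal sketch using only the standard library. It is not Exile's implementation: `DemandSketch` and the Agent-based counter are hypothetical names made up for this example. It only shows that a lazy producer is asked for a chunk when the consumer pulls one.

```elixir
defmodule DemandSketch do
  # Hypothetical helper: returns a lazy, infinite stream of chunks.
  # `counter` is an Agent that records how many chunks the producer
  # was actually asked to generate.
  def chunks(counter) do
    Stream.resource(
      fn -> 0 end,
      fn n ->
        Agent.update(counter, &(&1 + 1))
        {["chunk-#{n}"], n + 1}
      end,
      fn _n -> :ok end
    )
  end
end

{:ok, counter} = Agent.start_link(fn -> 0 end)

# The consumer only asks for two chunks, so the infinite producer
# stops being called shortly after the demand is satisfied.
taken = DemandSketch.chunks(counter) |> Enum.take(2)
produced = Agent.get(counter, & &1)
IO.inspect({taken, produced})
```

With an external program the same idea applies to pipes: data is read from the program only when the consumer asks for more, instead of being pushed into a mailbox as a port would do.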

Non-blocking io can be used for other interesting things, such as reading named pipe (FIFO) files. Exile.stream!(~w(cat data.pipe)) does not block schedulers, so you can open hundreds of FIFO files, unlike with the default :file module.

Please check the project page for more detail.

Note: Exile is experimental and it is still work-in-progress

feedback is welcome

dimitarvp · May 19, 2020, 3:11pm

Very, very nice. Love it.

Just one note, might be misinformed: I am worried that with a C dependency the crate might not compile at all on Windows – have you tried it? Or are you targeting UNIX-es only?

That’s the reason I am only doing NIFs with Rust – got impressed in the past how quickly and easily it compiles stuff on Windows without complaints.

akash-akya · May 19, 2020, 3:23pm

Very, very nice. Love it.

Thank you

C dependency the crate might not compile at all on Windows – have you tried it? Or are you targeting UNIX-es only?

It uses the POSIX API, so currently it is for UNIX-like systems only.

We can have a different implementation for Windows, but I don’t have a Windows machine to test it properly.

I like to use Rust, but I think this case is different. Most of the code deals with system calls, so using C is straightforward, and it can be used with any POSIX-compliant OS without relying on the presence of a Rust compiler.

dimitarvp · May 19, 2020, 4:02pm

I see, thank you.

jayjun · May 19, 2020, 6:36pm

Great stuff, need more explorations in this area! Two questions,

1. Forking in the VM briefly causes a memory explosion, that’s why erl_child_setup exists to spawn ports. Have you done a comparison?
2. How do you prevent misbehaving programs from becoming orphans? Misbehaving as in those that don’t exit after standard input is closed.

akash-akya · May 19, 2020, 7:43pm

Thank you

Forking in the VM briefly causes a memory explosion, that’s why erl_child_setup exists to spawn ports. Have you done a comparison?

I thought of this approach before. But in most modern operating systems fork() is copy-on-write, so memory explosion is not an issue (unless I’m missing something). Keeping a separate process for exec does have other advantages, which are mentioned here. This can be done, but I think the current approach is much simpler. Such a process would also be a “shared” resource among all schedulers, and I prefer to avoid that. We might also have to monitor this OS process and deal with all the concerns that come with it. If we really need it, I think we can add it in the future.

How do you prevent misbehaving programs from becoming orphans? Misbehaving as in those that don’t exit after standard input is closed.

The executed external program is tied to a beam process, and this beam process is monitored by another “watcher” beam process. The watcher makes sure to terminate the spawned program when the beam process dies for whatever reason. It does so in this order: politely close stdin → SIGTERM → SIGKILL.
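
The escalation order described above can be sketched roughly as follows. This is not Exile's actual watcher: `WatcherSketch`, `alive_fun`, `steps` and `grace_ms` are hypothetical names, and the real actions (closing stdin, sending signals) are left to caller-supplied functions. The sketch only shows the control flow: wait for the owner to die, then try each step in order until the program is gone.

```elixir
defmodule WatcherSketch do
  # Hypothetical sketch. `owner` is the BEAM process tied to the external
  # program, `alive_fun` reports whether the program is still running, and
  # `steps` is an ordered keyword list of zero-arity cleanup functions,
  # e.g. [close_stdin: f1, sigterm: f2, sigkill: f3].
  def start(owner, alive_fun, steps, grace_ms \\ 100) do
    spawn(fn ->
      ref = Process.monitor(owner)

      receive do
        # Fires for any exit reason, including when `owner` is already dead.
        {:DOWN, ^ref, :process, ^owner, _reason} ->
          escalate(alive_fun, steps, grace_ms)
      end
    end)
  end

  # Run each step only while the program is still alive, giving it
  # `grace_ms` milliseconds to exit before moving on to the next step.
  defp escalate(alive_fun, steps, grace_ms) do
    Enum.reduce_while(steps, :ok, fn {_name, action}, _acc ->
      if alive_fun.() do
        action.()
        Process.sleep(grace_ms)
        {:cont, :ok}
      else
        {:halt, :ok}
      end
    end)
  end
end
```

A well-behaved program exits after the first step, so the later, harsher steps never run.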

I would like to know if there are better approaches than this

jayjun · May 20, 2020, 11:19am

The forker spawn driver was added fairly recently in ERTS 8.0 (search for erl_child_setup) because of 3-5x better performance, even 10x. This is way after Linux implemented copy-on-write.

So I suspect the memory explosion comment still refers to modern kernels. I found a good read.

fork() is evil; vfork() is goodness; afork() would be better; clone() is stupid

But even COW is very expensive because it requires modifying memory mappings, taking expensive page faults, and so on. Modern kernels tend to seed the child with a copy of the parent’s resident set, but if the parent has a large memory footprint (e.g., is a JVM), then the RSS will be huge.

Regardless, now that you’ve brought fork() back into the VM, I feel it’s a great opportunity to do a proper comparison. Judging from the above, the difference isn’t trivial.

The executed external program is tied to a beam process …

The scenario I imagined is when the Erlang node crashed or is forcefully killed. An external shim can detect when that happens then clean up by killing all child processes before exit.

mbklein · May 21, 2020, 12:53am

This looks very cool, and I’m trying to convert some code that’s currently using ports to use Exile instead, but I’m having some trouble. The external program is written in JavaScript, and I can’t get it to handle the input stream. I’ve managed to reduce it to a super-simple test case that just copies stdin to stdout:

Here’s the JS:

#!/usr/bin/env node

process.stdin.on("end", () => console.error("done"));
console.error("piping");
process.stdin.pipe(process.stdout);

And here’s the Elixir:

defmodule ExileTest do
  def hello do
    [Path.expand("priv/js/command.js")]
    |> Exile.stream!(input_stream: File.stream!("priv/input.txt", [], 65536), stderr_to_console: true)
    |> Enum.to_list
  end
end

If I run it from the command line…

$ cat priv/input.txt| priv/js/command.js
piping
Exile is an alternative to beam ports
for running external programs. It provides
back-pressure using non-blocking io, and
tries to fix all issues associated with ports.
done

But if I try to call it from iex…

Erlang/OTP 22 [erts-10.6.2] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> ExileTest.hello()
piping

…and it sits there blocking forever until I ^C my way out.

I’ve tried a bunch of different ways to read input on the JS side: process.stdin.read(), process.stdin.on('data', ...).on('end', ...) handlers, fs.readFileSync(0), and whatever else I could think of. Exile never seems to send it any data.

Any thoughts? Would you prefer I open an issue in the GitHub repo instead?

akash-akya · May 21, 2020, 5:56am

The forker spawn driver was added fairly recently in ERTS 8.0 (search for erl_child_setup) because of 3-5x better performance, even 10x. This is way after Linux implemented copy-on-write.

OK, I was thinking about the memory overhead. But it is CPU overhead due to updating memory pages.

I did a basic benchmark on a Mac. Forking from the beam process is indeed around 3 times slower than erl_child_setup (sometimes 4.5 times). Note that it is still ~1000 forks per second. Practically speaking, I’m not sure raw forking speed is important, because we usually hit other system limits before we start to operate at this rate. Nevertheless, this is nice to have. Thanks for pointing it out, I’ll look into improving this. Here are the results

The scenario I imagined is when the Erlang node crashed or is forcefully killed. An external shim can detect when that happens then clean up by killing all child processes before exit.

Yes, exile does not kill processes if the beam crashes or is killed with SIGKILL. This is similar to how all shells and other languages I checked behave, because any kind of fix for this involves spawning another “watcher” OS process (one or many). This is unnecessary overhead for well-behaved programs. Moreover, the user can fix this themselves by running a script if required.

On the same note, other kill signals can be handled (such as SIGTERM). WIP changes.

akash-akya · May 21, 2020, 6:01am

Exile.stream!(input_stream: File.stream!("priv/input.txt", [], 65536), stderr_to_console: true)

Hi, the field name is input not input_stream. Exile is ignoring input_stream param. It’s hanging because there is no input.

Anyway, exile should fail with a proper error message for an invalid option. Thanks for pointing it out. I’ll fix this.
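
A rough sketch of what failing loudly on an unknown option could look like. This is not Exile's code: `OptionCheckSketch` and its allowed list are hypothetical, and only `:input` is confirmed as a valid option in this thread.

```elixir
defmodule OptionCheckSketch do
  # Hypothetical allowed list: only :input is confirmed in this thread.
  @allowed [:input]

  # Returns the options unchanged, or raises if any key is not allowed.
  def validate!(opts) when is_list(opts) do
    case Keyword.keys(opts) -- @allowed do
      [] ->
        opts

      unknown ->
        raise ArgumentError,
              "unknown option(s) #{inspect(unknown)}, expected one of #{inspect(@allowed)}"
    end
  end
end
```

With a check like this, passing `input_stream:` would raise an ArgumentError immediately instead of silently starting the program with no input.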

jayjun · May 21, 2020, 9:25am

Thanks for investigating! 1,000 forks per second is indeed a lot of headroom.

This is unnecessary overhead for well-behaved programs.

Trade off is perfectly understandable. I specifically have trouble with this claim,

Misbehaving programs are the ones that become zombies (see Port docs). Exile is no different from a plain port in this respect.

On the other hand, a port-based library can double as the kill-on-exit script, thus guarantee no zombies. Middleware solutions are superior here, instead of comparing with “all shells and other languages”.

akash-akya · May 21, 2020, 10:22am

Misbehaving programs are the ones that become zombies (see Port docs). Exile is no different from a plain port in this respect.

The major issue with ports is how they handle misbehaving external processes while the beam VM is running, not when it crashes or is killed. As an example, consider spawning sleep 100000 and closing the port. Most languages handle this scenario properly, unlike Elixir/Erlang. Exile tries to fix this.
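
The sleep scenario can be reproduced with a plain port. This is a minimal sketch, assuming a Unix-like system where both `sleep` and the `kill` utility are on the PATH; `PortOrphanDemo` is a hypothetical module name.

```elixir
defmodule PortOrphanDemo do
  # Spawns `sleep` through a port, closes the port, and reports whether
  # the OS process is still alive afterwards.
  def run do
    sleep = System.find_executable("sleep")
    port = Port.open({:spawn_executable, sleep}, [:binary, args: ["100000"]])
    {:os_pid, os_pid} = Port.info(port, :os_pid)

    # Closing the port only closes the pipes to the program. `sleep` never
    # reads stdin, so it does not notice and keeps running.
    Port.close(port)
    Process.sleep(200)

    alive = os_alive?(os_pid)

    # Clean up so the demo itself leaves nothing behind.
    if alive, do: System.cmd("kill", ["-KILL", Integer.to_string(os_pid)])
    alive
  end

  # `kill -0` sends no signal; it only checks that the process exists.
  defp os_alive?(os_pid) do
    {_out, status} =
      System.cmd("kill", ["-0", Integer.to_string(os_pid)], stderr_to_stdout: true)

    status == 0
  end
end
```

Calling `PortOrphanDemo.run()` should typically report that the process survived the port being closed, which is the behaviour described above.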

On the other hand, a port-based library can double as the kill-on-exit script,

Yes. I do agree that a port-based lib handles SIGKILL or a crash of the VM. Exile does not handle this. If one actually needs this, they can just spawn another watcher script, as I already mentioned.

thus guarantee no zombies.

This might sound pedantic, but this is an important distinction: no, it cannot guarantee no zombies. Any solution for cleaning up spawned processes is only going to be best-effort. For example, someone can kill the middleware program with SIGKILL, or the middleware can crash (I agree that it’s unlikely). And what if the program we spawn starts another process? It’s not hard to create a program that cannot be killed by making it spawn itself.

jayjun · May 21, 2020, 11:32am

I see, sorry I understand the claim now.

I don’t mean a hard guarantee. To a middleware solution, closing the port and the VM crashing looks the same (standard input closed) signalling it to kill the child, thus no zombies either way, without additional watcher scripts. I guess cleanup-even-if-the-BEAM-dies is what I meant.

Anyway, all clarified. Thanks for the discussion.

mbklein · May 21, 2020, 2:45pm

Hi, the field name is input not input_stream. Exile is ignoring input_stream param. It’s hanging because there is no input.

HOW DID I DO THAT. I think I had it right in my original code, but I’ll update my test case and go from there.

mbklein · May 21, 2020, 5:43pm

It turns out I did have the input parameter correct in my original code, but fixing it in the test code allowed me to figure out the rest and I’ve got it working now. The dumb typos are always the most frustrating to find and fix.

Thanks for your help. I look forward to following your progress.

zacky1972 · May 28, 2020, 5:58am

I opened an issue on Pelemay to evaluate the potential of applying Exile to it:

Thank you for your information!

zacky1972 · May 28, 2020, 6:31am

Hi, Akash Hiremath,

I’m a co-author of Pelemay. I favor your Exile! I’d like to discuss you on future and interoperability of FFIs of Erlang and Elixir.
Please contact me: https://github.com/zacky1972/

akash-akya · May 28, 2020, 12:47pm

Hi,
thanks for informing. Glad to know

I’d like to discuss you on future and interoperability of FFIs of Erlang and Elixir.

Sure

Please note that I’m planning a major internal change. The public interface (stream) should remain the same, though.

elcritch · August 4, 2020, 12:09am

Looks like a handy library! On Nerves muontrap is used to handle cleaning up various zombie processes. To do this it seems muontrap can use some Linux cgroup magic to kill sub-processes of a zombie. Not sure how it’s done but thought you might find the technique useful for the zombie sub-child issue.

akash-akya · August 8, 2020, 5:01pm

Thanks for the pointer. That’s true, we can use cgroups for that, among their other benefits. I think it is sort of possible without cgroups too (I don’t know for sure). But imo the actual benefit of this is limited compared to other things I would like to have, such as making fork faster. Also, this is Linux-specific.
I’ll revisit this after some time.