Consume the output of external program at a process' own pace

mguimas · October 10, 2019, 2:02pm

How to make a process consume at its own pace the output of an external program without risking the latter from overloading the message queue of the former?

This is something that ports seem to not be able to handle because they will flood the connected process message queue with the data read from the external program. That is, I could not find a way for the process connected to a part to demand more data from the external program at its own pace.

What solutions do you know regarding this?

Thanks

entone · October 10, 2019, 2:52pm

Broadway or GenStage would probably be your best bet. You would need to create a consumer for your external process.

entone · October 10, 2019, 2:53pm

In the end though, if you can’t process messages fast enough, some process’ message queue is going to get overloaded.

mguimas · October 10, 2019, 3:19pm

That’s right, but I would prefer for the “queue” (or whatever) at the external program side to overload and crash the external program, than to overload and crash at the BEAM side.

Fl4m3Ph03n1x · October 10, 2019, 3:20pm

You are looking into a back-pressure mechanism. Mainly, the consumer consumes at its own pace and forces the producer to wait. The most basic thing you can do with this is having a GenServer that only has calls instead of casts.

Because the client (code that invokes the GenServer) is blocked when the GenServer is working, it forces all clients to wait until it has finished.

Back-pressure also has its issues though, if the client waits for too long, it may decide to cancel the request. But this is something that you must decide on your own.

mguimas · October 10, 2019, 3:27pm

I agree that I am looking for back-pressure mechanism.

What I did not get is why you brought GenServer into this discussion …

What I want is to have an Erlang/Elixir lightweight process read data on demand from the stdout of a program external to the BEAM VM.

Please, clarify.

Thanks

mguimas · October 10, 2019, 3:43pm

I know that back-pressure can be controlled between lightweight BEAM processes, but my case is between a consumer BEAM process and a producer OS process.

Do you have any pointers for examples using Broadway or GenStage doing on-demand consumption of data from an OS process? It have not yet found any using Google …

benwilson512 · October 10, 2019, 4:36pm

The OS process is going to need to cooperate with this, in the say way that processes coordinate sending data over TCP. The OS process needs to accept some message which says “Give me X bytes / messages / whatever”, and then it sends only that many until you ask for more. If the OS process is written to just send data your way without any cooperation you’ll need to manage it.

mguimas · October 10, 2019, 4:40pm

Yeah, that’s probably the quickest way to make it work, like opening a port to a Bash script, which reads the requests from Erlang for X amount of data, and then, the script gets such amount from the final OS process.

I believe this can be made by having the Bash script reading from a Unix pipe, which the OS process writes too and blocks when the pipe is full.

The other way I see is to have a NIF that reads from a pipe, without blocking on the pipe to avoid blocking a BEAM scheduler.

Fl4m3Ph03n1x · October 11, 2019, 9:12am

My initial assumption was to have an Elixir process (GenServer) get the information needed from the OS process and then relay it to a worker. In this scenario you could implement back pressure via call.

However, @benwilson512 may actually serve you better. To communicate directly with OS processes we made use of Port (made the OS process open a port to our elixir app).

https://hexdocs.pm/elixir/Port.html

I hope this is useful to you. I advise reading the zombie processes section, it was of special importance to our projects.

dimitarvp · April 25, 2020, 10:58am

@akash-akya lately released a library for that, but at the moment it expects both stdin and stdout to be utilised, which makes it a little awkward to use for when you e.g. simply want to consume the output of the find program. Additionally, it has an external Golang tool dependency which is not ideal.

He said he will add option to only consume stdout.