Port communication (with escript) does not work in (distillery) release version

I have an application that needs to use an Erlang port for communicating with a legacy API. I send Erlang terms over the pipe, decode them in my mediator C application, and call into the legacy API.

I have now added a Distillery release because I want to deploy the application, but nothing happens. The port application is started, but even the init command I send through does nothing. Zilch. My whole application dies later because a port transaction times out at the default GenServer call timeout (5 s!).

This has worked for a long time now. When I run my application with mix run --no-halt all is well - my external application is called and communication happens. I’m using an escript to emulate the external program in my test harness, and this escript sends me some default responses. In this setup everything works, but with the release version from distillery the port just stalls.

Any ideas?

I opened an issue for this and got this answer from José Valim:

@DerKastellan if you see the output that you have added of File.cwd you will notice that the current working directory is different. The release runs inside the release directory. Any file that you need in production needs to be in the priv directory. So you should move the file to "priv/test.esh" and then locate it using Application.app_dir(:name_of_your_app, "priv/test.esh").

I will have to check if that solves my problem. Not sure, since my problem is not that the application is not started - I can verify that it actually is. On my machine, I see it print an initial log but then no further piped communication takes place.

Adding this as an additional cross-reference on where to put files that should ship with a release: Including data files in a Distillery release

José closed the issue, but now I have pointers for further investigations.

See also here: https://github.com/elixir-lang/elixir/issues/6166

Could be an oddity with starting the escript or an issue of the underlying Erlang. Best guesses for now.

I took José Valim’s pointers and rewrote, in C, the test application that emulates my actual Erlang port application. This eliminates the problem.
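For anyone who wants to try the same workaround, here is a minimal sketch of what such a C stand-in could look like. It is not the code from my project: it assumes the port is opened with {packet, 2} (each message prefixed by a 2-byte big-endian length), it just echoes every message back instead of decoding Erlang terms, and all function names are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Assumption: the port uses {packet, 2}, so every message is preceded
   by a 2-byte big-endian length header. */
enum { HEADER_SIZE = 2, MAX_PAYLOAD = 65535 };

/* Write a payload length into a 2-byte big-endian header. */
void encode_len(uint16_t len, unsigned char hdr[2]) {
    hdr[0] = (unsigned char)((len >> 8) & 0xff);
    hdr[1] = (unsigned char)(len & 0xff);
}

/* Read a payload length from a 2-byte big-endian header. */
uint16_t decode_len(const unsigned char hdr[2]) {
    return (uint16_t)((hdr[0] << 8) | hdr[1]);
}

/* Read exactly len bytes from fd. Returns 0 on success, -1 on EOF or error. */
int read_exact(int fd, unsigned char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0 && errno == EINTR) continue; /* interrupted, retry */
        if (n <= 0) return -1;                 /* EOF or real error */
        got += (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes to fd. Returns 0 on success, -1 on error. */
int write_exact(int fd, const unsigned char *buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0 && errno == EINTR) continue; /* interrupted, retry */
        if (n <= 0) return -1;
        sent += (size_t)n;
    }
    return 0;
}

/* Read one framed message from in_fd and echo it, framed, to out_fd.
   Returns the payload length, or -1 on EOF or error. */
int echo_one(int in_fd, int out_fd) {
    unsigned char hdr[HEADER_SIZE];
    static unsigned char payload[MAX_PAYLOAD];

    if (read_exact(in_fd, hdr, HEADER_SIZE) != 0) return -1;
    uint16_t len = decode_len(hdr);
    if (read_exact(in_fd, payload, len) != 0) return -1;
    if (write_exact(out_fd, hdr, HEADER_SIZE) != 0) return -1;
    if (write_exact(out_fd, payload, len) != 0) return -1;
    return (int)len;
}
```

A main function for the stand-in would only need to call echo_one(STDIN_FILENO, STDOUT_FILENO) in a loop until it returns -1, which happens when the Erlang side closes the port. A real mediator would decode the payload (for example with the ei library) instead of echoing it.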

José’s original finding was that the example worked once the escript interpreter shipped with the release (which bundles ERTS) was deleted. The issue seems to occur solely with escripts.

It kind of qualifies as a solution…

After revisiting some code for a university project, I remembered that I had written Erlang code for it, which I delivered as an escript. That escript also had trouble receiving input from stdin. I was able to solve the problem by simply waiting a bit before sending data. I think this is because stdin is consumed by ERTS (or some unknown process) up until the time that your application’s process takes over. Depending on how many OTP apps were included in the escript and needed to start before mine, I had to wait anywhere from 100 ms to 2 seconds.

Perhaps your problem is related to this starting gap? Are you able to check?

I published my test harness here: https://github.com/DerKastellan/release-weirdness

Want to try?