Getting random error: erl_child_setup: failed with error 32 on line 282 (using Port)

Sorry if my question is silly, I am relatively new to Elixir.

I made a small module to handle the execution of external programs. This is my code:

defmodule Abn.Shell do
  @default_timeout 2000
  @default_retry 2
  @default_error_log_file "/tmp/abn_errors/"

  use Abn.Lib
  alias Abn.Log

  ############################################################################
  ## Module API

  def run(command, timeout \\ @default_timeout, retry \\ @default_retry) do
    random_filename = if (@log_errors), do: :erlang.unique_integer([:positive]), else: ""
    script = """
    #!/usr/bin/env setsid /bin/bash
    #{command}   
    """
    try do
      port = Port.open({:spawn, script}, [:binary])
      monitor = Port.monitor(port)
      {:os_pid, ospid} = Port.info(port, :os_pid)
      output = get_output({port, monitor}, timeout, ospid)
      kill(ospid)
      Port.demonitor(monitor, [:flush])
      send(port, :close) # just in case...
      if (output == :timeout and retry > 1) do
        run(command, timeout, retry - 1)
      else
        output
      end
    rescue
      e ->
        Log.log(:error, "[SHELL]: #{inspect e}")
        :error
    end
  end

  ############################################################################
  ## Private Tools

  defp get_output({port, monitor}, timeout, ospid, output \\ "") do
    receive do
      {:DOWN, ^monitor, :port, ^port, _} ->
        output

      {^port, {:data, data}} ->
        get_output({port, monitor}, timeout, ospid, output <> data)

      msg ->
        Log.log(:warning, "[SHELL]: Port #{inspect port}. Breaking loop, unknown 'get_output' message (#{inspect msg})")
        output
    after
      timeout ->
        Log.log(:warning, "[SHELL]: Process exceed timeout, killing process...")
        :timeout
    end
  end

  defp kill(ospid) do
    System.shell("kill -9 #{ospid}")
  end
 
end

This module allows me to run external processes with a timeout (and n retries) for those cases in which the process takes too long.

The issue is that every so often, seemingly at random, I get the error ‘erl_child_setup: failed with error 32 on line 282’. It completely kills the process that invoked Shell.run, and the supervisor of that process doesn’t even get an EXIT notification. I have tried to reproduce the error but have not been able to.

The module is used in a data collector that runs about 30 processes, all of which use the Shell module simultaneously to run external programs.

Versions I use:

Erlang/OTP 27 [erts-15.1]

IEx 1.18.0-dev (a4adaa8) (compiled with Erlang/OTP 27)

Any idea where to start looking for the problem?

I am getting the same error. Did you ever figure this out?

If not … bump.

Well, I can’t say I found the source of the problem, but I did manage to stop it from happening again.
If you look at my code, you will see that part of run is wrapped in try/rescue. I simply removed the try/rescue (I’m not sure why I put it there in the first place), and the error hasn’t happened since.
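
For anyone landing here with the same error, the following is a minimal sketch of what run looks like once the try/rescue is gone. It rests on several assumptions that are not in the original code: `ShellSketch` is a made-up module name, `Logger` stands in for `Abn.Log`, the command is handed to the port directly rather than through the script string, the catch-all message clause is dropped, and the `nil` check on `Port.info/2` and the kill-only-on-timeout logic are additions for illustration. It shows the shape of the change only and does not explain why the error went away.

```elixir
defmodule ShellSketch do
  # Hypothetical stand-in for Abn.Shell; Logger replaces Abn.Log.
  require Logger

  @default_timeout 2000
  @default_retry 2

  # No try/rescue: an unexpected failure crashes the calling process
  # and is left to its supervisor.
  def run(command, timeout \\ @default_timeout, retry \\ @default_retry) do
    port = Port.open({:spawn, command}, [:binary])
    monitor = Port.monitor(port)

    # Port.info/2 returns nil if the port has already closed, which can
    # happen with very short-lived commands (assumption: treat as "no pid").
    ospid =
      case Port.info(port, :os_pid) do
        {:os_pid, pid} -> pid
        nil -> nil
      end

    output = get_output(port, monitor, timeout)

    # Only a timed-out command can still be running at this point.
    if output == :timeout and is_integer(ospid), do: kill(ospid)
    Port.demonitor(monitor, [:flush])

    if output == :timeout and retry > 1 do
      run(command, timeout, retry - 1)
    else
      output
    end
  end

  # Collects port data until the port goes down or the timeout elapses.
  defp get_output(port, monitor, timeout, output \\ "") do
    receive do
      {:DOWN, ^monitor, :port, ^port, _reason} ->
        output

      {^port, {:data, data}} ->
        get_output(port, monitor, timeout, output <> data)
    after
      timeout ->
        Logger.warning("[SHELL]: process exceeded timeout, killing it")
        :timeout
    end
  end

  defp kill(ospid) do
    System.cmd("kill", ["-9", Integer.to_string(ospid)], stderr_to_stdout: true)
  end
end
```

Calling `ShellSketch.run("echo hello")` returns whatever the command wrote to stdout as a binary, and a command that outlives the timeout on every attempt returns `:timeout`. With the rescue clause gone, anything unexpected surfaces as a normal crash instead of being turned into `:error`.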

Sorry if I can’t be of more help.
