Issues with opening port for bash script and automatically closing the process

amnu3387 · November 9, 2017, 10:14pm

Hey, perhaps someone ran into this before and can help me out. Basically I’m opening a port that runs a bash script for launching chrome. It works fine on my local Mac: chrome gets started, and when the Port is sent Process.exit(pid, :normal), the chrome instance is killed.

Now I’ve deployed this umbrella app to ubuntu, it’s working, but chrome instances are not being killed when I .exit() the process. Any ideas?

#!/bin/sh
google-chrome --attrs --flag-switches-begin --headless --disable-gpu --remote-debugging-port="$1" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" --user-data-dir=/dev/null --disable-extensions --disable-internal-flash --disable-bundled-ppapi-flash --incognito --ignore-certificate-errors --noerrdialogs --enable-internal-media-session --flag-switches-end & pid=$!
while read line ; do
  :
done
kill -KILL $pid

And what calls that script:

def handle_cast({:start_session, reference, %{"pageUrl" => page_url}}, %{session: false} = state) do
  IO.puts("Hound :start_session for reference: #{inspect page_url}")
  port = find_available_port()
  path = Path.absname(chrome_path(), Application.app_dir(:app_chrome))
  chrome = Port.open({:spawn_executable, path},
    [:binary, :use_stdio, :stderr_to_stdout, :exit_status, args: ["#{port}"]])
  full_page_url = base_url() <> page_url
  {:os_pid, pid} = Port.info(chrome, :os_pid)
  case Regex.named_captures(~r/listening\son\s?(?<url>.*)\s/, handle_output(chrome)) do
    %{"url" => url} ->
      GenServer.cast(ChannelerServer, {:start_socket, %{port: port, reference: reference, page_url: full_page_url}})
      GenServer.call(HoundKeeper, {:update, reference, chrome, pid, port, url}, 60_000)

    _ -> {:error, "Not Active"}
  end
  {:noreply, state}
end

Ubuntu is 16.04
Erlang/OTP 20 [erts-9.1] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.5.2

Any ideas? Thanks

OvermindDL1 · November 9, 2017, 10:17pm

From my testing when headless was in beta, chrome can still attach to another running session if it is called, so calling it could fork out to something the script no longer sees. From what I recall, if you want to safely shut down a chrome headless instance then you need to tell it to do so itself. This could all be wrong now that it is released, though. ^.^;

amnu3387 · November 9, 2017, 10:39pm

I’m not entirely sure I’m following LOL
So say I want to kill an OS process from elixir, is there any system command for that, or am I better off running another shell script that just executes kill -9 "$1"?

I just thought it would have the same behaviour as in macosx (perhaps a bit naively)

OvermindDL1 · November 9, 2017, 10:52pm

The problem is not killing off an OS process, kill is just fine for that, it is that chrome often forks itself to new PIDs so the one you tried to kill actually does not belong to chrome any longer (it might even belong to something else).

So yes, for a generic program that script is fine, however chrome is not a generic program, most specifically when running with --remote-debugging-port=... as it will try to stay persistent. Thus to stop chrome properly you should not kill it (as it may often not work due to it forking away into new processes), but rather you need to send the ‘close’ signal to the port that you set up via the --remote-debugging-port argument.

https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md

Honestly you should probably write either a node script using the javascript API that google made to interact with it, or write a C++ program and use the C++ library that google made to interact with it; whichever you use, ‘that’ is what you’d call from a Port in elixir. Yes chrome is weird (it should just handle it on stdin/stdout to be honest), but you have to use its API that you exposed at that port to properly shut it down; killing chrome does not always work, as you see.

amnu3387 · November 9, 2017, 11:02pm

Yes, again my misunderstanding stems from the fact that on macOS killing the process that originated the debugger will also close the main chrome instance and the debugger with no further forking. I’m being a bit lazy in reality, but if I could get away with it closing “automagically” I would lol! I’m not sure how using the node library helps, since it’s just a wrapper for the devtools protocol? Thanks either way, I’m gonna shut it down manually

OvermindDL1 · November 9, 2017, 11:08pm

Same in linux, however it is ‘chrome’ itself that likes to fork around depending on what other instances of it are running and so forth.

Yep, that’s all it does too. If you are already talking to the chrome debugger over its port, just order it to shut down.

You could do it from your shell script too, either a node command or just perl/curl/whatever the command over that port. Or even scan for the PID that is on that port and kill ‘that’ pid. ^.^
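The "scan for the PID that is on that port" idea could look roughly like the sketch below. It assumes lsof is installed on the host, and the function name pids_listening_on is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PID(s) listening on TCP port $1, if any.
pids_listening_on() {
  # -t: print PIDs only; -nP: skip host and port name lookups;
  # -iTCP:<port> with -sTCP:LISTEN: only TCP sockets listening on that port.
  lsof -nP -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null || true
}
```

Something like `kill $(pids_listening_on "$1")` at the end of the launcher script would then target whichever process actually owns the debugging port, rather than the PID recorded at launch.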

keathley · November 10, 2017, 10:41am

Wallaby manages all of this for you (no need for node or C++). You can look at what we did and port it to your hound setup.

amnu3387 · November 10, 2017, 12:29pm

Hi keathley, I did try wallaby and hound. I ended up not using either because I was unable to keep multiple sessions alive (say 20 sessions sitting for an arbitrary number of minutes on a page) with an individual devtools debugger for each session (so 20 different pages, independent sessions, each one with an individual debugger attached for anywhere between 5min and 15min, sometimes more). If you know that this can be achieved with wallaby, running chrome, I’m all for using what you guys already did.

I will indeed take a look at how you got it working, I was thinking “Browser.close” through the wire would suffice, but “devtools” is telling me: %{"error" => %{"code" => -32601, "message" => "'Browser.close' wasn't found"}, "id" => 999}
Thanks

amnu3387 · November 20, 2017, 8:13pm

Sharing what I found while researching and solving my issues with this, in case anybody needs some solutions for dealing with misbehaving ports/processes outside the beam. It’s a bit hacky, but hey, it gets the work done on linux ubuntu and chromium instances.

First, use “kill” on the bash script itself, then have a function that recursively aggregates all the PIDs, kills them, checks the cleanup and, as a last resort, “kill -9”s them. kill -9 on chromium processes on linux is a bad idea, but sometimes the standard kill is not enough and the processes remain running, so you’re basically left with having to make sure you catch them all. In my particular case I can’t pgrep/grep on the process name, as I need to kill specific instances of chromium at certain times, and not all running instances of chromium.
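The "regular kill first, check, kill -9 as last resort" sequence can be sketched in plain sh as below. This is only an illustration, not the code used in this thread; terminate_then_kill and its grace-period argument are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: terminate_then_kill GRACE_SECONDS PID...
# Sends SIGTERM to every PID, polls for up to GRACE_SECONDS, then sends
# SIGKILL to all of them if any is still around.
terminate_then_kill() {
  grace=$1
  shift
  for p in "$@"; do
    kill -TERM "$p" 2>/dev/null || true
  done
  waited=0
  while [ "$waited" -lt "$grace" ]; do
    alive=0
    for p in "$@"; do
      # kill -0 sends no signal; it only tests whether the PID still exists.
      if kill -0 "$p" 2>/dev/null; then alive=1; fi
    done
    [ "$alive" -eq 0 ] && return 0
    sleep 1
    waited=$((waited + 1))
  done
  for p in "$@"; do
    kill -KILL "$p" 2>/dev/null || true
  done
  return 0
}
```

One caveat: kill -0 also succeeds for a zombie that its parent has not reaped yet, so the function may sit out the whole grace period even though the process has already exited.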

For the bash script

commandline blablablabla &
pid=$!
while read line ; do
  :
done
ps -ef | grep $pid | grep -v grep | awk '{print $2}' | xargs kill
kill -KILL $pid

This will grep all processes that mention the PID of the backgrounded command (so the main process itself and any forked processes that reference it as the parent PID). It sends them a kill signal, and a -9 to the main process alone. xargs hands all the grepped PIDs to a single kill invocation, so they all receive a regular kill before the parent receives a -9.
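One thing to watch with a plain grep on the PID is that it matches those digits anywhere in the ps line, so PID 123 also hits 1234 or a port number. A stricter variant, sketched here under the assumption that ps accepts the POSIX -A and -o options, follows the PPID column exactly; descendants is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every descendant PID of $1, one per line.
descendants() {
  # Snapshot the process table once as "PID PPID" pairs.
  ps -A -o pid= -o ppid= | awk -v root="$1" '
    { parent[$1] = $2 }
    END {
      for (pid in parent) {
        # Walk up the parent chain; print pid if the chain reaches root.
        p = parent[pid]
        for (hops = 0; hops < 64; hops++) {
          if (p == root) { print pid; break }
          if (!(p in parent)) break
          p = parent[p]
        }
      }
    }'
}
```

This only finds processes that are still linked to the root through their parent chain. Anything that has been re-parented (to init, for example) will not show up, which is the case the command-line matching later in this post is meant to cover.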

This should work most of the time, but chromium is sometimes nasty. So I added a new layer at the application level. Since I always pass an arbitrary non-repeated debugger port to the chromium instance, I can use that to identify the processes I’m interested in finding. You can use whatever, and even pass non-existing flags to identify your chromium instances, since they are kept as part of the “command” attribute that was run to create the process:

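Task.start(fn -> try_port_kill(ws_port) end)

# Find every process whose command line mentions port=<port> and kill it.
def try_port_kill(port) do
  pids =
    "ps aux | grep -ie port=#{port} | grep -v grep | awk '{print $2}'"
    |> String.to_charlist()
    |> :os.cmd()
    |> List.to_string()
    |> String.split()

  port_kill(pids)
end

def port_kill([]), do: :ok

def port_kill([h | t]) do
  # Collect every process whose ps line mentions this PID (itself and its children).
  pids =
    "ps -ef | grep #{h} | grep -v grep | awk '{print $2}'"
    |> String.to_charlist()
    |> :os.cmd()
    |> List.to_string()
    |> String.split()

  # Handle the other matches concurrently, then send a regular kill to this PID.
  Task.start(fn -> port_kill(pids -- [h]) end)
  System.cmd("kill", ["#{h}"])
  # Escalate to kill -9 later if the process is still alive.
  Task.start(fn -> brutal_kill(h) end)

  port_kill(t)
end

def brutal_kill(h) do
  # `kill -0` prints nothing when the PID exists, so "" is treated as "still alive".
  case "kill -0 #{h}" |> String.to_charlist() |> :os.cmd() |> List.to_string() do
    "" ->
      :timer.sleep(5_000)
      System.cmd("kill", ["-9", "#{h}"])

    _ ->
      :ok
  end
end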

This doesn’t rely on the PID from the bash script. I’m also grepping on something I’m sure is unique in my use case: “port=an_arbitrary_port_that_I_had_set” can only match what I want, so I’m fine with the hackiness of it. You could also start chromium with a flag such as --my-marker-flag=something and then grep on this. I call this method after I “close” the port.
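For the marker-flag idea, a shell-only sketch could compare each command-line argument exactly instead of grepping for a substring, so that port=9222 does not also match port=92220. It assumes a ps that accepts -A, -ww and -o (procps on Ubuntu does); pids_with_marker is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes that have the exact
# argument $1 (e.g. "--my-marker-flag=something") on their command line.
pids_with_marker() {
  # Take the snapshot first so the filter process is not listed in it.
  snapshot=$(ps -A -ww -o pid= -o args=)
  printf '%s\n' "$snapshot" | MARKER="$1" awk '
    {
      # Field 1 is the PID; the remaining fields are the command line.
      for (i = 2; i <= NF; i++)
        if ($i == ENVIRON["MARKER"]) { print $1; break }
    }'
}
```

Arguments that contain spaces are split into several fields by awk, so the marker itself should not contain whitespace.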

Lastly, since this involves long-lived, always-running processes and I don’t want to risk having any of them creep, I went a bit further, even though what I’ve added so far seems enough to keep things tamed and under control: I added a time-based sweeper that kills any chromium/chrome processes running for more than 40min, and I recurrently send_after to call this:

def sweeper do
  {all, _} = System.cmd("ps", ["-eo", "pid,lstart,comm"])
  # Drop the header row; trim: true also drops the empty string after the final newline.
  [_header | proc_lines] = String.split(all, "\n", trim: true)
  Enum.each(proc_lines, &sweep_zombie/1)
  :ok
end

def sweep_zombie(proc_line) do
  # lstart prints as "Thu Nov  9 22:14:00 2017", i.e. five whitespace-separated fields.
  {[pid, _wday, month, day, hhmmss, year], all_comm} =
    proc_line |> String.split() |> Enum.split(6)

  comm = Enum.join(all_comm, " ")

  if Regex.match?(~r/chrom/i, comm) do
    # ps reports lstart in local time; the trailing "Z" treats it as UTC,
    # which is only accurate when the host clock runs on UTC.
    {:ok, datetime, 0} =
      DateTime.from_iso8601("#{year}-#{n_month(month)}-#{String.pad_leading(day, 2, "0")}T#{hhmmss}Z")

    dt = DateTime.to_unix(datetime)
    tn = System.system_time(:second)

    # Kill anything that has been running for more than 40 minutes.
    if tn - dt > 40 * 60 do
      IO.puts("DT: #{inspect dt} TN: #{inspect tn} COMM: #{inspect comm}")
      System.cmd("kill", ["-9", "#{pid}"])
    end
  end
end

def n_month("Jan"), do: "01"
def n_month("Feb"), do: "02"
def n_month("Mar"), do: "03"
def n_month("Apr"), do: "04"
def n_month("May"), do: "05"
def n_month("Jun"), do: "06"
def n_month("Jul"), do: "07"
def n_month("Aug"), do: "08"
def n_month("Sep"), do: "09"
def n_month("Oct"), do: "10"
def n_month("Nov"), do: "11"
def n_month("Dec"), do: "12"
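
The same age check can be sketched without parsing lstart at all, assuming a procps-ng ps that supports the etimes column (elapsed seconds since the process started; Linux only). The filter reads "PID ELAPSED_SECONDS COMMAND" lines on stdin so it can be tried on canned input; old_chrome_pids is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "PID ELAPSED_SECONDS COMMAND" lines on stdin and
# print the PIDs of chrome/chromium processes older than $1 seconds.
old_chrome_pids() {
  awk -v max_age="$1" 'tolower($3) ~ /chrom/ && $2 + 0 > max_age + 0 { print $1 }'
}

# Intended use on Linux (assumes procps-ng ps and GNU xargs), not run here:
#   ps -A -o pid= -o etimes= -o comm= | old_chrome_pids 2400 | xargs -r kill
```

Because etimes is already a duration, this also sidesteps the month table and any time zone question.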