Graceful Shutdown of GenServers

sheharyarn · September 29, 2016, 10:25am

I’m writing an Elixir app with a GenServer that starts an external application on boot, then shuts it down and does other clean-up on exit. I’ve added the bootup functionality in the init/1 callback and the cleanup code in the terminate/2 callback.

The init code works fine when the GenServer is started, and the terminate callback is also called when the :stop signal is sent manually, but on unexpected shutdowns and interrupts (such as hitting Ctrl+C in IEx), the terminate code is not called.

What’s the proper way of doing cleanup when my Elixir app crashes or is shut down unexpectedly?

Here’s the parallel StackOverflow Question and my code:

defmodule MyAwesomeApp do
  use GenServer

  def start do
    GenServer.start_link(__MODULE__, nil)
  end

  def init(state) do
    # Do Bootup stuff

    IO.puts "Starting: #{inspect(state)}"
    {:ok, state}
  end

  def terminate(_reason, state) do
    # Do Shutdown Stuff

    IO.puts "Going Down: #{inspect(state)}"
    :normal
  end
end

MyAwesomeApp.start

OvermindDL1 · September 29, 2016, 2:08pm

The ‘proper’ way to shut down a node from the console (or in code) would be :init.stop(), which can take an optional integer that is the return code of the program. Ctrl+C is a harsh stop (I wish iex would bind it to :init.stop() with, say, a 30-second timeout that counts down on the screen, maybe displaying what it is waiting on to stop too).

sheharyarn · September 29, 2016, 2:11pm

So there’s currently no way to catch Ctrl+C (and other sudden) exits?

sasajuric · September 29, 2016, 2:27pm

To increase chances of the terminate callback being invoked, the server process should trap exits. However, even with that, the callback might not be invoked in some situations (e.g. when the process is brutally killed, or when it crashes itself). For more details see here.

As mentioned, if you want to politely shut down your system, you should invoke :init.stop, which will recursively shut down the supervision tree, causing terminate callbacks to be invoked.
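
A minimal sketch of the two points above, assuming a standalone script: the server traps exits in init/1, and its supervisor is then stopped so that terminate/2 gets a chance to run. The module name CleanupServer and the `notify` pid are hypothetical and only there to make the effect observable.

```elixir
defmodule CleanupServer do
  use GenServer

  # `notify` is a hypothetical pid that is told when cleanup has run.
  def start_link(notify) do
    GenServer.start_link(__MODULE__, notify)
  end

  @impl true
  def init(notify) do
    # Trap exits so that a shutdown signal from the supervisor is handled
    # by the GenServer machinery (which calls terminate/2) instead of
    # killing the process outright.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    # Real cleanup (closing ports, stopping external programs) would go here.
    send(notify, {:cleaned_up, reason})
    :ok
  end
end

{:ok, sup} = Supervisor.start_link([{CleanupServer, self()}], strategy: :one_for_one)

# Stopping the supervisor shuts down its children in an orderly way.
Supervisor.stop(sup)

reason =
  receive do
    {:cleaned_up, reason} -> reason
  after
    1_000 -> :not_invoked
  end

IO.puts("terminate/2 outcome: #{inspect(reason)}")
```

This only covers orderly shutdowns. If the process is killed with Process.exit(pid, :kill), or the whole BEAM OS process is killed, terminate/2 still does not run.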

As you noticed, there is no way of catching abrupt BEAM OS process exits from within. It’s a self-defining property: the BEAM process terminates suddenly, so it can’t run any code (since it terminated). Hence, if BEAM is brutally terminated, the callback will not be invoked.

If you unconditionally want to do something when BEAM dies, you need to detect this from another OS process. I’m not sure what your exact use case is, but assuming you have some strong needs for this, then running another BEAM node, on the same (or another) machine, could work here. Then you could have one process on one node monitoring another process on another node, so you can react even if BEAM is brutally killed.
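
A rough sketch of the second-node idea, assuming both nodes are started as distributed nodes (for example with --sname) and share a cookie. The module name NodeWatcher and the `on_down` cleanup function are hypothetical; Node.monitor/2 is the standard call that requests a {:nodedown, node} message when the connection to a node is lost.

```elixir
defmodule NodeWatcher do
  use GenServer

  # `node_name` is the node to watch; `on_down` is a hypothetical
  # one-argument cleanup function supplied by the caller.
  def start_link({node_name, on_down}) do
    GenServer.start_link(__MODULE__, {node_name, on_down})
  end

  @impl true
  def init({node_name, on_down}) do
    # Requires the local node to be distributed; otherwise this call fails.
    Node.monitor(node_name, true)
    {:ok, {node_name, on_down}}
  end

  @impl true
  def handle_info({:nodedown, node_name}, {node_name, on_down} = state) do
    # The watched node went away, however it died. Run the cleanup here.
    on_down.(node_name)
    {:noreply, state}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

On the watching node, something like NodeWatcher.start_link({:"app@myhost", &MyCleanup.run/1}) would start it, where both the node name and MyCleanup.run/1 are placeholders.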

However, your life will be simpler if you don’t need to unconditionally run some cleanup logic, so consider whether the code in terminate is a must, or rather a nice-to-have?

sheharyarn · September 29, 2016, 2:33pm

That’s a pretty detailed answer that covers most of my concerns. Would you consider also posting this as an answer on my StackOverflow Question - for future reference?

sasajuric · September 29, 2016, 4:33pm

Sure, I copy-pasted it there.

pdawczak · April 2, 2017, 1:53pm

That’s an absolutely great answer - thanks @sasajuric. It confirms this unfortunate situation - what approach would you suggest, then, for closing ports when iex is killed?

I’m opening a port to boot an executable when the Elixir application starts (https://ngrok.com/ to be more precise). This is used for development purposes, so when you need to reboot the app, you kill it (Ctrl-C) and run iex -S mix again, but this crashes because the executable was not killed along the way.

The way I was trying to handle that was to send kill -9 to the process from terminate, but terminate is not invoked…

Thanks for your advice!

pma · April 2, 2017, 2:05pm

If the Erlang VM stops, the external port process will receive a signal indicating that the stdin closed.

If ngrok doesn’t gracefully stop on CTRL+D, you can use a small bash wrapper to handle the stdin closing and then kill ngrok with the appropriate signal.

pdawczak · April 2, 2017, 2:43pm

Thanks @pma!

Yes, indeed - ngrok (when watching with watch -n 1 "ps ax | grep ngrok") seems to disappear some time after killing iex. Unfortunately, it is not instantaneous and it’s non-deterministic - it stays alive for anywhere from a couple of seconds up to 30 seconds, and only then disappears.

This is a problem, because if you kill something (e.g. the Phoenix dev server) and start it again shortly after, everything crashes, as a new ngrok can’t be opened (unless the previous one dies quickly enough before the fresh one is started).

I also tried a little wrapper script (like the one described here). This does kill ngrok when iex exits, but it suffers from another problem - when ngrok crashes (for whatever reason; I tried to mimic that by killing the ngrok process with kill -9), this doesn’t propagate up via the port, as the pid of the running process points to the wrapping script.

sasajuric · April 2, 2017, 3:20pm

If the Erlang VM stops, the external port process will receive a signal indicating that the stdin closed.

This is the correct hint. The external process will get EOF on its stdin. However, if the external program is busy doing something else, it could linger for much longer before it detects that. I’ve written a bit about this here (see “Program Termination” section).

IMO, the cleanest solution, if you own the code of the external program, is to adapt it to run processing in a separate thread, while the main thread just does I/O. That way, the external program can immediately detect the termination of the other side, and terminate itself immediately.

If that’s not an option, I think (but I’m not sure) that Porcelain by @alco might offer some automagical help.

pdawczak · April 6, 2017, 8:20pm

Thank you @sasajuric, this is a very valuable answer!

I read through your article earlier, when I was starting to play with ports, and it covered a lot of the basics. Great post!

You’ve stated:

It’s worth noting again, that a port is closed when the owner process terminates

This is the bit I wasn’t sure about - because the iex session is killed (this is what happens with Ctrl-C + Ctrl-C, right?), does Elixir have time to send EOF?

Unfortunately, ngrok is an application I don’t own. I had a suspicion about what happens, but you’ve brought final confirmation. As mentioned earlier, wrapping it in a script caused the problem that ngrok stopping was not propagated, so it was difficult (impossible?) to detect and supervise - that is why I decided to go without the wrapping script.

Thank you for your input! I truly appreciate it!

sasajuric · April 6, 2017, 9:48pm

I’m not really familiar with the details, but I’m pretty certain Elixir (or rather BEAM) doesn’t need to send EOF at all. I assume that the BEAM OS process owns its end of the pipes, so regardless of how the BEAM OS process terminates, all its resources are closed, and the external program gets EOF when it attempts a read from its stdin. For this to happen instantaneously, the external program needs to constantly read from the pipe. If the program is busy with some other processing, it might take a long time to notice EOF. I’ve seen this situation in practice.

Ports are meant to be used with external programs which are written to be used with them. If the program you’re using is not written for Erlang ports, you need some kind of an adapter which can handle all requirements of Erlang ports. That also means that your wrapper needs to detect the crash of the wrapped external program, and react to it by stopping itself, which should then be detectable on the Elixir side of things. I never did this myself, so I’m not sure how it can be done, but I’d be surprised if this wasn’t possible.
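
As a small illustration of the "detectable on the Elixir side" part only, here is a sketch that opens a port with the :exit_status option and waits for the exit message. It assumes a Unix-like system with sh on the PATH; the sh -c "exit 3" command is just a stand-in for a wrapper that stops itself after the wrapped program dies, not an actual ngrok wrapper.

```elixir
# Assumption: a Unix-like system where `sh` can be found on the PATH.
sh = System.find_executable("sh")

port =
  Port.open({:spawn_executable, sh}, [
    :binary,
    # Ask for a {port, {:exit_status, code}} message when the program exits.
    :exit_status,
    args: ["-c", "exit 3"]
  ])

status =
  receive do
    {^port, {:exit_status, code}} -> code
  after
    5_000 -> :timeout
  end

IO.puts("external program exited with: #{inspect(status)}")
```

In a GenServer that owns the port, the same message would arrive in handle_info/2, where the server could stop itself and let its supervisor restart both the server and the external program.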