Properly stopping a supervision tree, with cleanup

jimm · June 23, 2016, 1:51pm

I have a simple supervision tree with one supervisor and three workers. One
worker manages the GUI (a wrapper around the cecho Erlang library), one
worker is a controller that gets key presses, runs code, and tells the GUI
to update, and the remaining worker does the work.

How do I stop this thing? I want the controller to be able to say “OK, we’re
done” and have each of the workers and the supervisor shut down normally and
call some cleanup code so that my application can exit.

All of the excellent books that I have talk about supervisors keeping
workers running and about workers dying and being restarted, but I can’t
seem to figure out how to have everybody gracefully quit and clean up.

I have tried pretty much every combination of the following that I can think of:

- Calling MySupervisor.stop
- exit(:normal) and exit(:terminate), from the parent of the supervisor or from the supervisor
- Casting a :quit message that returns {:stop, :normal} or {:stop, :terminate} for each of the children (normal gets restarted, terminate complains but doesn’t stop the parent supervisor)
- Having a terminate callback or not having one
- Swearing

What do I need to be doing? What am I missing?

aeden · June 23, 2016, 5:24pm

What happens when you call Supervisor.stop(sup) (where sup is a reference to the supervisor, or the name if you have named the supervisor)?
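For readers following along, here is a minimal sketch of what that call looks like with a named supervisor and three workers. The module names (StopDemo.Worker, StopDemo.Sup) and the worker names are made up for illustration, and the child specs use the current Elixir style rather than whatever the original app used.

```elixir
defmodule StopDemo.Worker do
  use GenServer

  # Hypothetical worker: registers itself under the given name and holds no real state.
  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

# Three workers under one supervisor; each child needs a distinct id.
children = [
  Supervisor.child_spec({StopDemo.Worker, :gui}, id: :gui),
  Supervisor.child_spec({StopDemo.Worker, :controller}, id: :controller),
  Supervisor.child_spec({StopDemo.Worker, :engine}, id: :engine)
]

{:ok, sup} =
  Supervisor.start_link(children, strategy: :one_for_one, name: StopDemo.Sup)

# Either the pid (sup) or the registered name can be passed.
# The call returns once the supervisor and its children have terminated.
:ok = Supervisor.stop(StopDemo.Sup)
```

Supervisor.stop/1 uses the reason :normal by default, so the process that started the supervisor is not taken down with it.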

jimm · June 23, 2016, 6:14pm

It works! Thanks, aeden! It took a few more changes to get it working, but
that did it. The remainder of this message describes the “few more changes”:

I had previously tried calling Supervisor.stop with different args to no
avail. When I gave the supervisor a name and called
Supervisor.stop(name), I then saw “Application jex exited: normal” but
the top-level app that started the supervisor didn’t quit. I realized that
was because I started the app using --no-halt. I used that because
otherwise my app would quit right after creating the supervisor.

So I tried this:

- After starting the supervisor, the top-level app waits for a :quit
  message in a receive loop
- The quitting code sends that message to the top-level app after calling
  Supervisor.stop(name) as you suggested
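The two steps above could be sketched roughly like this. It is not the actual jex code: QuitDemo, QuitDemo.Sup, and the spawned process standing in for the controller's quit handler are all assumptions made for the example.

```elixir
defmodule QuitDemo do
  # Blocks the calling process until it receives :quit, ignoring anything else.
  def wait_for_quit do
    receive do
      :quit -> :ok
      _other -> wait_for_quit()
    end
  end
end

# An empty supervisor is enough to show the flow; a real app would list its workers.
{:ok, sup} =
  Supervisor.start_link([], strategy: :one_for_one, name: QuitDemo.Sup)

top = self()

# Stand-in for the quitting code: stop the supervisor, then tell the top level.
spawn(fn ->
  :ok = Supervisor.stop(QuitDemo.Sup)
  send(top, :quit)
end)

# The top-level process parks here until the quit message arrives.
:ok = QuitDemo.wait_for_quit()
```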

Bingo! Quitting achieved.

aeden · June 23, 2016, 6:42pm

Excellent! Glad you got it working.

benwilson512 · June 23, 2016, 10:40pm

You can simply call :init.stop from within the controller, and the system will shut down.

sasajuric · June 24, 2016, 8:17am

+1 for :init.stop! This is a standard approach to politely take down all applications and the entire system, and it doesn’t require any improvisation such as a custom :quit message. You can then also safely use --no-halt or OTP releases and still be able to stop the entire system.
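As a rough sketch of that suggestion, a controller written as a GenServer might handle its quit request as below. InitStopDemo.Controller and its quit/0 function are hypothetical names; only :init.stop/0 itself comes from the thread.

```elixir
defmodule InitStopDemo.Controller do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Hypothetical entry point, called by whatever detects the quit key.
  def quit, do: GenServer.cast(__MODULE__, :quit)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_cast(:quit, state) do
    # :init.stop/0 returns right away; the runtime then takes down all
    # applications in an orderly way and halts the VM.
    :init.stop()
    {:noreply, state}
  end
end
```

Because :init.stop/0 only requests the shutdown and returns, the callback still finishes normally with {:noreply, state}. Note that calling InitStopDemo.Controller.quit() in an IEx session will end that session.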

jimm · June 24, 2016, 12:32pm

Thanks, Ben and Saša. Now I’m off to read the :init.stop docs to see what
callbacks, if any, are called in my workers.

jimm · June 25, 2016, 12:56pm

A follow-up: :init.stop works much better. I still explicitly send
messages to my workers so they can clean up properly, but then a simple
call to :init.stop works as advertised.

Thank you again, Anthony, Ben and Saša.

benwilson512 · June 25, 2016, 1:24pm

It is my understanding that manually shutting down workers should be unnecessary. If you do the setup required to have the terminate callback run (Process.flag(:trap_exit, true)), then all you need to do is call :init.stop.
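Here is a small sketch of that setup, assuming the worker is a GenServer started under a supervisor. CleanupDemo.Worker is a made-up module, and the example stops the supervisor with Supervisor.stop/1 rather than :init.stop so that the VM stays up and the effect can be observed; during :init.stop the same shutdown signal only reaches workers whose supervisor is part of an application's supervision tree.

```elixir
defmodule CleanupDemo.Worker do
  use GenServer

  # report_to is a hypothetical pid that gets notified when cleanup runs.
  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # Without this flag the supervisor's exit signal ends the process
    # directly and terminate/2 is skipped.
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(reason, report_to) do
    # Real cleanup would go here; the demo just reports that it ran.
    send(report_to, {:cleaned_up, reason})
    :ok
  end
end

{:ok, sup} =
  Supervisor.start_link([{CleanupDemo.Worker, self()}], strategy: :one_for_one)

:ok = Supervisor.stop(sup)

# Wait briefly for the worker's report.
result =
  receive do
    {:cleaned_up, reason} -> {:ok, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "terminate/2")
```

The GenServer documentation is explicit that terminate/2 is not guaranteed to run in every case (a child configured with shutdown: :brutal_kill never gets the chance, for example), so cleanup that must happen is often handled elsewhere.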

jimm · June 26, 2016, 11:03am

I tried that, but terminate didn’t get called. Not sure what I’m doing
wrong. Here’s what I did:

- Called Process.flag(:trap_exit, true) in the init method of my worker
- Wrote a terminate method for that worker that let me know it was called
  (it wrote both to stdout using IO.puts and to a file using File.write)
  - I checked the signature for terminate to make sure I didn’t screw
    it up

That terminate was never called. Again, I must be doing something wrong
but I don’t know what it is.

nicholasjhenry · November 27, 2016, 3:22pm

@jimm Did you manage to get this working?

jimm · November 27, 2016, 9:19pm

Yes, :init.stop did the trick. The controller (listening for keyboard input) responds to the quit key by calling a function in my supervisor that lets everybody clean up and then calls :init.stop:

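def quit do
  # Explicit cleanup first, one call per part of the app
  GUI.cleanup()
  Metronome.stop()
  MIDI.cleanup()
  # Then shut down all applications and the VM in an orderly way
  :init.stop()
end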