Properly stopping a supervision tree, with cleanup

I have a simple supervision tree with one supervisor and three workers. One
worker manages the GUI (a wrapper around the cecho Erlang library), one
worker is a controller that gets key presses, runs code, and tells the GUI
to update, and the remaining worker does the work.

How do I stop this thing? I want the controller to be able to say “OK, we’re
done” and have each of the workers and the supervisor shut down normally and
call some cleanup code so that my application can exit.

All of the excellent books that I have talk about supervisors keeping
workers running and about workers dying and being restarted, but I can’t
seem to figure out how to have everybody gracefully quit and clean up.

I have tried pretty much every combination of the following that I can think of:

  • Calling MySupervisor.stop
  • exit(:normal) and exit(:terminate), from the parent of the supervisor or from the supervisor
  • Casting a :quit message to each of the children, with a handler that
    returns {:stop, :normal} or {:stop, :terminate} (with :normal the child
    gets restarted; with :terminate it complains but doesn’t stop the parent
    supervisor)
  • Having a terminate callback or not having one
  • Swearing

What do I need to be doing? What am I missing?

What happens when you call Supervisor.stop(sup) (where sup is a reference to the supervisor, or the name if you have named the supervisor)?
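
For anyone landing here later, here is a minimal sketch of what that looks like with a named supervisor. Demo.Sup and Demo.Worker are made-up names, not taken from the original app; the three children only stand in for the GUI, controller, and worker described above.

```elixir
# Hypothetical stand-in for the real workers.
defmodule Demo.Worker do
  use GenServer

  def start_link(role), do: GenServer.start_link(__MODULE__, role)

  @impl true
  def init(role), do: {:ok, role}
end

# Three children built from the same module, so each needs its own id.
children =
  for role <- [:gui, :controller, :worker] do
    Supervisor.child_spec({Demo.Worker, role}, id: role)
  end

# Naming the supervisor lets other code refer to it without holding the pid.
{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one, name: Demo.Sup)

# Shuts down the children in reverse start order, then the supervisor
# itself exits with reason :normal.
:ok = Supervisor.stop(Demo.Sup)
```

Supervisor.stop/1 blocks until the supervisor has terminated, so the name is free again once it returns.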

It works! Thanks, aeden! It took a few more changes to get it working, but
that did it. The remainder of this message describes the “few more changes”:

I had previously tried calling Supervisor.stop with different args to no
avail. When I gave the supervisor a name and tried
Supervisor.stop(name), I then saw “Application jex exited: normal” but
the top-level app that started the supervisor didn’t quit. I realized that
was because I started the app using --no-halt. I used that because
otherwise my app would quit right after creating the supervisor.

So I tried this:

  • After starting the supervisor, the top-level app waits for a :quit
    message in a receive loop
  • The quitting code sends that message to the top-level app after calling
    Supervisor.stop(name) as you suggested

Bingo! Quitting achieved.
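
Roughly, the two steps above look like the sketch below. QuitDemo and QuitDemo.Sup are hypothetical names, the supervisor has no children to keep it short, and a spawned process stands in for the controller reacting to the quit key.

```elixir
defmodule QuitDemo do
  # Hypothetical top-level entry point.
  def run do
    {:ok, _sup} =
      Supervisor.start_link([], strategy: :one_for_one, name: QuitDemo.Sup)

    top = self()

    # Stand-in for the controller: in the real app this would be
    # triggered by a key press rather than spawned right away.
    spawn(fn -> quit(top) end)

    wait_for_quit()
  end

  # Block until :quit arrives, ignoring anything else in the mailbox.
  def wait_for_quit do
    receive do
      :quit -> :ok
      _other -> wait_for_quit()
    end
  end

  # Stop the tree first, then tell the top-level process it may return.
  def quit(top) do
    :ok = Supervisor.stop(QuitDemo.Sup)
    send(top, :quit)
  end
end

:ok = QuitDemo.run()
```

The supervisor exits with reason :normal, so the linked top-level process is not taken down with it and simply falls out of the receive loop.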

Excellent! Glad you got it working.

You can simply call :init.stop from within the controller, and the system will shut down.

+1 for :init.stop! This is a standard approach to politely take down all applications and the entire system, and it doesn’t require any improvisation such as a custom :quit message. You can then also safely use --no-halt or OTP releases and still be able to stop the entire system.

Thanks, Ben and Saša. Now I’m off to read the :init.stop docs to see what,
if any, callbacks are called in my workers.

A follow-up: :init.stop works much better. I still explicitly send
messages to my workers so they can clean up properly, but then a simple
call to :init.stop works as advertised.

Thank you again, Anthony, Ben and Saša.

It is my understanding that manually shutting down workers should be unnecessary. If you do the setup required to have the terminate callback run (Process.flag(:trap_exit, true)), then all you need to do is call :init.stop.
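
A minimal sketch of that setup, using made-up names (CleanupDemo.Worker is not from the original app). It stops the supervisor directly with Supervisor.stop rather than calling :init.stop, so the script keeps running and can show whether terminate/2 ran; the worker reports back to the calling process instead of doing real cleanup.

```elixir
defmodule CleanupDemo.Worker do
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  @impl true
  def init(report_to) do
    # Without this flag, the supervisor's :shutdown exit signal kills the
    # process directly and terminate/2 is skipped.
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  @impl true
  def terminate(reason, report_to) do
    # Real cleanup would go here; the sketch only reports that it ran.
    send(report_to, {:cleaned_up, reason})
    :ok
  end
end

{:ok, sup} =
  Supervisor.start_link([{CleanupDemo.Worker, self()}], strategy: :one_for_one)

:ok = Supervisor.stop(sup)

result =
  receive do
    {:cleaned_up, reason} -> {:cleaned_up, reason}
  after
    1_000 -> :terminate_not_called
  end

IO.inspect(result)
```

The supervisor waits for each child's terminate/2 only up to the child's shutdown timeout (5 seconds by default for workers), so slow cleanup may need a larger :shutdown value in the child spec.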

I tried that, but terminate didn’t get called. Not sure what I’m doing
wrong. Here’s what I did:

  • Called Process.flag(:trap_exit, true) in the init method of my worker
  • Wrote a terminate method for that worker that let me know it was called
    (it wrote both to stdout using IO.puts and to a file using File.write)
    • I checked the signature for terminate to make sure I didn’t screw
      it up

That terminate was never called. Again, I must be doing something wrong
but I don’t know what it is.

@jimm Did you manage to get this working?

Yes, :init.stop did the trick. The controller (listening for keyboard input) responds to the quit key by calling a function in my supervisor that lets everybody clean up then calls :init.stop:

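def quit do
  # Let each component release its resources first
  GUI.cleanup()
  Metronome.stop()
  MIDI.cleanup()
  # Then shut down the whole VM in an orderly way
  :init.stop()
end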