Debugging supervision tree

I’ve built a supervision tree and I’m testing it out using observer (via :observer.start()).

I’m finding that Process.exit(one_of_the_child_pids, :kill) unexpectedly also causes the supervisor itself to restart. Assuming that the observer GUI gives me a complete picture of process links, which I assume it must, I don’t see how this could happen. I’ve experimented with passing silly values to max_restarts in the Supervisor, but ultimately I’m stabbing in the dark here.

I’m absolutely certain this is down to some lack of understanding on my part. My question is: what’s the best way of determining why a supervisor has exited, and of debugging this sort of problem in general? Are there any good debug tools available?
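One low-tech way to find out why a supervisor exited is to monitor it before killing the child and read the reason from the :DOWN message. The sketch below is a minimal, standalone illustration of that idea: it uses a plain Agent as a stand-in child (an assumption, not the CollectionProcessor from this thread) and also prints the child's links via Process.info/2 as a cross-check against what observer shows.

```elixir
# Stand-in supervision tree: one Agent child under a :one_for_one supervisor.
children = [
  %{id: :worker, start: {Agent, :start_link, [fn -> 0 end]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
[{:worker, child, _type, _modules}] = Supervisor.which_children(sup)

# Monitor both processes so their exit reasons arrive as :DOWN messages.
sup_ref = Process.monitor(sup)
child_ref = Process.monitor(child)

# Cross-check the links observer displays.
IO.inspect(Process.info(child, :links), label: "child links")

Process.exit(child, :kill)

receive do
  {:DOWN, ^child_ref, :process, ^child, reason} ->
    IO.inspect(reason, label: "child exited with")
after
  1_000 -> IO.puts("child did not exit")
end

receive do
  {:DOWN, ^sup_ref, :process, ^sup, reason} ->
    IO.inspect(reason, label: "supervisor exited with")
after
  500 -> IO.puts("supervisor still alive")
end
```

With a well-behaved child like this one, only the child goes down. Running the same steps against the real tree should show the reason the supervisor exits with, which is usually the quickest clue to where the exit signal came from.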

Furthermore, I have pinned down the behaviour to the use of Mongo.watch_collection. Here’s an excerpt:

defmodule CollectionProcessor do
  use Flow

  def start_link(opts) do
    {topology, opts} = Keyword.pop!(opts, :topology)
    {collection, opts} = Keyword.pop!(opts, :collection)
    {pipeline, _opts} = Keyword.pop(opts, :pipeline, [])

    cursor =
      Mongo.watch_collection(topology, collection, pipeline, nil,
        full_document: "updateLookup"
      )

    cursor
    |> Flow.from_enumerable()
    |> Flow.map(fn doc -> IO.inspect(doc) end)
    |> Flow.start_link()
  end
end

This process is run under a supervisor. If I kill it with Process.exit(collection_importer_pid, :kill) then the supervisor also quits. NB: as I said above, I have experimented with very high max_restarts values, as far as 10_000, so I don’t think that can be the cause.

I tried replacing the Mongo.watch_collection call with a plain infinite stream like this:

cursor = Stream.iterate(0, &(&1 + 1))

Now when I kill the CollectionProcessor process it is restarted and its supervisor is correctly not restarted.

My question is what am I missing? I want the behaviour that I see with the plain Stream.iterate.

Ultimately this boils down to how the library you’re using has specified its supervision tree. Can you inspect its source and show it here?

Apparently that supervisor is not trapping exits, but as for why it gets killed so abruptly, IMO only the source code can show. Likely an explicit request to kill it has been made in a handler.

Thanks @dimitarvp . I concur but I haven’t had a lightbulb moment. Maybe there are some tools / tips / tricks for tracing supervision tree events that I’m not aware of?
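One check that may be worth making, offered as something to rule out rather than a diagnosis: a child's start_link/1 is called by the supervisor, so everything in its body that runs before the final process is spawned executes in the supervisor's own process. Any link created there is a link to the supervisor. The sketch below demonstrates this with a hypothetical probe module (WhoRunsStartLink is made up for illustration) that reports self() from inside start_link/1.

```elixir
defmodule WhoRunsStartLink do
  # Hypothetical probe: reports which process executes start_link/1.
  def start_link(report_to) do
    send(report_to, {:start_link_ran_in, self()})
    Agent.start_link(fn -> :ok end)
  end
end

children = [
  %{id: :probe, start: {WhoRunsStartLink, :start_link, [self()]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

ran_in =
  receive do
    {:start_link_ran_in, pid} -> pid
  after
    1_000 -> nil
  end

# Compare the reporting pid with the supervisor's pid.
IO.inspect(ran_in == sup, label: "start_link ran in the supervisor")
```

Adding a similar IO.inspect(self()) next to the Mongo.watch_collection call, and comparing it with the supervisor's pid, would show which process that call runs in.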

Here’s the library in question: elixir-mongodb-driver/mongo.ex at master · zookzook/elixir-mongodb-driver · GitHub
