Ensuring that all linked processes exit

Hi, let’s say I have a GenServer that I start with start_link under a supervisor. This GenServer starts some helper processes, also with start_link but not under any supervisor, for example by calling Task.start_link. How can I make sure that when the GenServer crashes, all the helper processes also exit before the GenServer is restarted? This matters especially because the helper processes may trap exits and spawn their own helpers, so it may take some time until they exit.

Another question is: should I make sure? With this guarantee I could avoid some problems when operating on global resources. For example, by starting the GenServer in tests with ExUnit.start_supervised I can be sure it exits before a subsequent test starts, which keeps the tests isolated. But when a helper process doesn’t exit fast enough, the tests aren’t really isolated.
Or consider a case where a helper process listens on some port. If the helper is started again before the previous instance has shut down, the port will still be occupied.
On the other hand, these issues can be solved within the helper processes themselves.

You could take a look at sasa1977/parent on GitHub (“Custom parenting of processes in Elixir”) if you want a solution with fewer lines of code. I’d personally consider turning this gen_server into a unit of several processes, with supervisors as well as workers.

Long story short: with the way you currently have it set up you very likely cannot, or at least not easily without replicating parts of what a Supervisor does.

Your best bet is probably to instead start a Supervisor of its own which starts the GenServer in question and any other helper processes, such as a Task.Supervisor used by the GenServer, with a restart strategy of :one_for_all or :rest_for_one depending on your semantics.

Basically a supervision tree like this:

Top Supervisor [one-for-one]
|-- GenServer 1 Supervisor [one-for-all/rest-for-one]
|   |-- GenServer 1
|   |-- Task.Supervisor 1
|   |-- ... other helper processes ...
|
|-- GenServer 2 Supervisor [one-for-all/rest-for-one]
|   |-- GenServer 2
|   |-- Task.Supervisor 2
|   |-- ... other helper processes ...
...

This way you can rely on OTP semantics to restart your helper processes instead of having to build something of your own.
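
A minimal sketch of one such "GenServer N Supervisor" branch, assuming hypothetical module names (MyWorker.Supervisor, MyWorker.Server, MyWorker.TaskSupervisor) and locally registered names. It is only meant to show the shape of the tree, not a definitive implementation:

```elixir
defmodule MyWorker.Supervisor do
  # Hypothetical wrapper: one supervisor per GenServer plus its helpers.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts)
  end

  @impl true
  def init(_opts) do
    children = [
      # Started first so the server can hand work to it right away.
      {Task.Supervisor, name: MyWorker.TaskSupervisor},
      {MyWorker.Server, name: MyWorker.Server}
    ]

    # :one_for_all: if any child crashes, all children are terminated
    # and then restarted together.
    Supervisor.init(children, strategy: :one_for_all)
  end
end

defmodule MyWorker.Server do
  # Hypothetical worker; real logic would go here.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)

  def ping(server \\ __MODULE__), do: GenServer.call(server, :ping)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end
```

Inside the server, helpers would then be started with something like Task.Supervisor.start_child(MyWorker.TaskSupervisor, fun) instead of Task.start_link. Because `use Supervisor` defines child_spec/1, the whole branch can be listed as a single child (MyWorker.Supervisor) in a parent supervisor.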

Yeah, I thought of it. It would be OK if the GenServer was used only in my app, but since it’s not, that would require anyone who uses it to build that supervision tree. Or I could wrap it somehow, but I have no idea how. I mean, until now everyone who uses my GenServer just puts it under a supervisor and it works. In tests, they do start_supervised and it works. Now that I delegate some work to a separate task, I’d like it to work the same way, but I don’t see how, even if I replicated some supervisor behaviour. Is that approach somehow extraordinary / not in line with the OTP way?

Do you mean the way @wolf4earth proposed?

Yeah.

To those people it should be mostly transparent whether the child_spec of the module starts a single process or a whole tree of them. E.g. the Ecto.Repo most people have in their apps spawns a supervisor on top with about a handful of processes beneath.

How would I communicate with the GenServer then, since the pid returned from start_link would be the pid of some supervisor?

You’d probably want to let processes register in a registry instead of communicating with processes by pids, both internally and from the user’s side. That will allow you to refer to them without knowing their specific pids (which might change e.g. due to restarts).

I get it. That way I couldn’t simply do

{:ok, pid} = MyGenServer.start_link()
GenServer.call(pid, :ping)
# or
MyGenServer.ping(pid)

which is sometimes useful too. I guess it’s a tradeoff?

Depending on the registry you can still use GenServer.call, though: instead of a pid you’d use a via tuple. See the Registry docs (Elixir v1.13.4).

You can still achieve that, check out Module-based Supervisors from the documentation.

If you know that your GenServer will only ever run once, then you can also register a global name.

I wonder why you don’t save your Tasks’ pids in a registry or ETS and terminate them if your GenServer crashes?

Like this:

def terminate(reason, state) do
  if reason != :normal do
    # The GenServer is crashing: terminate all the tasks here,
    # e.g. by looking up their pids in the registry or ETS table
  end
end

That requires creating the registry and creating the name, so I guess it adds some overhead:

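# MyRegistry is a hypothetical name; Registry.start_link requires the :name option
{:ok, _registry} = Registry.start_link(keys: :unique, name: MyRegistry)
name = {:via, Registry, {MyRegistry, :my_gen_server}}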
{:ok, _pid} = MyGenServer.start_link(name: name)
GenServer.call(name, :ping)
# or
MyGenServer.ping(name)

But most importantly, not being able to communicate via pid at all seems weird to me. I mean, it would be justified if I had some logic for restarting the helpers and the main GenServer and wanted to ship it all together. But my only need is to be sure that all the linked processes are dead :wink:

I don’t deny that there’s overhead, but without venturing outside of OTP there’s no nesting workers underneath workers, only supervisors with workers at the leaf nodes. You could use e.g. terminate/2 to attempt to clean up external processes, but it’s not perfect and you’ll still need to deal with timeouts and such.

Also while you talk about a single process here, this is not really the case. A GenServer with some spawned Tasks isn’t a single process anymore.
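
To illustrate what the terminate/2 route involves, here is a rough sketch, assuming a hypothetical Owner module that tracks its own helpers and a made-up 1 second grace period. It mimics, in a simplified way, what a supervisor does on shutdown: ask politely, wait, then kill.

```elixir
defmodule Owner do
  # Hypothetical GenServer that starts linked helper tasks and tries
  # to make sure they are dead before it finishes terminating.
  use GenServer

  # Assumed grace period per helper, in milliseconds.
  @shutdown_timeout 1_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def start_helper(server, fun), do: GenServer.call(server, {:start_helper, fun})

  @impl true
  def init(:ok) do
    # Without trapping exits, terminate/2 is not invoked when the
    # parent supervisor shuts this process down.
    Process.flag(:trap_exit, true)
    {:ok, %{helpers: MapSet.new()}}
  end

  @impl true
  def handle_call({:start_helper, fun}, _from, state) do
    {:ok, pid} = Task.start_link(fun)
    {:reply, {:ok, pid}, %{state | helpers: MapSet.put(state.helpers, pid)}}
  end

  @impl true
  def handle_info({:EXIT, pid, _reason}, state) do
    # Trapping exits means a crashing helper no longer takes us down;
    # here we simply forget it.
    {:noreply, %{state | helpers: MapSet.delete(state.helpers, pid)}}
  end

  @impl true
  def terminate(_reason, state) do
    for pid <- state.helpers do
      ref = Process.monitor(pid)
      Process.exit(pid, :shutdown)

      receive do
        {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
      after
        @shutdown_timeout ->
          # The helper did not exit in time: kill it and wait for confirmation.
          Process.exit(pid, :kill)

          receive do
            {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
          end
      end
    end

    :ok
  end
end
```

The limits are visible in the sketch itself: helpers are stopped one after another, so the waits add up and can exceed the shutdown timeout the parent supervisor grants this process; terminate/2 does not run at all if the process is killed with :kill; and helpers spawned by the helpers are not covered.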

You’re kind of duplicating the purpose of Task.Supervisor and supervision trees. “No Task should be started unsupervised” is generally good advice to follow.

As for the process communication, I think the easiest thing would be to require a name for the supervisor, but actually pass the name down and register the worker with that name. You’d then communicate with the process by name.

That’s interesting, but a bit confusing to me. If you pass a name to the supervisor, don’t you expect the supervisor to have this name, not some other process :thinking:? Do you know of any libraries that do it that way?

I wouldn’t use the name passed for the worker itself (it should go on the supervisor imo), but you can derive the names of all the started processes. Broadway does that for example.
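
A small sketch of deriving child names from the supervisor’s name. The Pinger modules and the naming scheme (Module.concat with fixed suffixes) are assumptions for illustration and are not taken from Broadway’s source:

```elixir
defmodule Pinger do
  # Hypothetical public module: a supervisor registered under the
  # user-supplied name, with child names derived from that name.
  use Supervisor

  def start_link(opts) do
    name = Keyword.fetch!(opts, :name)
    Supervisor.start_link(__MODULE__, name, name: name)
  end

  # Callers pass the supervisor's name; the worker's name is derived.
  def ping(name), do: GenServer.call(server_name(name), :ping)

  @impl true
  def init(name) do
    children = [
      {Task.Supervisor, name: task_supervisor_name(name)},
      {Pinger.Server, name: server_name(name)}
    ]

    Supervisor.init(children, strategy: :one_for_all)
  end

  defp server_name(name), do: Module.concat(name, Server)
  defp task_supervisor_name(name), do: Module.concat(name, TaskSupervisor)
end

defmodule Pinger.Server do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end
```

With this, a user starts `{Pinger, name: MyPinger}` under their own supervisor (or with start_supervised in tests) and calls `Pinger.ping(MyPinger)`; the name keeps working after restarts, unlike a pid.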

I have a question regarding this as well: how about Task.async? Should Task.Supervisor.async always be used? Task.async seems quite common though, if not even encouraged :thinking:

From the docs:

(…) we recommend developers to always start tasks under a supervisor. This provides more visibility and allows you to control how those tasks are terminated when a node shuts down. That might look something like Task.Supervisor.start_child(MySupervisor, task_function).

I’d default to starting all tasks under a supervision tree for long-running applications. Task.async is fine for a quick async operation.
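
For comparison, a minimal sketch of the supervised variants, assuming a Task.Supervisor registered under the hypothetical name MyApp.TaskSupervisor (in an application it would normally be a child in the supervision tree rather than started by hand):

```elixir
# Normally listed as {Task.Supervisor, name: MyApp.TaskSupervisor}
# in a supervisor's children; started directly here to keep the sketch short.
{:ok, _sup} = Task.Supervisor.start_link(name: MyApp.TaskSupervisor)

# Supervised counterpart of Task.async/1: the result is awaited by the caller.
task = Task.Supervisor.async(MyApp.TaskSupervisor, fn -> 1 + 1 end)
result = Task.await(task)

# Fire-and-forget task, as in the quoted docs.
{:ok, pid} = Task.Supervisor.start_child(MyApp.TaskSupervisor, fn -> :ok end)
```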

:+1:
