Hi, let’s say I have a GenServer that I start using start_link under a supervisor. This GenServer has some helper processes that it also starts with start_link, but not under any supervisor, for example by just doing Task.start_link. How can I make sure that, when the GenServer crashes, all the helper processes also exit before the GenServer is restarted? Especially given that the helper processes may trap exits and spawn their own helpers, so it may take some time until they exit.
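For concreteness, a minimal sketch of the setup described above (all module and function names here are made up for the example):

```elixir
defmodule MyServer do
  # The helper is linked to the GenServer but is not known to any
  # supervisor - exactly the situation in question.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # Started with Task.start_link: linked, but unsupervised.
    {:ok, helper} = Task.start_link(fn -> Process.sleep(:infinity) end)
    {:ok, %{helper: helper}}
  end
end
```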
Another question is: should I even ensure this? With that guarantee I could avoid some problems when operating on global resources. For example, by starting the GenServer in tests with ExUnit’s start_supervised I can be sure it exits before a subsequent test starts, keeping the tests isolated. But when a helper process doesn’t exit fast enough, they aren’t really isolated.
Or, consider a case where a helper process listens on some port. If it’s started again before being shut down, the port will still be occupied.
On the other hand, these issues can be solved within the helper processes themselves.
Long story short: with the way you currently have it set up, you very likely cannot, or at least not easily without replicating parts of what a Supervisor does.
Your best bet is probably to instead start a Supervisor of its own which starts the GenServer in question and any other helper processes - such as a Task.Supervisor used by the GenServer - with a restart strategy of :one_for_all or :rest_for_one depending on your semantics.
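A sketch of that wrapper, assuming a hypothetical MyApp.Worker standing in for the GenServer in question (all names illustrative):

```elixir
defmodule MyApp.Worker do
  # Placeholder for the actual GenServer.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule MyApp.WorkerTree do
  # One supervisor owns both the GenServer and a Task.Supervisor for
  # its helpers. With :one_for_all, a crash in either child takes the
  # other down before anything restarts.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      {Task.Supervisor, name: MyApp.TaskSup},
      {MyApp.Worker, []}
    ]

    Supervisor.init(children, strategy: :one_for_all)
  end
end
```

The GenServer would then start its helpers via Task.Supervisor.start_child(MyApp.TaskSup, fun) instead of Task.start_link.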
Yeah, I thought of that. It would be OK if the GenServer were used in my app only, but since it’s not, that would require anyone who uses it to build that supervision tree. Or I could wrap it somehow, but I have no idea how. Until now, everyone who uses my GenServer just puts it under a supervisor and it works. In tests, they call start_supervised and it works. Now that I delegate some work to a separate task, I’d like it to work the same way, but I don’t see how, even if I replicated some supervisor behaviour. Is that approach somehow extraordinary / not in line with the OTP way?
To those people it should be mostly transparent whether the child_spec of the module starts a single process or a whole tree of them. E.g. the Ecto.Repo most people have in their apps spawns a supervisor on top with about a handful of processes beneath it.
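A minimal sketch of what that looks like from both sides, with a hypothetical MyLib.Server whose child_spec quietly expands into a tree:

```elixir
defmodule MyLib.Server do
  # `use Supervisor` generates a child_spec/1, so callers can list
  # this module as if it were a single worker.
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    children = [
      {Agent, fn -> %{} end},
      {Task.Supervisor, name: MyLib.TaskSup}
    ]

    Supervisor.init(children, strategy: :one_for_all)
  end
end

# From the user's point of view nothing changes - they still list a
# single child, exactly as before:
children = [MyLib.Server]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
```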
You’d probably want to let processes register in a registry instead of communicating with them by pid – both internally as well as from the user’s side. That allows you to refer to them without knowing their specific pids (which might change, e.g. due to restarts).
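A small sketch of that pattern using Elixir’s built-in Registry and :via tuples (the registry and key names are made up for the example):

```elixir
# Processes register under a key; callers address them through a
# {:via, Registry, {registry, key}} tuple instead of a raw pid.
{:ok, _} = Registry.start_link(keys: :unique, name: MyLib.Registry)

name = {:via, Registry, {MyLib.Registry, :helper}}
{:ok, _pid} = Agent.start_link(fn -> 0 end, name: name)

# Callers only know the key; the underlying pid may change across
# restarts without affecting them.
Agent.update(name, &(&1 + 1))
```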
But, most importantly, not being able to communicate via pid at all seems weird to me. It would be justified if I had some logic for restarting the helpers and the main GenServer and wanted to ship it all together. But my only need is to be sure that all the linked processes are dead.
I don’t deny that there’s overhead, but without venturing outside of OTP there’s no nesting workers underneath workers – only supervisors, with workers at the leaf nodes. You could use e.g. terminate/2 to attempt to clean up external processes, but it’s not perfect and you’ll still need to deal with timeouts and such.
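A rough sketch of that terminate/2 approach, with its caveats baked in: terminate/2 is only guaranteed to run when the GenServer traps exits or stops cleanly, and you have to pick a timeout yourself (names illustrative):

```elixir
defmodule CleanupServer do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    # Without trapping exits, terminate/2 may never be called.
    Process.flag(:trap_exit, true)
    {:ok, helper} = Task.start_link(fn -> Process.sleep(:infinity) end)
    {:ok, %{helper: helper}}
  end

  @impl true
  def terminate(_reason, %{helper: helper}) do
    # Best effort: ask the helper to shut down, wait for it with a
    # timeout, then kill it outright.
    ref = Process.monitor(helper)
    Process.exit(helper, :shutdown)

    receive do
      {:DOWN, ^ref, :process, ^helper, _reason} -> :ok
    after
      5_000 -> Process.exit(helper, :kill)
    end
  end
end
```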
Also, while you talk about a single process here, that’s not really the case. A GenServer with some spawned Tasks isn’t a single process anymore.
You’re kind of duplicating the purpose of Task.Supervisor and supervision trees. “No Task should be started unsupervised” is generally good advice to follow.
As for process communication, I think the easiest thing would be to require a name for the supervisor, but actually pass the name down and register the worker under it. You’d then communicate with the process by name.
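One way this could look – the user names the top-level supervisor, and the names of everything underneath are derived from it (module and child names are illustrative):

```elixir
defmodule MyLib.Tree do
  use Supervisor

  def start_link(opts) do
    name = Keyword.fetch!(opts, :name)
    Supervisor.start_link(__MODULE__, name, name: name)
  end

  @impl true
  def init(name) do
    children = [
      # Derive child names from the user-supplied name,
      # e.g. MyName -> MyName.TaskSup.
      {Task.Supervisor, name: Module.concat(name, TaskSup)}
    ]

    Supervisor.init(children, strategy: :one_for_all)
  end
end
```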
That’s interesting, but a bit confusing to me. If you pass a name to the supervisor, don’t you expect the supervisor to have this name, not some other process? Do you know of any libraries that do it that way?
I wouldn’t use the passed name for the worker itself (it should go on the supervisor, imo), but you can derive the names of all the started processes from it. Broadway does that, for example.
I have a question regarding this as well: how about Task.async? Should Task.Supervisor.async always be used? Task.async seems quite common, though, if not even encouraged.
(…) we recommend developers to always start tasks under a supervisor. This provides more visibility and allows you to control how those tasks are terminated when a node shuts down. That might look something like Task.Supervisor.start_child(MySupervisor, task_function).
I’d default to starting all tasks under a supervision tree for long-running applications. Task.async is fine for a quick async operation.
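The two side by side (the supervisor name is made up for the example):

```elixir
# Quick, short-lived computation awaited in the same request -
# Task.async is fine here:
task = Task.async(fn -> 1 + 1 end)
2 = Task.await(task)

# In a long-running application, prefer running tasks under a
# Task.Supervisor, so their shutdown is controlled by the tree:
{:ok, _} = Task.Supervisor.start_link(name: Demo.TaskSup)
task = Task.Supervisor.async(Demo.TaskSup, fn -> 2 + 2 end)
4 = Task.await(task)
```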