Pretty much as the title says. Is there any reason to start a process unlinked to any other process, particularly if that process is long-lived? I.e. when have you used GenServer.start over GenServer.start_link?
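For reference, a minimal sketch of the difference (Counter is a throwaway module, purely for illustration):

```elixir
defmodule Counter do
  use GenServer

  @impl true
  def init(arg), do: {:ok, arg}
end

# Unlinked: nobody is notified when this process dies, which is how
# dangling processes appear.
{:ok, unlinked} = GenServer.start(Counter, 0)

# Linked: a crash on either side propagates to the other (unless exits
# are trapped), so the server cannot quietly outlive its parent.
{:ok, linked} = GenServer.start_link(Counter, 0)
```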
Shooting from the hip - when would you use UDP instead of TCP on a network? Sometimes, reliability isn't necessary. Maybe logging or telemetry is optional, but if it dies and refuses to start up, you'd rather keep on processing than kill the whole service.
That is still not a valid reason, because you can still put it under a supervisor and tell the supervisor not to care about failures at all.
So even if the job of a process is completely discardable, you should still start it under a supervision tree, because it gives you visibility into your application structure and gives you sane shutdown semantics (even if those semantics are "kill them all").
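In today's Elixir, that "supervise but never restart" setup is just a child spec with restart: :temporary (a sketch; Discardable is a made-up module name):

```elixir
defmodule Discardable do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

children = [
  # :temporary - never restart this child. The supervisor still owns it,
  # so it shows up in Supervisor.which_children/1 and is shut down along
  # with the rest of the tree.
  Supervisor.child_spec(Discardable, restart: :temporary)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
```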
That sounds as good as unsupervised
Except that within the OTP supervision framework it is easy to introspect and get information about. I've been of the opinion for over a decade that every-single-process should be supervised or linked somewhere.
I think it's more about keeping track of the processes and making sure they shut down when a part of the application is no longer needed. More so than anything related to reliability.
You might use UDP for performance, but I don't think not linking is going to have much benefit for performance.
This is IMO the most important reason for start_link. If you use plain start, there's always a chance you'll leave some dangling process behind, and that might cause various weird behaviour.
This is why a worker should sit under a supervisor even if you donāt want to restart it. Because supervisors are not just about restarting, but also about synchronized starting in the proper order, as well as proper termination and cleanup.
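As a sketch of that ordering guarantee (throwaway Agent children used purely for illustration): children start in list order and are shut down in reverse order, so the first child is up before the second starts, and still up while the second is being terminated.

```elixir
children = [
  # Starts first, stops last - e.g. an owner of some shared resource.
  %{id: :owner, start: {Agent, :start_link, [fn -> :resource end]}},
  # Starts second, stops first - e.g. a worker using that resource.
  %{id: :worker, start: {Agent, :start_link, [fn -> :work end]}}
]

# :rest_for_one also means a crash of :owner restarts :worker too,
# since :worker depends on it.
{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)
```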
So, my conclusion from this thread is the following. There is not really any reason to start processes without a link.
That would be my conclusion as well
Related to this: What about a process that is only monitored?
start_link ensures that a bidirectional link is created between the two processes, with the supervisor trapping exits so as to be able to respond intelligently to an exiting child, instead of also just blowing up. The other way around, of course, we do not trap exits: when the supervisor crashes, the children immediately crash as well.
Using a monitor (the supervisor monitoring the child), we would lose this second behaviour. This means we would lose the cleanup that happens when a supervisor disappears.
My intuition tells me that this situation is less desirable, but I think it would be good to think about and discuss this related possibility in more detail. Are there any cases in which uni-directional monitoring would be better than a bi-directional link?
IMO, these things serve different purposes, and therefore don't exclude each other. As mentioned in this thread, start_link is a prerequisite for proper termination of processes. I can't think of any scenario where a dangling process is desirable, so I think that every process should be start_link-ed under some parent (which should IMO most often be a supervisor).
A monitor is useful if you care about process termination, but you don't want your termination to affect the process.
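A minimal sketch of that one-way relationship (a plain spawn is used purely for illustration): the monitoring process receives a :DOWN message when the monitored one terminates, while its own death would leave the monitored process untouched.

```elixir
# The monitored process waits for a message, so it is guaranteed to be
# alive when the monitor is set up.
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

ref = Process.monitor(pid)
send(pid, :stop)

reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :no_down_message
  end
# reason is :normal - the monitored process simply finished
```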
For example, let's say that some message arrives over a websocket. We want to handle it and send the response back to the other side. If there's an error, we need to report that as well. But if the websocket connection is closed, we don't want to stop the handling.
To make this happen, we could handle the message in a separate process. We'd start this process somewhere else in the supervision tree, so the process is start_link-ed to some parent (e.g. a :simple_one_for_one supervisor). That process would then send a message back to the communication process when it has finished. To handle an error, we need to monitor that process, and handle the :DOWN message with the abnormal exit reason. Such a setup decouples message handling from the communication process, but at the same time ensures that the communication process can detect a failure of the message handler.
A more generalized case of this is: "start a job, report a success or failure". This means we have two activities: the job itself, and monitoring of its lifecycle. I usually handle this by starting the job under a :simple_one_for_one supervisor, and having the reporter process monitor the job. If you want to ensure that the termination of the reporter takes down all the associated jobs, you can bundle the reporter and the job supervisor under a common :one_for_all supervisor.
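In more recent Elixir, that "start a job, report a success or failure" shape is packaged up as Task.Supervisor.async_nolink/2: the job runs under a supervisor elsewhere in the tree, while the caller only monitors it. A sketch:

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# The job runs under `sup`; we only *monitor* it (no link), so a crash
# comes back to us as a :DOWN message instead of taking us down.
task = Task.Supervisor.async_nolink(sup, fn -> raise "job failed" end)
%Task{ref: ref} = task

result =
  receive do
    # Success: the task sends {ref, value} back to the caller.
    {^ref, value} -> {:ok, value}
    # Failure: the monitor fires with the exit reason.
    {:DOWN, ^ref, :process, _pid, reason} -> {:error, reason}
  after
    1_000 -> :timeout
  end
```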
I just ran into this; as a matter of fact, I changed from using start_link to start. It's a one-off device discovery that uses UDP multicast, and the process automatically shuts down after a couple of seconds. It's almost like using a port.
There are other reasons that processes use links which have nothing to do with their supervisor, if they have one. One typical case is when the process has allocated resources: then typically the servers managing the resources will link to the process, so if/when it dies they will be notified and can clean up after it. This definitely has nothing to do with the supervisor, which should not be handling things like this.
One benefit of this is that when it is done properly, I never have to worry about cleaning up after a process. I can just let it die, and other processes will detect this and automatically clean up after it.
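A sketch of that resource-manager pattern (ResourceManager is a hypothetical module): the manager traps exits and links to each allocating process, so a client's death triggers cleanup without any supervisor involvement.

```elixir
defmodule ResourceManager do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  def allocate(resource), do: GenServer.call(__MODULE__, {:allocate, resource})

  @impl true
  def init(state) do
    # Trap exits so a dying client becomes an {:EXIT, pid, reason}
    # message instead of killing the manager.
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def handle_call({:allocate, resource}, {client, _tag}, state) do
    # Link to the caller: its death will notify us.
    Process.link(client)
    {:reply, :ok, Map.put(state, client, resource)}
  end

  @impl true
  def handle_info({:EXIT, client, _reason}, state) do
    # Client died: release whatever it had allocated.
    {_resource, state} = Map.pop(state, client)
    {:noreply, state}
  end
end
```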
I sort of see it as linking in 2 directions: vertically in a supervision tree; and horizontally between workers. And they have different purposes.
I agree, IF we are talking about long-lived processes and not very short tasks. But I believe it is perfectly fine to spawn simple processes in the middle of an OTP app when those processes have a short life and are allowed to fail without consequences. Sometimes it is overkill to create Tasks for them or to create a simple_one_for_one supervisor with a transient child spec just to host these one-off processes. There's just no upside to supervising certain processes.
However, is there any disadvantage to supervising these short-lived processes?
Nope - well, a "touch" of initial spawn overhead, but really, nope. I still supervise everything, just for the OTP tree functionality like introspection and reporting and all such.
I'd like to know if there's any advantage to it. I believe there isn't, but please let me know if I'm missing something important here.
A process that's heavily working on a task is not able to reply to {:system, ...} messages, so I don't see the point in using Tasks or GenServers for this purpose either.
What's more, supervision can start working against you, since too frequent restarts can cause the supervisor to crash itself. TaskSupervisor is configured with :temporary children, so they get restarted when they fail. (What if your processes are allowed to crash?) Then the starter process loses the pid of those restarted tasks, so I really don't see the benefit of supervising short-lived processes.
One more thing: the tiny overhead of supervising a process can become significant when the process itself has a short life.
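For reference, the restart intensity referred to above is configurable per supervisor; a sketch with a throwaway Agent child:

```elixir
children = [
  %{id: :worker, start: {Agent, :start_link, [fn -> :state end]}}
]

# If more than max_restarts child restarts happen within max_seconds,
# the supervisor gives up and terminates itself, escalating the failure
# to its own parent.
{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    max_restarts: 3,
    max_seconds: 5
  )
```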
I think the main reason is that you want to find out when your "short-lived process" for some reason turns out to stay around much longer than you intended.
I also believe that processes that do not respond to {:system, ...} messages probably end up in some special section in the introspection tools, and at least many of the introspection functions work directly with the schedulers (so outside the process's own execution scope) to ensure that things like infinite loops will not prevent you from e.g. seeing how much memory the process is claiming.
As I mentioned here, the main advantage of a supervised process is that it sits in the process hierarchy, and therefore it can be properly terminated when any of its ancestors terminates.
In contrast, a vanilla spawned process will linger on, which may cause problems such as reentrancy issues and race conditions.
Of course, if a process is "short" (whatever that means, b/c in some cases even a single millisecond can be long), the chances of that happening are smaller. But they are still greater than zero, whereas with a proper OTP supervision tree this can't happen, at least not with default settings.
Moreover, the "shortness" of a process is a tricky thing to guarantee. You need to be absolutely sure that no matter what kind of input is given, the process is going to finish "quickly". If there's some bug which causes the process to run longer, the chances of a dangling process increase. If a bug causes a process to hang indefinitely, the system might not be able to fix that automatically, and a human operator needs to fix the problem manually.
Therefore, I would never advise using plain spawns in production, or otherwise bypassing the OTP hierarchy of processes. By doing this, you're creating processes in a limbo, outside of any OTP app or supervision tree. In many cases it might work fine, but when it bites you, it can be nasty and hard to understand. I'm speaking from my personal experience here.
A middle ground between a supervised process and a plain spawn could be to start a Task directly with start_link or async. At least with that, the OTP hierarchy is preserved, and the risks are reduced. You can still mess things up, but the risk surface is smaller than with plain spawns.
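A sketch of that middle ground:

```elixir
# Task.async links the task to the caller (and monitors it), so it
# cannot dangle: if either side dies, the other side finds out.
task = Task.async(fn -> Enum.sum(1..10) end)

# await collects the result and cleans up the monitor.
result = Task.await(task, 1_000)
# result is 55
```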
Task.Supervisor indeed uses the :temporary restart strategy by default. However, that means that tasks are not restarted when they fail, and this is usually what you want.
The overhead of asking a supervisor to start some child is in most cases insignificant. In 7 years of working with Erlang, I've never personally encountered a case where supervisor overhead was a problem.
That said, there is a known bottleneck if many processes frequently ask the same supervisor to start some child. Since all those requests are serialized in the same process, the supervisor can become overloaded, and that can cause problems. If that's the case, sharding (using multiple supervisors) can usually help.
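Such sharding can be as simple as hashing a key to pick one of N supervisors (a sketch; the module and supervisor names are made up):

```elixir
defmodule JobStarter do
  @shards 8

  # Deterministically map a key to one of @shards supervisor names,
  # spreading the serialized start_child calls over several processes.
  def supervisor_for(key) do
    :"job_sup_#{:erlang.phash2(key, @shards)}"
  end
end
```

Each of the named shard supervisors would be started up front under a common parent, so the usual supervision guarantees still hold.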
Hi sasa,
As I wrote a few comments above, I agree with supervising processes that are around for a long time. What I tried to say is that, in my opinion, really short processes are fine to go without supervisors. And when I write short-lived processes, I really mean short-lived ones.
I accept that supervision in this case can help reveal that a short process has turned into a long-running one. You guys are right; this is a valid reason to supervise them.
What was on my mind when I wrote that is a system I had to write two weeks ago. It consists of two processes: 1) a GenServer that is receiving network packets from an ethernet interface, and 2) another process that is a GenEvent event manager. The purpose of the system is to generate events for other applications when certain packets appear on the LAN. I don't want the GenServer to be a bottleneck, so I don't parse the raw packets there. Instead, I spawn a process with this function:
def assimilate(raw_packet) do
  {:packet, _link_type, time, _pkt_len, frame} = raw_packet

  {:ok,
   {[
      {:ether, local_mac, _remote_mac, _, _},
      {:ipv4, _, _, _, _, _, _, _, _, _, _, _, remote_ip, _local_ip, _},
      {:tcp, _, _, _, _, _, _, _, _, _, 1, _, _, 1, _, _, _, _, _raw_tcp}
    ], _}} = :pkt.decode(frame)

  true = :ets.member(Sniffer.HostList, remote_ip)
  timestamp = :calendar.now_to_universal_time(time)
  GenEvent.notify(Sniffer.Events, {timestamp, local_mac, remote_ip})
end
It uses an Erlang lib to parse the packet. I have to filter for TCP SYN+ACK packets (those are the 1s in the pattern) where the IP address is on a predefined list. Because of the pattern matches, this process crashes as soon as it finds out that the packet is not of the specified type. The last line is only reached when everything is a-okay with it.
Do you really think that this simple code should run under a supervisor? Is there a real danger that this one does not stop for a long time, and the process remains alive? I really would like to know if you guys think that this solution is wrong, or could be rewritten in a more stable, more effective or more elegant way.
I believe it depends on the exact situation whether it's okay to have raw processes or it's better to use OTP. And yes, I accept that as a general rule it's better to always use supervisors.
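For what it's worth, a low-ceremony supervised variant of this pattern could be sketched with a Task.Supervisor (Sniffer.TaskSup is a hypothetical name, and the parse function is stubbed out):

```elixir
# Started once, somewhere in the application's supervision tree:
{:ok, _sup} = Task.Supervisor.start_link(name: Sniffer.TaskSup)

# Stand-in for the real assimilate/1: it still crashes on packets that
# don't match, which is fine, because :temporary tasks are not restarted.
assimilate = fn raw_packet ->
  {:packet, _link_type, _time, _pkt_len, _frame} = raw_packet
  :ok
end

# Per packet: start_child instead of a bare spawn. The process now lives
# in the tree (visible to introspection, shut down with the app) at the
# cost of one call into the Task.Supervisor.
{:ok, _pid} =
  Task.Supervisor.start_child(Sniffer.TaskSup, fn ->
    assimilate.({:packet, :en10mb, :erlang.timestamp(), 0, <<>>})
  end)
```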