What is the difference between "spawn_link" and "start_link"?

tqtrung · April 8, 2019, 12:30pm

Hi everyone. I’m confused about “spawn_link” and “start_link”. Can you explain the difference between them? Thanks

NobbZ · April 8, 2019, 1:20pm

spawn_link creates a linked process right where you are. start_link is a conventional name for a function that will eventually create a linked process, with the process lifecycle managed for you.
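To make "linked right where you are" concrete, here is a minimal sketch that spawns a process and then checks the caller's own link list. Nothing here is specific to any library; the `:stop` message is just a convention made up for this example.

```elixir
# spawn_link returns the new pid right away, with the link already in place.
pid =
  spawn_link(fn ->
    # The child simply waits until it is told to stop.
    receive do
      :stop -> :ok
    end
  end)

# The calling process can see the link in its own process info.
{:links, links} = Process.info(self(), :links)
linked? = pid in links
IO.inspect(linked?, label: "linked to caller")

# Let the child exit normally; a normal exit does not take the caller down.
send(pid, :stop)
```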

tqtrung · April 9, 2019, 3:01am

Hi NobbZ. Thank you for replying. Is a process created by “start_link” also linked right where you are, as with “spawn_link”?

kokolegorille · April 9, 2019, 3:37am

Yes it does too.

NobbZ · April 9, 2019, 7:05am

Hi @tqtrung, I’m not sure if I understand you correctly, but the convention is to link to the process which calls start_link. This is not only because it is easy to understand, but also because it is necessary for the process to work properly in a supervision tree.

sasajuric · April 9, 2019, 7:22am

Compared to the spawn_link BIF, the main difference is that start_link is most often synchronous, which means that the function returns only after the spawned process acknowledges that it has been initialized. In contrast, with spawn and spawn_link you don’t have any guarantees about synchronism. These functions might return before the new process has executed a single instruction.

To illustrate the difference, let’s look at the following snippet:

foo_pid = spawn_link(fn -> do_something() end)
bar_pid = spawn_link(fn -> do_something(foo_pid) end)

With this code, bar can’t assume anything about the progress of foo. It’s possible that e.g. 1000 instructions are executed in bar, while foo hasn’t executed a single instruction.

In contrast, if start_link is used (typically via GenServer or other higher-level wrappers), we’d have something like:

{:ok, foo_pid} = Foo.start_link()
{:ok, bar_pid} = Bar.start_link(foo_pid)

So here, the bar process is started only after foo has been initialized. This means that you can reliably reason about the order of execution. You can be certain that foo prepared the necessary initialization steps before moving forward with starting bar.
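As a rough illustration of that ordering guarantee, here is a small sketch with a GenServer. The module name `SyncFoo` and the `:foo_initialized` message are made up for this example. `init/1` sends a message to the caller, and because `start_link` returns only after `init/1` has finished, that message should already be in the caller's mailbox by the time we look.

```elixir
defmodule SyncFoo do
  use GenServer

  # Hypothetical module, used only to illustrate the synchronous start.
  def start_link(caller), do: GenServer.start_link(__MODULE__, caller)

  @impl true
  def init(caller) do
    # Runs in the new process, before start_link returns to the caller.
    send(caller, :foo_initialized)
    {:ok, %{}}
  end
end

{:ok, foo_pid} = SyncFoo.start_link(self())

# No waiting here (timeout 0): the message sent from init/1 is expected
# to have arrived already, because start_link waited for init/1 to finish.
initialized? =
  receive do
    :foo_initialized -> true
  after
    0 -> false
  end

IO.inspect({foo_pid, initialized?})
```

With plain spawn_link, the same check with a zero timeout could easily come back false, since the child may not have run yet.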

It’s also worth noting that start_link can return a synchronous error (for example, when the process decides to stop during its initialization). This is why start_link has the {:ok, pid} | {:error, start_error} format, while spawn always succeeds.
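Here is a small sketch of the error case, assuming a GenServer whose `init/1` refuses to start. `FailingFoo` and `:bad_config` are invented names. One thing to keep in mind: the failing process is linked to the caller, so this example traps exits to stay alive (supervisors do the same).

```elixir
defmodule FailingFoo do
  use GenServer

  # Hypothetical module whose initialization always fails.
  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(_arg), do: {:stop, :bad_config}
end

# The new process is linked to us and exits with a non-normal reason,
# so we trap exits to avoid being taken down together with it.
Process.flag(:trap_exit, true)

result = FailingFoo.start_link(:whatever)
IO.inspect(result, label: "start_link result")
```

The caller gets the failure as an ordinary return value and can decide what to do next, which is not possible with a bare spawn_link.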

In addition, start_link usually involves a bit of extra bookkeeping. Typically, a start_link-ed process will keep track of its parent (starter process), and it will also store the initial MFA which was used to start the process. The former is used to implement a process hierarchy, and ensure proper process cleanup (to take down all descendants before terminating a process). The latter is mostly (if not completely) used for logging/debugging purposes.
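If you want to peek at that bookkeeping, the sketch below reads two process dictionary entries from inside a GenServer. As far as I know these are the keys proc_lib uses (`:"$ancestors"` and `:"$initial_call"`), but they are internal details, so treat this as something to explore in iex rather than an API to rely on. The module name `Bookkeeping` is made up.

```elixir
defmodule Bookkeeping do
  use GenServer

  # Hypothetical module that just reports what was stored at start.
  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(_arg) do
    # Both entries are written before init/1 is called.
    info = %{
      ancestors: Process.get(:"$ancestors"),
      initial_call: Process.get(:"$initial_call")
    }

    {:ok, info}
  end

  @impl true
  def handle_call(:info, _from, info), do: {:reply, info, info}
end

{:ok, bookkeeping_pid} = Bookkeeping.start_link(nil)
info = GenServer.call(bookkeeping_pid, :info)

# ancestors starts with the starter process (a pid, or a name if registered);
# initial_call is a {module, function, arity} tuple.
IO.inspect(info)
```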

It’s worth noting that spawn_link is a built-in function (aka BIF) implemented in C. In other words, it’s a basic feature of the runtime layer.

In contrast, various incarnations of start_link are implemented in Erlang, and you could make a custom version of such a function yourself (which is IMO a nice exercise).
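For anyone who wants to try that exercise, here is one possible starting point: spawn_link plus a handshake message. This is only a sketch, not how OTP does it. The module `HandRolled`, its message shapes, and the timeout values are all invented for the example, and it skips real concerns (for instance, on timeout the child is left running, and a crash in `init_fun` takes the caller down through the link).

```elixir
defmodule HandRolled do
  # Hypothetical module: spawn_link plus an acknowledgement, so the caller
  # continues only once init_fun has finished in the new process.
  def start_link(init_fun, timeout \\ 5_000) do
    parent = self()
    ref = make_ref()

    pid =
      spawn_link(fn ->
        state = init_fun.()
        # Tell the starter that initialization is done.
        send(parent, {ref, :initialized})
        loop(state)
      end)

    receive do
      {^ref, :initialized} -> {:ok, pid}
    after
      timeout -> {:error, :timeout}
    end
  end

  # Asks the process for its state and waits for the reply.
  def get(pid) do
    ref = make_ref()
    send(pid, {:get, self(), ref})

    receive do
      {^ref, state} -> state
    after
      1_000 -> exit(:timeout)
    end
  end

  defp loop(state) do
    receive do
      {:get, from, ref} ->
        send(from, {ref, state})
        loop(state)

      :stop ->
        :ok
    end
  end
end

{:ok, hand_rolled_pid} = HandRolled.start_link(fn -> %{ready: true} end)
hand_rolled_state = HandRolled.get(hand_rolled_pid)
IO.inspect(hand_rolled_state)
```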

For more details, I suggest studying the code of the proc_lib module. Among other things, this module contains the implementation of start_link which is used by most other start_link implementations, such as GenServer.
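To get a feel for proc_lib before reading its source, here is a minimal sketch that uses it directly from Elixir. `:proc_lib.start_link/3` and `:proc_lib.init_ack/2` are real functions; the module `ProcLibWorker` and its messages are made up for this example.

```elixir
defmodule ProcLibWorker do
  # Hypothetical module showing the start_link / init_ack pairing.
  def start_link(arg) do
    # Blocks until the new process calls :proc_lib.init_ack/2,
    # then returns whatever value was passed to init_ack.
    :proc_lib.start_link(__MODULE__, :init, [self(), arg])
  end

  # Must be public: proc_lib applies it in the new process.
  def init(parent, arg) do
    state = %{arg: arg}
    # Acknowledge to the starter that initialization is complete.
    :proc_lib.init_ack(parent, {:ok, self()})
    loop(state)
  end

  defp loop(state) do
    receive do
      {:get, from} ->
        send(from, {:state, state})
        loop(state)

      :stop ->
        :ok
    end
  end
end

{:ok, worker} = ProcLibWorker.start_link(:hello)
send(worker, {:get, self()})

worker_state =
  receive do
    {:state, s} -> s
  after
    1_000 -> :no_reply
  end

IO.inspect(worker_state)
```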