Servers Exiting vs Terminating in Programming Elixir Book

I have a question regarding server termination vs server exit.

I would appreciate it if this question were considered in light of the code and excerpt below.

defmodule Duper.Worker do
    use GenServer, restart: :transient

    def start_link(_) do
        GenServer.start_link(__MODULE__, :no_args)
    end

    def init(:no_args) do
        # Queue the first unit of work; it is only handled after init returns.
        Process.send_after(self(), :do_one_file, 0)
        {:ok, nil}
    end

    def handle_info(:do_one_file, _) do
        Duper.PathFinder.next_path()
        |> add_result()
    end

    # No more paths: tell the gatherer we are done and stop normally.
    defp add_result(nil) do
        Duper.Gatherer.done()
        {:stop, :normal, nil}
    end

    # Hash one file, report the result, then queue the next unit of work.
    defp add_result(path) do
        Duper.Gatherer.result(path, hash_of_file_at(path))
        send(self(), :do_one_file)
        {:noreply, nil}
    end

    defp hash_of_file_at(path) do
        File.stream!(path, [], 1024 * 1024)
        |> Enum.reduce(
            :crypto.hash_init(:md5),
            fn block, hash ->
                :crypto.hash_update(hash, block)
            end
        )
        |> :crypto.hash_final()
    end
end

Why Can’t We Just Write a Looping Function?

We can implement a loop in Elixir using a recursive function call.
But the worker server doesn’t do this. Instead, it sends itself a
message and then exits after processing each file.

The reason is that the Elixir runtime won’t let any one invocation of
a server hog the CPU forever. Instead it sets a timeout on each call
or cast into a GenServer (by default 5 seconds). If the call or cast
handler has not returned in that time, the runtime assumes
something has gone wrong and terminates the server.
Processing a million files in a loop will take more than 5 seconds.
So we instead just process one file per entry into the server, and then
queue up another message to process the next on a fresh entry. The
result: no timeouts.

My question is regarding the bolded and italicised statement. My understanding was that a GenServer exits by returning a :stop tuple, by terminating, or by calling System.halt(num); however, the explanation suggests that the server exits implicitly, so I wonder if exiting and terminating are different.

Also, just prior to when the server is said to exit, it sends itself the :do_one_file atom, but the server’s init already does this, so the server cannot be terminating and reinitialising, since sending such a message would not be necessary at termination when it also takes place at initialisation.

There definitely seems to be a difference between exiting and terminating; I would appreciate some clarity on the difference(s).

Instead, it sends itself a message and then exits after processing each file.

The description is wrong. That server does not exit after processing each file, but only when Duper.PathFinder.next_path() returns nil, presumably when there are no more files.

By returning a stop tuple, the implementation tells GenServer to terminate the current server, which will in the end either call exit/1 or simply stop the loop that awaits messages (I don’t remember exactly how gen_server does it). (spawn(fn -> :ok end) and spawn(fn -> :ok; exit(:normal) end) are equivalent.)
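For instance, here is a minimal sketch (module name StopDemo and the IO.puts are mine, not from the book) showing that returning a stop tuple makes gen_server run the terminate/2 callback and then exit the process with the given reason:

defmodule StopDemo do
    use GenServer

    def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

    def init(:ok) do
        send(self(), :finish)
        {:ok, nil}
    end

    def handle_info(:finish, state) do
        # Asks gen_server to shut this process down normally.
        {:stop, :normal, state}
    end

    def terminate(reason, _state) do
        # Runs during shutdown; the process exits with `reason` afterwards.
        IO.puts("terminate/2 called with #{inspect(reason)}")
        :ok
    end
end

{:ok, pid} = StopDemo.start_link([])
Process.sleep(50)
Process.alive?(pid)   # => false - the process exited with reason :normal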

but the server’s init already does this, so the server cannot be terminating and reinitialising, since sending such a message would not be necessary at termination when it also takes place at initialisation.

Remember that if the server restarts, it is a new process, so sending to self() will not forward the messages to the new instance.
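A quick sketch of that point (my own, not from the thread’s code): a message sent to self() lives in that process’s mailbox, and the mailbox is discarded when the process dies, so a restarted worker starts with an empty one.

pid =
    spawn(fn ->
        send(self(), :do_one_file)  # sits in this process's mailbox
        exit(:boom)                 # the mailbox is discarded with the process
    end)

Process.sleep(10)
Process.alive?(pid)                 # => false - :do_one_file went nowhere
# A supervisor restart spawns a fresh pid with an empty mailbox, which is
# why init has to queue :do_one_file again itself.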

The use of “exits” here is confusing - I believe the author means “returns” (as in “returns to the GenServer loop that called handle_whatever”) because as you’ve noted “exit” has a very specific meaning.

The “or cast” of this sentence is flatly wrong; a cast can’t time out - you can even cast to processes that aren’t alive without getting an error.
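A small illustration of that asymmetry (my own sketch):

pid = spawn(fn -> :ok end)   # a process that exits immediately
Process.sleep(10)

GenServer.cast(pid, :hello)  # => :ok - fire and forget, the message is simply dropped
GenServer.call(pid, :hello)  # => exits the caller with {:noproc, {GenServer, :call, ...}}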

I’m also unsure what this has to do with anything, since the posted code doesn’t define a handle_call. There’s a different timeout from the supervisor inside start_link - IIRC it will eventually fail if init takes too long to return - but that’s not really the same thing.

I see, so what is actually happening to prevent Elixir from timing out the server?

This server is being managed by a dynamic supervisor. I don’t know if this changes anything with respect to the veracity of the excerpt, but I wanted to highlight it just in case.

Nothing. There is no timeout-related code here. Even send_after uses a zero timeout (why not use send instead?).

It was stated elsewhere that this is used to prevent messages being lost because initialisation has not finished by the time they arrive, but I am not clear on its need in this instance, outside of an init function.

Edit: send_after is not used outside of init, that was a misread on my part.

Edit:

Ah, that’s interesting. It’s stated that send_after uses a 0 timeout because the message itself will not be received until after the initialisation process has concluded, so it will still not kick off the servers until after initialisation - but what would happen with send?

I imagine it would be the same, since the only delaying factor is readiness to receive the message.

I guess send_after must mean “send, to be received after” the block (or process?); there seems to be more going on internally than the explanation elucidates, for me at least.

Where does that material come from? It seems that the author does not fully understand the very basics of generic servers, which is quite surprising if the book is “Programming Elixir” by Dave Thomas. Is it some other book?

In your case, send or send_after makes no difference. The generic server init callback is called, and once it is finished the gen_server module will start to pull messages from the process mailbox and dispatch them to handle_info (or handle_call/handle_cast).

During the execution of init, the process can receive many messages from other processes, if those processes know the pid of your process (likely not, your process would have to tell them first in init) or use the registered name of your process. When calling send(self(), message) during init, you just add the message to the mailbox of the process, but that message will only be read after init is complete. So using send_after is slightly different but achieves the same thing.
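A runnable sketch of that ordering (module name InitDemo is mine):

defmodule InitDemo do
    use GenServer

    def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

    def init(:ok) do
        send(self(), :queued_during_init)  # lands in the mailbox right away
        Process.sleep(500)                 # init is still running; nothing is dispatched yet
        IO.puts("init finished")
        {:ok, nil}
    end

    def handle_info(:queued_during_init, state) do
        IO.puts("message handled")         # always prints after "init finished"
        {:noreply, state}
    end
end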

(Note that a Task would be more appropriate with the current set of features of your code, as the server does not actually handle other types of messages that could be interleaved with the processing of files).
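For what it’s worth, a sketch of that Task-based alternative (Duper.TaskSupervisor is an assumed Task.Supervisor name, and hash_of_file_at/1 is assumed to be extracted into a plain public function):

Task.Supervisor.start_child(Duper.TaskSupervisor, fn ->
    Stream.repeatedly(&Duper.PathFinder.next_path/0)
    |> Stream.take_while(&(&1 != nil))
    |> Enum.each(fn path ->
        Duper.Gatherer.result(path, hash_of_file_at(path))
    end)

    Duper.Gatherer.done()
end)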

I imagine it would be the same, since the only delaying factor is readiness to receive the message.

So yes, the readiness is just “when init is done and has returned {:ok, ...}”.

I guess send_after must mean “send, to be received after” the block (or process?); there seems to be more going on internally than the explanation elucidates, for me at least.

If you used send_after with 5000, for instance, that would give other processes time to send you messages. But once the loop had started, since add_result uses send, it would have the same behaviour. Only the start of the loop would be delayed by five seconds.

When the workers are initialised they are capable of calling

defp add_result(nil) do
    Duper.Gatherer.done()
    {:stop, :normal, nil}
end

Duper.Gatherer.done() accesses the server by a predefined name set in its start_link function, so this would be why the Duper.Gatherer module would need send_after/send in its init.

The Duper.Worker server is calling a Duper.PathFinder server function, which will send a message back, so it too uses send_after to prevent the initialising server from interacting with anything that uses it until it has finished initialising.

The documentation I read does not explain why send_after(self(), :message, 0), with its timeout of 0, would not behave the same as send(self(), :message), so that’s another area where I am a bit confused.
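For reference, a quick IEx experiment (mine, not from the book) suggests the two are effectively interchangeable here; the only difference is that send_after routes through the runtime’s timer handling:

# In an IEx shell:
Process.send_after(self(), :from_send_after, 0)
send(self(), :from_send)
flush()
# Prints both :from_send_after and :from_send straight away; their
# relative order is not guaranteed, since one goes via a timer.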

This was a (very important) digression, as the thread was not initially about the preference for send_after vs send here. Still, I would appreciate clarity on what is going on with Elixir’s default timeout behaviour and whether looping would have caused it to take effect, such that sending messages to self() is preferred to looping.

I think the above is the answer; the text from the provided excerpt states…

the Elixir runtime won’t let any one invocation of
a server hog the CPU forever

Perhaps “invocation” means “initialisation”?

So the send_after allows for the initialisation (invocation?) to complete, and subsequently the handle_info callback can kick off the recursion. The callback itself does not loop, perhaps for decoupling purposes, as the only external server with which it interacts is the Duper.PathFinder server.

This is just my attempt at making sense of the architecture; emphasis is placed on decoupling earlier in the chapter. Might “limiting invocation time” be rephrased as “limiting initiation time”?

As I have said:

So the send_after allows for the initialisation (invocation?) to complete

Not really. If in init you called a function that looped over the files, your init function would not complete until that loop was done too; it would only complete when there were no more files. As @al2o3cr said, “it will eventually fail if init takes too long to return - but that’s not really the same thing”.

If the call or cast handler has not returned in that time, the runtime assumes something has gone wrong and terminates the server. Processing a million files in a loop will take more than 5 seconds. So we instead just process one file per entry into the server, and then queue up another message to process the next on a fresh entry. The result: no timeouts.

The timeout in question here is the default timeout of 5 seconds of GenServer.call. The problem with this example is that it does not demonstrate how a call would time out, because there is no GenServer.call involved.
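To make that concrete, here is a minimal sketch (module name SlowServer is mine) of a call that does time out, because its handle_call takes longer than the default 5000 ms:

defmodule SlowServer do
    use GenServer

    def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

    def init(:ok), do: {:ok, nil}

    def handle_call(:slow, _from, state) do
        Process.sleep(6_000)   # longer than the default call timeout
        {:reply, :done, state}
    end
end

{:ok, pid} = SlowServer.start_link([])
GenServer.call(pid, :slow)
# ** (exit) exited in: GenServer.call(#PID<...>, :slow, 5000)
#     ** (EXIT) time out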

I understand.

I’m finishing with this chapter today after going through it a few times. Thanks a lot for the clarifications. I think it was just a small mix-up in the book.

The server was not meant to have calls anyway, due to the way it functioned, but I believe at one point having it receive messages from another service was on the cards, so even though the timeout the author wanted to address was for init, somehow the explanation was given as though it were for GenServer calls.

I went ahead and reset my original solution selection, as it addressed the use of the word “exit” in the excerpt, which was a real point of confusion.

God bless.