Getting more info on [info] Application xyz exited: shutdown

Following Dave Thomas’s course, I have an Application that starts a supervised Agent, which is hardcoded to crash on roughly one call in three.

After the Agent has crashed several times from calls to Dictionary.random_word, the Application itself exits for some reason.

Why does the Application crash as well?

mix.exs
lib/
  dictionary/
    application.ex
    word_list.ex
  dictionary.ex

application.ex

defmodule Dictionary.Application do

  use Application

  def start(_type, _args) do

    children = [
      Dictionary.WordList
    ]

    options = [
      name: Dictionary.Supervisor,
      strategy: :one_for_one
    ]

    Supervisor.start_link(children, options)
  end
end

word_list.ex

defmodule Dictionary.WordList do

  use Agent

  @me __MODULE__

  def start_link(_opts) do
    Agent.start_link(&word_list/0, name: @me)
  end

  def random_word() do
    if :rand.uniform() < 0.33 do
      Agent.get(@me, fn _ -> exit(:boom) end)
    end

    Agent.get(@me, &Enum.random/1)
  end

  def word_list do
    "../../assets/words.txt"
    |> Path.expand(__DIR__)
    |> File.read!()
    |> String.split(~r/\n/)
  end
end

dictionary.ex

defmodule Dictionary do
  alias Dictionary.WordList

  defdelegate random_word(), to: WordList

end

mix.exs

defmodule Dictionary.MixProject do
  use Mix.Project

  def project do
    [
      app: :dictionary,
      version: "0.1.0",
      elixir: "~> 1.11",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  def application do
    [
      mod: { Dictionary.Application, [] },
      extra_applications: [:logger]
    ]
  end

  defp deps do
    []
  end
end

log

iex(15)> Dictionary.random_word
** (exit) exited in: GenServer.call(Dictionary.WordList, {:get, #Function<0.122627474/1 in Dictionary.WordList.random_word/0>}, 5000)
    ** (EXIT) :boom
    (elixir 1.11.3) lib/gen_server.ex:1027: GenServer.call/3
    (dictionary 0.1.0) lib/dictionary/word_list.ex:13: Dictionary.WordList.random_word/0

20:34:02.920 [error] GenServer Dictionary.WordList terminating
** (stop) :boom
    (dictionary 0.1.0) lib/dictionary/word_list.ex:13: anonymous fn/1 in Dictionary.WordList.random_word/0
    (elixir 1.11.3) lib/agent/server.ex:12: Agent.Server.handle_call/3
    (stdlib 3.14) gen_server.erl:715: :gen_server.try_handle_call/4
    (stdlib 3.14) gen_server.erl:744: :gen_server.handle_msg/6
    (stdlib 3.14) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.142.0>): {:get, #Function<0.122627474/1 in Dictionary.WordList.random_word/0>}
State: ["that", "this", "with", "from", "your", "have", "more", "will", "home", "about", "page", "search", "free", "other", "information", "time", "they", "site", "what", "which", "their", "news", "there", "only", "when", "contact", "here", "business", "also", "help", "view", "online", "first", "been", "would", "were", "services", "some", "these", "click", "like", "service", "than", "find", "price", "date", "back", "people", "list", "name", ...]
Client #PID<0.142.0> is alive

    (stdlib 3.14) gen.erl:208: :gen.do_call/4
    (elixir 1.11.3) lib/gen_server.ex:1024: GenServer.call/3
    (dictionary 0.1.0) lib/dictionary/word_list.ex:13: Dictionary.WordList.random_word/0
    (stdlib 3.14) erl_eval.erl:680: :erl_eval.do_apply/6
    (elixir 1.11.3) src/elixir.erl:280: :elixir.recur_eval/3
    (elixir 1.11.3) src/elixir.erl:265: :elixir.eval_forms/3
    (iex 1.11.3) lib/iex/evaluator.ex:261: IEx.Evaluator.handle_eval/5
    (iex 1.11.3) lib/iex/evaluator.ex:242: IEx.Evaluator.do_eval/3
iex(15)>
20:34:02.922 [info]  Application dictionary exited: shutdown

Dictionary.Application starts a Supervisor, which restarts its children until more than max_restarts (default 3) restarts happen within max_seconds (default 5). When that limit is exceeded, the Supervisor terminates its children and exits with :shutdown. See the Supervisor docs for more details.
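To make the restart limit concrete, here is a minimal, self-contained sketch. The RestartDemo modules are hypothetical and not part of the project above; the only real options involved are max_restarts and max_seconds, passed to Supervisor.start_link/2. The sketch kills a supervised Agent a given number of times and reports whether the supervisor is still alive afterwards.

```elixir
defmodule RestartDemo.Worker do
  use Agent

  # Hypothetical stand-in for Dictionary.WordList.
  def start_link(_opts) do
    Agent.start_link(fn -> :ok end, name: __MODULE__)
  end
end

defmodule RestartDemo do
  alias RestartDemo.Worker

  # Starts a supervisor with the given restart budget, kills the worker
  # `kills` times, and returns true if the supervisor is still alive.
  def survives?(max_restarts, kills) do
    {:ok, sup} =
      Supervisor.start_link([Worker],
        strategy: :one_for_one,
        max_restarts: max_restarts,
        max_seconds: 5
      )

    # Unlink so the supervisor's :shutdown exit cannot take the caller down.
    Process.unlink(sup)

    Enum.each(1..kills, fn _ -> kill_worker(sup) end)

    alive = Process.alive?(sup)
    # Clean up so the worker's registered name is free for the next run.
    if alive, do: Supervisor.stop(sup)
    alive
  end

  defp kill_worker(sup) do
    case Process.whereis(Worker) do
      nil ->
        :ok

      pid ->
        Process.exit(pid, :kill)
        wait(sup, pid)
    end
  end

  # Polls until the supervisor has died or a replacement worker is registered.
  defp wait(sup, old) do
    new = Process.whereis(Worker)

    cond do
      not Process.alive?(sup) ->
        :supervisor_down

      is_pid(new) and new != old ->
        :restarted

      true ->
        Process.sleep(10)
        wait(sup, old)
    end
  end
end
```

With a budget of 3, RestartDemo.survives?(3, 3) should return true and RestartDemo.survives?(3, 4) should return false, since the fourth restart inside the 5-second window exceeds the limit. In the original application, the same two options could be added to the options list in Dictionary.Application.start/2.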


Fantastic.

I suppose the Supervisor then takes the Application down with it, because it was itself started from the Application with a Supervisor.start_link() call?

Also, is there any way to log the reason for the Application's crash? I would most likely never have found this on my own.

When a process or a supervisor eventually crashes, it propagates the “crash” upwards, all the way to the root of your app. You can change the restart policies on your supervisors to capture and modify this behaviour.


I’m definitely missing something then. Isn’t the point of OTP, supervision trees and Applications the ability to tolerate faults? As in, if a supervised process dies, the supervisor restarts it, and if a supervisor dies, the Application restarts it?

Also how would you have approached identifying the root cause of the Application crash in this instance?

“An application” is a configuration file (the .app file generated by Mix) and a callback module. The process with the best claim on being “the application” is the Supervisor started in the application’s start callback.

OTP will log an error report when a supervisor shuts down this way, but IIRC it’s at :info.
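One way to surface that report, as a sketch rather than a confirmed fix for this project: Elixir's Logger has a handle_sasl_reports option that forwards OTP supervisor and crash reports to the Logger output. The file name config/config.exs is an assumption here, since the project listing above does not show a config directory.

```elixir
# config/config.exs (hypothetical file, not shown in the project listing above)
import Config

# Forward OTP/SASL supervisor and crash reports to Elixir's Logger,
# so a supervisor shutting down is reported alongside the regular logs.
config :logger, handle_sasl_reports: true
```

With this in place, the supervisor's own report should appear in the console next to the "Application dictionary exited: shutdown" line, assuming the report is otherwise being filtered out as a SASL report.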


Yes, and it does tolerate several. That count of faults (and other parameters) is configurable.


Trying to wrap my head around this… so what are processes 177 and 178 below?

[Screenshots: Screen Shot 2021-01-23 at 4.11.43 PM, Screen Shot 2021-01-23 at 4.11.27 PM]

Would you be kind enough to provide some code for inspecting the error, along with where it should go?
I tried looking for a relevant callback in the Application module, but the closest I got was prep_stop/1, which only seems to cover Application.stop/1.

Is this how it works in general: if the top-level Supervisor crashes, the whole Application crashes?

:application_master is the process in charge of handling the lifecycle of an application, e.g. starting it. It does not provide any restarting, however. If your (permanent) application crashes, the system is expected to be in a non-recoverable state, and therefore the whole node stops. It can only be restarted from the outside, e.g. by an OS-level supervisor like systemd or Erlang's heart.

The restarting capabilities within the BEAM are reserved for processes, as opposed to applications, and are handled only by supervisors. Therefore, if you expect failures in your system, you should try to isolate them from your application's root process as best as possible.
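One common way to get that isolation is to put the crash-prone process under its own inner supervisor, and make that inner supervisor a child of the root. The following is a minimal sketch with hypothetical IsolationDemo modules (not from the project above): when the inner supervisor exhausts its restart budget and exits, the root supervisor restarts it instead of going down itself.

```elixir
defmodule IsolationDemo.Worker do
  use Agent

  # Hypothetical stand-in for a crash-prone process such as Dictionary.WordList.
  def start_link(_opts) do
    Agent.start_link(fn -> :ok end, name: __MODULE__)
  end
end

defmodule IsolationDemo do
  alias IsolationDemo.Worker

  # Root supervisor -> inner supervisor -> worker.
  def start_tree do
    inner = %{
      id: :inner_sup,
      start:
        {Supervisor, :start_link,
         [[Worker], [strategy: :one_for_one, max_restarts: 1, max_seconds: 5]]},
      type: :supervisor
    }

    Supervisor.start_link([inner], strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end

  # Kills the worker and waits (up to about one second) for a replacement.
  def crash_worker do
    old = Process.whereis(Worker)
    Process.exit(old, :kill)
    wait_for_new_worker(old, 100)
  end

  defp wait_for_new_worker(_old, 0), do: :timeout

  defp wait_for_new_worker(old, attempts) do
    case Process.whereis(Worker) do
      pid when is_pid(pid) and pid != old ->
        :restarted

      _ ->
        Process.sleep(10)
        wait_for_new_worker(old, attempts - 1)
    end
  end
end
```

Calling IsolationDemo.crash_worker() twice after IsolationDemo.start_tree() exceeds the inner supervisor's budget of one restart, yet the root stays alive because it only had to restart one child. Note that each inner-supervisor restart still counts against the root's own max_restarts, so this delays escalation to the root rather than preventing it.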
