Supervised GenServer process disappeared and wasn't restarted

Greetings!

Tonight we had an incident in production where a somewhat important GenServer process seemingly disappeared and was not restarted by its supervisor. This is the first such incident in the 2.5 years since this worker was introduced.

The GenServer in question has basically the following structure:

defmodule MyWorker do
  use GenServer, restart: :transient

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl GenServer
  def init(opts) do
    Process.send_after(self(), :do_work, 500)

    {:ok, opts}
  end

  @impl GenServer
  def handle_info(:do_work, state) do
    do_some_work(state)
    Process.send_after(self(), :do_work, 500)

    {:noreply, state}
  end

  defp do_some_work(state) do
    # query DB
    # send HTTP request
    # that's all - no OTP messaging, exits, etc.
  end
end

This GenServer is started as part of a Supervisor (let’s call it MySupervisor). MySupervisor uses the :one_for_one strategy; all other init options are left at their defaults.
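For completeness, the supervisor side looks roughly like this (a minimal sketch; the module body and child list are simplified, only MyWorker is shown):

defmodule MySupervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl Supervisor
  def init(_opts) do
    children = [
      MyWorker
      # ...other children omitted...
    ]

    # :one_for_one with the default :max_restarts (3) and :max_seconds (5)
    Supervisor.init(children, strategy: :one_for_one)
  end
end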

When I connected to the remote console, I found that the process was not alive: Process.whereis(MyWorker) returned nil, and Supervisor.which_children(MySupervisor) returned the following entry for MyWorker:

{MyWorker, :undefined, :worker, [MyWorker]}

As far as I understand, this means MyWorker’s child spec was known to MySupervisor, but the worker process itself was not running. I restarted MyWorker with Supervisor.restart_child/2 and started figuring out what went wrong.
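In code, the checks above were roughly (return values abbreviated):

Process.whereis(MyWorker)
#=> nil

Supervisor.which_children(MySupervisor)
#=> [..., {MyWorker, :undefined, :worker, [MyWorker]}, ...]

# :undefined in the pid position: the child spec is known,
# but no process is currently running for it
Supervisor.restart_child(MySupervisor, MyWorker)
#=> {:ok, #PID<...>}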

The obvious culprit is restart: :transient - since MyWorker was intended to run indefinitely and never terminate successfully, it should have been :permanent. If we assume MyWorker terminated normally, that explains why it wasn’t restarted.

The problem is figuring out why MyWorker died, because it neither stops itself nor is explicitly stopped by any other app code. I have two hypotheses left:

  • some developer connected to the remote console and stopped the worker by hand (very unlikely);
  • the worker terminated with an error and was not restarted due to a bug in Supervisor (extremely unlikely).

Am I missing anything else?

1 Like

Hi 🙂 Welcome back!

Is there exit(:normal) or exit({:shutdown, some_value}) somewhere in your code that could have been called by that process?
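For illustration, something like the following (purely hypothetical) buried anywhere in the worker’s call path would be enough for a :transient child to stop quietly and never come back:

# Hypothetical: an exit with a "successful" reason inside a callback makes the
# GenServer terminate with that reason; for a :transient child the supervisor
# does not restart it, and no crash report is logged.
def handle_info(:do_work, state) do
  do_some_work(state)
  Process.send_after(self(), :do_work, 500)
  {:noreply, state}
end

defp do_some_work(_state) do
  exit(:normal)   # or exit({:shutdown, :whatever})
end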

Or maybe it was linked to another process that exits with shutdown? I don’t remember how GenServer behaves in that case.

2 Likes

Thanks for the reply, I think you nailed the two most plausible cases that I hadn’t thought of!

No, there are no explicit exits that can be called from this process (unless I missed something). I also doubt that Ecto or HTTPoison have such exits in their codebases that could leak into user code.

As for how GenServer behaves in that case, I think this part of the Process.link/1 docs is relevant:

When two processes are linked, each one receives exit signals from the other (see also exit/2). Let’s assume pid1 and pid2 are linked. If pid2 exits with a reason other than :normal (which is also the exit reason used when a process finishes its job) and pid1 is not trapping exits (see flag/2), then pid1 will exit with the same reason as pid2 and in turn emit an exit signal to all its other linked processes. The behavior when pid1 is trapping exits is described in exit/2.

So if my understanding is correct, if MyWorker was linked to another process, and that process exited with a :shutdown (or {:shutdown, value}), then it would indeed cause MyWorker to terminate normally as well. Unfortunately, MyWorker isn’t linked to any other process.
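Just to make the mechanism concrete for myself (a hypothetical sketch, since MyWorker doesn’t link to anything):

# Hypothetical: the worker links to a process that exits with :shutdown.
# The worker does not trap exits, so it terminates with reason :shutdown too,
# and for a :transient child the supervisor treats that as a successful
# termination and does not restart it.
def handle_info(:link_to_something, state) do
  spawn_link(fn -> exit(:shutdown) end)
  {:noreply, state}
end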

2 Likes

There are two things to take into consideration as well:

① Restart intensity, specifically :max_restarts (the maximum number of restarts allowed in a time frame, defaults to 3) and :max_seconds (the time frame in which :max_restarts applies, defaults to 5). If something went south and there were 4 failed restart attempts within the 5-second window (the defaults), the GenServer would end up not restarted. This is likely your case if, e.g., the DB connection dropped for 3 seconds: the culprit would be the 500 ms interval in init/1, which lets 4 consecutive unsuccessful attempts to execute do_some_work/1 fit into 5 seconds and shut the process down for good (see the sketch after point ②).

② Mailbox overflow. This technically might be your case, but it’s extremely unlikely; just mentioning it for the complete picture.
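To make ① concrete: both limits are plain Supervisor.init/2 options, and the values below are the documented defaults (a sketch, not your actual supervisor):

children = [MyWorker]

Supervisor.init(children,
  strategy: :one_for_one,
  max_restarts: 3, # default: give up after more than 3 restarts...
  max_seconds: 5   # ...within a 5-second window
)

As for ②, the mailbox length of a live process can be checked with Process.info(pid, :message_queue_len).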

6 Likes

I was under the impression that if a supervised process restarts more than :max_restarts times within a :max_seconds period, this leads to the restart of the whole supervisor (and not to the supervised process simply not being restarted). Is that not the case? I didn’t find specific mentions of this behaviour in Elixir’s Supervisor docs, but the Erlang docs say (emphasis mine):

Restart intensity and period

To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart intensity is defined using two integer values specified with keys intensity and period in the above map. Assuming the values MaxR for intensity and MaxT for period, then, if more than MaxR restarts occur within MaxT seconds, the supervisor terminates all child processes and then itself. The termination reason for the supervisor itself in that case will be shutdown. intensity defaults to 1 and period defaults to 5.

1 Like

I have no access to iex atm, but it could be easily validated with 10 LoCs.

Now that you mention it, I recall the same experience: the supervisor terminates, but then it comes up to the Application, and I doubt the Application would terminate itself as well. This is a dark spot anyway, because too much internal VM machinery is involved, and God only knows how the 3 and 5 are counted in the real world.

I just wanted to mention it because, in my experience, if you don’t really expect an infinite loop there, it’s safer to increase the :max_restarts parameter and forget about this potential issue.

2 Likes

The app will terminate if the root pid exits, and it does so immediately on the first exit, with no restarts. And if the app is :permanent, that will make the whole VM stop.

6 Likes

I have no access to iex atm, but it could be easily validated with 10 LoCs.

Good call! It seems that exceeding :max_restarts restarts within a :max_seconds window does indeed lead to the restart of the supervisor:

defmodule TestSup.GoodWorker do
  use GenServer

  require Logger

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def init(opts) do
    Logger.info("#{inspect(__MODULE__)}::#{inspect(__ENV__.function)}")

    {:ok, opts}
  end
end

defmodule TestSup.BadWorker do
  use GenServer, restart: :transient

  require Logger

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def init(opts) do
    Logger.info("#{inspect(__MODULE__)}::#{inspect(__ENV__.function)}")

    Process.send_after(self(), :do_work, 500)
    {:ok, opts}
  end

  def handle_info(:do_work, state) do
    Logger.warning("#{inspect(__MODULE__)}::#{inspect(__ENV__.function)}: about to die")

    raise "error"

    {:noreply, state}
  end
end

defmodule TestSup.Supervisor do
  use Supervisor

  require Logger

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  def init(_init_arg) do
    Logger.info("#{inspect(__MODULE__)}::#{inspect(__ENV__.function)}")

    children = [
      TestSup.GoodWorker,
      TestSup.BadWorker
    ]

    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

defmodule TestSup.Application do
  use Application

  require Logger

  def start(_type, _args) do
    Logger.info("#{inspect(__MODULE__)}::#{inspect(__ENV__.function)}")

    children = [
      TestSup.Supervisor
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: TestSup.AppSupervisor]
    Supervisor.start_link(children, opts)
  end
end

Running this code leads to an endless stream of errors and log messages from GenServer and Supervisor restarts:

$ mix run --no-halt
Compiling 1 file (.ex)
Generated test_sup app

20:29:51.683 [info] TestSup.Application::{:start, 2}

20:29:51.685 [info] TestSup.Supervisor::{:init, 1}

20:29:51.685 [info] TestSup.GoodWorker::{:init, 1}

20:29:51.685 [info] TestSup.BadWorker::{:init, 1}

20:29:52.186 [warning] TestSup.BadWorker::{:handle_info, 2}: about to die

20:29:52.188 [error] GenServer TestSup.BadWorker terminating
** (RuntimeError) error
    (test_sup 0.1.0) lib/test_sup/application.ex:32: TestSup.BadWorker.handle_info/2
    (stdlib 6.2.2.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2.2.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2.2.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :do_work
State: []

20:29:52.192 [info] TestSup.BadWorker::{:init, 1}

20:29:52.693 [warning] TestSup.BadWorker::{:handle_info, 2}: about to die

20:29:52.693 [error] GenServer TestSup.BadWorker terminating
** (RuntimeError) error
    (test_sup 0.1.0) lib/test_sup/application.ex:32: TestSup.BadWorker.handle_info/2
    (stdlib 6.2.2.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2.2.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2.2.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :do_work
State: []

20:29:52.693 [info] TestSup.BadWorker::{:init, 1}

20:29:53.194 [warning] TestSup.BadWorker::{:handle_info, 2}: about to die

20:29:53.194 [error] GenServer TestSup.BadWorker terminating
** (RuntimeError) error
    (test_sup 0.1.0) lib/test_sup/application.ex:32: TestSup.BadWorker.handle_info/2
    (stdlib 6.2.2.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2.2.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2.2.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :do_work
State: []

20:29:53.194 [info] TestSup.BadWorker::{:init, 1}

20:29:53.695 [warning] TestSup.BadWorker::{:handle_info, 2}: about to die

20:29:53.695 [error] GenServer TestSup.BadWorker terminating
** (RuntimeError) error
    (test_sup 0.1.0) lib/test_sup/application.ex:32: TestSup.BadWorker.handle_info/2
    (stdlib 6.2.2.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2.2.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2.2.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :do_work
State: []

20:29:53.695 [info] TestSup.Supervisor::{:init, 1}

20:29:53.695 [info] TestSup.GoodWorker::{:init, 1}

20:29:53.695 [info] TestSup.BadWorker::{:init, 1}

20:29:54.196 [warning] TestSup.BadWorker::{:handle_info, 2}: about to die

20:29:54.196 [error] GenServer TestSup.BadWorker terminating
** (RuntimeError) error
    (test_sup 0.1.0) lib/test_sup/application.ex:32: TestSup.BadWorker.handle_info/2
    (stdlib 6.2.2.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2.2.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2.2.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :do_work
State: []

[...repeated...]

You can see that TestSup.Supervisor has been restarted.

If you add max_restarts: 1 to opts in TestSup.Application, you’ll see it will eventually lead to the shutdown of the application:

20:34:07.870 [notice] Application test_sup exited: shutdown

I just wanted to mention it because, in my experience, if you don’t really expect an infinite loop there, it’s safer to increase the :max_restarts parameter and forget about this potential issue.

Yeah, that’s solid advice which I will implement (combined with changing the :restart type of MyWorker to :permanent); I just wanted to get to the bottom of the issue, if possible.
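For the record, the worker-side change is just the restart type (a sketch):

defmodule MyWorker do
  # :permanent - restart the worker no matter how it terminated,
  # so even a "successful" exit (:normal / :shutdown) brings it back
  use GenServer, restart: :permanent

  # ...rest of the module unchanged...
end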

1 Like

Yes, the supervisor will exit if it has to restart a child too many times, and it will then be restarted by its own supervisor.

You can have a chain of Sup → Sup → Sup → Sup → Worker and all elements of the chain will behave like this.

A good way to think about this is that the code describing the supervision tree (each supervisor’s children, the children of each child supervisor, and so on) describes “how the app should look at runtime”. The system will try its best to maintain that state, or otherwise fail and shut down entirely.

Of course, that description can be altered at runtime by calling start_child/delete_child to add or remove parts of the tree, and by having :transient, :temporary or :significant children.

But there is no way it will skip some parts of the tree unless you tell it to. Max-restarts and the like only have a temporary impact on the state of the system; the app will always get back to the desired state.

So if my understanding is correct, if MyWorker was linked to another process, and that process exited with a :shutdown (or {:shutdown, value}), then it would indeed cause MyWorker to terminate normally as well. Unfortunately, MyWorker isn’t linked to any other process.

If the child spec for MyWorker has restart: :transient, then yes! (:shutdown is not exactly “normally”, but the supervisor will indeed not restart it).

Any progress on finding the cause?

  • Another possibility is that the child spec is overridden somewhere and :transient is replaced with :temporary (see the sketch after this list); but I guess this is unlikely.
  • Are there any calls to Supervisor.terminate_child or GenServer.stop in your codebase?
  • Are you using :significant with :any_significant or :all_significant?
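For the first bullet, a hypothetical override like this anywhere the child is declared would silently downgrade the restart behaviour:

children = [
  # Overrides the restart type from `use GenServer, restart: :transient`;
  # :temporary children are never restarted, whatever the exit reason.
  Supervisor.child_spec(MyWorker, restart: :temporary)
]

Supervisor.init(children, strategy: :one_for_one)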
3 Likes

Are there any logs available that could help identify why the process shut down?

I might be wrong, but I believe that if the process terminated multiple times in quick succession, the supervisor should have eventually shut down as well, which would have brought the entire VM down. However, that doesn’t seem to be the case here — the supervisor somehow stayed up but stopped restarting the process.

It’s also possible that someone accidentally connected to the VM and manually stopped the process.

1 Like

Unfortunately, no progress so far. Also, “no” to all of your points here.

For now we decided to change the restart type to :permanent and monitor this issue closely. I will follow up with more information if it arrives.

We didn’t find any suspicious entries in the logs (we know the approximate time the issue occurred from the application metrics). There were also no relevant errors reported in our Sentry instance in that time frame.

It’s possible, but very unlikely. There are maybe two people in our company other than me with the knowledge required to gracefully terminate a supervised process, and the issue happened in the middle of the night. We also asked our Kubernetes admins to check whether anyone accessed the pods in that time frame, just to be sure.