semmitmondo

Supervision problem?

Hi,

I just found a very strange and mysterious error here. I initially thought I was doing something wrong, but I managed to strip it down to a bare, out-of-the-box Phoenix app. Here’s how to reproduce it:

mix phx.new test --no-ecto
cd test/
iex -S mix

Then start Observer and kill the TestWeb.Endpoint process. You will see the whole application exit. The reason is that the top-level supervisor cannot restart the Endpoint. Well, that’s not quite true: if you trace the top-level supervisor, it seems that the restart succeeded, yet the supervisor still thinks it failed:

 iex> :sys.trace Process.whereis(Test.Supervisor), true

Here’s the full output.

*DBG* 'Elixir.Test.Supervisor' got {'EXIT',<0.235.0>,killed}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok, 
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got {'EXIT',<0.336.0>,
                                    {shutdown,
                                     {failed_to_start_child,
                                      'Elixir.Phoenix.PubSub.PG2',
                                      {already_started,<0.237.0>}}}}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok, 
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got cast {try_again_restart,
                                         'Elixir.TestWeb.Endpoint'}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567,-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok,
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got {'EXIT',<0.337.0>,
                                    {shutdown,
                                     {failed_to_start_child,
                                      'Elixir.Phoenix.PubSub.PG2',
                                      {already_started,<0.237.0>}}}}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567,-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok,
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got cast {try_again_restart,
                                         'Elixir.TestWeb.Endpoint'}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567,-576460567,-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok,
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got {'EXIT',<0.338.0>,
                                    {shutdown,
                                     {failed_to_start_child,
                                      'Elixir.Phoenix.PubSub.PG2',
                                      {already_started,<0.237.0>}}}}
*DBG* 'Elixir.Test.Supervisor' new state {state,
                                          {local,'Elixir.Test.Supervisor'},
                                          one_for_one,
                                          {['Elixir.TestWeb.Endpoint'],
                                           #{'Elixir.TestWeb.Endpoint' =>
                                              {child,
                                               {restarting,<0.235.0>},
                                               'Elixir.TestWeb.Endpoint',
                                               {'Elixir.TestWeb.Endpoint',
                                                start_link,[]},
                                               permanent,infinity,supervisor,
                                               ['Elixir.TestWeb.Endpoint']}}},
                                          undefined,3,5, 
                                          [-576460567,-576460567,-576460567],
                                          0,'Elixir.Supervisor.Default',
                                          {ok,
                                           {#{intensity => 3,period => 5,
                                              strategy => one_for_one},
                                            [{'Elixir.TestWeb.Endpoint',
                                              {'Elixir.TestWeb.Endpoint',
                                               start_link,[]},
                                              permanent,infinity,supervisor,
                                              ['Elixir.TestWeb.Endpoint']}]}}}
*DBG* 'Elixir.Test.Supervisor' got cast {try_again_restart,
                                         'Elixir.TestWeb.Endpoint'}
[info] Application test exited: shutdown

Seems like Phoenix.PubSub.PG2 is causing the problem. It cannot be started because it is already running. What’s wrong here? I believe the app should have survived a process crash like the kill I do here. Or am I wrong?

Phoenix.PubSub.PG2 is supervised by the Endpoint, and for some reason it just does not appear in observer under the Endpoint in the supervision tree.

The Phoenix version I use is 1.4.0.
Tried with OTP 21.2.1 and 21.2.2, Elixir 1.7.4 and also with 1.8.0-rc.1. The same happened.

Most Liked Responses

josevalim

Creator of Elixir

This is more of an issue of using Process.exit(pid, :kill) on a supervisor. :kill is a last resort message, which causes the supervisor to immediately terminate, without notifying its children to terminate properly. So the supervisor exits, the children now have to notice their parent is dead, clean up, and terminate.

Meanwhile, the application is restarting part of the tree at the same time, which ends up conflicting with the old one still shutting down, causing another failure. This eventually triggers max_restarts and the application shuts down.
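
The `already_started` error in the trace is what any named process start returns while the old owner of that name is still alive. Here is a minimal sketch of that conflict, using an Agent and the made-up name :demo_pubsub as stand-ins for Phoenix.PubSub.PG2 (both are assumptions for illustration, not what Phoenix does internally):

```elixir
# :demo_pubsub is a hypothetical name; the Agent stands in for the real PubSub process.
{:ok, old} = Agent.start_link(fn -> :old end, name: :demo_pubsub)

# Starting a second process under the same name fails while `old` is alive.
second_attempt = Agent.start_link(fn -> :new end, name: :demo_pubsub)
IO.inspect(second_attempt, label: "while old process is alive")

# Once the old process has fully terminated, the name is free again.
:ok = Agent.stop(old)
{:ok, new} = Agent.start_link(fn -> :new end, name: :demo_pubsub)
IO.inspect(new != old, label: "restarted under the same name")
```

The first attempt returns an `{:error, {:already_started, pid}}` tuple carrying the pid of the old process, which is the same shape as the reason nested inside `failed_to_start_child` in the trace.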

Overall:

  1. Only use :kill as a last resort, when the process did not respond to any other exit signal. This especially applies to supervisors, as their only job is to ensure processes start and terminate accordingly, and sending a :kill voids that.

  2. If you want to simulate stopping a supervisor, Supervisor.stop will at least go through the usual flow

  3. Run iex --logger-sasl-reports true -S mix phx.server to get precise logging from supervisors
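
As a rough illustration of the second point, the sketch below stops a supervisor with Supervisor.stop and checks that its child was shut down in order. Demo.Worker and its `notify` argument are hypothetical names made up for this example; the worker traps exits only so that its terminate/2 callback runs and can report back.

```elixir
defmodule Demo.Worker do
  use GenServer

  # `notify` is the pid that should hear about the shutdown (hypothetical, demo only).
  def start_link(notify), do: GenServer.start_link(__MODULE__, notify, name: __MODULE__)

  @impl true
  def init(notify) do
    # Trap exits so terminate/2 runs when the supervisor asks us to shut down.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    send(notify, {:worker_terminated, reason})
    :ok
  end
end

{:ok, sup} = Supervisor.start_link([{Demo.Worker, self()}], strategy: :one_for_one)
worker = Process.whereis(Demo.Worker)

# Stops the supervisor with reason :normal; its children are shut down first.
:ok = Supervisor.stop(sup)

terminate_reason =
  receive do
    {:worker_terminated, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(terminate_reason, label: "worker terminate/2 reason")
IO.inspect(Process.alive?(worker), label: "worker alive after stop")
```

By the time Supervisor.stop returns, the child has already been asked to shut down and has exited, so nothing from the old tree is left behind to collide with a restart.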


One question that may arise from this is: could my supervisor fail in a way that triggers the same behaviour as the :kill signal? Supervisors trap exits, which means no other process (linked or not) could cause them to crash. Therefore, this can only happen if there is a bug in the supervisor. And if there is a bug in the supervisor, then indeed it can’t guarantee its fault-tolerant properties anyway. That’s why supervisors rarely change: they have been strongly tested for decades and are an essential piece of your application.
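
To make the "trap exits" point concrete, here is a small sketch with a plain spawned process standing in for a supervisor (an assumption for illustration; a real supervisor does much more). An ordinary exit signal arrives as a message and the process keeps running, while :kill cannot be trapped.

```elixir
parent = self()

trapper =
  spawn(fn ->
    # Like a supervisor, turn incoming exit signals into messages.
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end

    # Stay alive so the next signal can be observed.
    Process.sleep(:infinity)
  end)

# Wait until the flag is set before sending any signal.
receive do
  :ready -> :ok
after
  1_000 -> :timeout
end

# An ordinary exit signal is converted to a message; the process survives.
Process.exit(trapper, :shutdown)

trapped_reason =
  receive do
    {:trapped, reason} -> reason
  after
    1_000 -> :timeout
  end

alive_after_shutdown = Process.alive?(trapper)

# :kill bypasses trap_exit; the process dies with reason :killed.
ref = Process.monitor(trapper)
Process.exit(trapper, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, _pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect({trapped_reason, alive_after_shutdown, down_reason})
```

The process gets no chance to run any cleanup code in the :kill case, which is why a killed supervisor cannot tell its children to terminate.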
