DynamicSupervisor restart on children crash

dynamicsupervisor
#1

Hi everyone !

While writing a dynamically supervised multi-scraper, I ran into an issue really similar to

Use case:

  • Child scrapers are dynamically created and supervised by a dynamic supervisor.
  • At some point during processing, the scraped API throws a bandwidth limit error.
  • Several children crash (this is now fixed, but that is not the purpose of this post).

Expected behaviour:

  • the supervisor restarts the crashed children

Seen behaviour:

  • the supervisor itself is restarted, losing track of all started children

I've been able to reproduce the problem in a short code snippet here:

Of course, maybe I'm missing something about supervisors, but it seems that, to use the words of the original post, it can happen in real life.

If you have an idea … : )

Thanks for your time and fantastic work on elixir ! Long life to the king ! : )

#2

You could just start another supervisor under your dynamic one and put your working task under that supervisor (and not the dynamic one). Basically add one more layer.
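A minimal sketch of that extra layer, in case it helps. All module names here (Layered.Worker, Layered.ScraperSup, Layered) are made up for illustration and are not from the original project. Each scraper gets its own small static supervisor, so a crashing worker only counts against that inner supervisor's restart limit and the dynamic supervisor does not see the crash at all.

```elixir
defmodule Layered.Worker do
  # Hypothetical scraper worker; it crashes on demand to simulate the API error.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_cast(:crash, _state), do: raise("bandwidth limit reached")
end

defmodule Layered.ScraperSup do
  # Hypothetical per-scraper static supervisor: the extra layer.
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(arg) do
    # One worker per inner supervisor, each with its own restart counter.
    Supervisor.init([{Layered.Worker, arg}], strategy: :one_for_one)
  end
end

defmodule Layered do
  # Starts 5 scrapers, crashes every worker once, and reports whether the
  # dynamic supervisor is still alive and how many children it still tracks.
  def demo do
    {:ok, dyn} = DynamicSupervisor.start_link(strategy: :one_for_one)

    sups =
      for i <- 1..5 do
        {:ok, sup} = DynamicSupervisor.start_child(dyn, {Layered.ScraperSup, i})
        sup
      end

    for sup <- sups do
      [{_id, worker, :worker, _modules}] = Supervisor.which_children(sup)
      GenServer.cast(worker, :crash)
    end

    # Give the inner supervisors a moment to restart their workers.
    Process.sleep(200)

    result = {Process.alive?(dyn), DynamicSupervisor.count_children(dyn).active}
    DynamicSupervisor.stop(dyn)
    result
  end
end
```

To stop a single scraper later, DynamicSupervisor.terminate_child/2 can be called with the pid of its inner supervisor, which shuts down the worker underneath it as well.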

#3

@melmoth assuming the dynamic supervisor's child is a GenServer, it has the default restart value of :permanent. If the supervisor has to restart children more than max_restarts times within max_seconds, the supervisor itself terminates and is restarted by its parent, which could be what is happening here.

As you already mentioned, the errors are being sorted out and reduced, so the above should not happen frequently, but you can try using use GenServer, restart: :transient

Or use restart: :temporary if that suits, though I doubt it :slight_smile:

See more details on the restart option here

The assumption is that the dynamic supervisor's child is a GenServer, but the same applies to any child that fails frequently when restarted
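For reference, a small sketch of where the restart option goes. The module name RestartOpt.Scraper and the URL are placeholders, not taken from the project above. Keep in mind that a :transient child is still restarted after an abnormal exit, and that restart still counts toward the supervisor's limit; only a :temporary child is never restarted.

```elixir
defmodule RestartOpt.Scraper do
  # Hypothetical scraper: restarted only if it exits abnormally.
  use GenServer, restart: :transient

  def start_link(url), do: GenServer.start_link(__MODULE__, url)

  @impl true
  def init(url), do: {:ok, url}
end

# The option ends up in the child spec that the supervisor receives.
transient_spec = RestartOpt.Scraper.child_spec("https://example.com")
IO.inspect(transient_spec.restart, label: "module default")

# The value can also be overridden per child at start time,
# without touching the module itself.
temporary_spec =
  Supervisor.child_spec({RestartOpt.Scraper, "https://example.com"}, restart: :temporary)

IO.inspect(temporary_spec.restart, label: "per-child override")

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _pid} = DynamicSupervisor.start_child(sup, temporary_spec)
```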

Edit: just saw the project with minimal reproduction and it does validate some of my hypothesis :wink:

#4

Hi guys,

thanks for passing by :slight_smile:

Actually, what I hadn’t understood was that the max_restarts counter and its max_seconds time window are set for the whole dynamic supervisor and not individually per child. As several children could fail at the same time, the limit would clearly be reached. So I was definitely missing something, stupid me ! :wink:

The solution I came to, as child creation depends on another process, ChildMgr, is to monitor the dynamic supervisor from ChildMgr so that each child is re-created when the dynamic supervisor restarts. See the fix branch

I like your solution @dimitarvp, it looks more straightforward than mine. The only question I have is related to the child lifecycle: at some point I will have to terminate the child, so I just need to make sure the static supervisor can be terminated properly. I’ll definitely take it into consideration.

@pikender following that, your restart option is also a must-have. :permanent is not what I want in this context.
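To make the shared counter concrete, here is a rough sketch (module names RestartLimit and RestartLimit.Worker are invented for the example). It starts a dynamic supervisor, crashes a handful of children at once, and reports whether the supervisor survived. With the default limit (max_restarts: 3 within max_seconds: 5) five simultaneous crashes are enough to take the supervisor down; with a higher limit it stays up.

```elixir
defmodule RestartLimit.Worker do
  # Hypothetical worker with the default restart: :permanent.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_cast(:crash, _state), do: raise("bandwidth limit reached")
end

defmodule RestartLimit do
  # Returns true if the dynamic supervisor is still alive after `crashes`
  # children have crashed at roughly the same time, false otherwise.
  def supervisor_survives?(sup_opts, crashes) do
    {:ok, sup} = DynamicSupervisor.start_link([strategy: :one_for_one] ++ sup_opts)
    # Unlink so this process is not taken down together with the supervisor.
    Process.unlink(sup)
    ref = Process.monitor(sup)

    pids =
      for i <- 1..crashes do
        {:ok, pid} = DynamicSupervisor.start_child(sup, {RestartLimit.Worker, i})
        pid
      end

    Enum.each(pids, &GenServer.cast(&1, :crash))

    receive do
      {:DOWN, ^ref, :process, ^sup, _reason} -> false
    after
      1_000 ->
        Process.demonitor(ref, [:flush])
        DynamicSupervisor.stop(sup)
        true
    end
  end
end

# Default limit: the counter is shared by all children of the supervisor.
IO.inspect(RestartLimit.supervisor_survives?([], 5), label: "default limit")

# A larger limit absorbs the same burst of crashes.
IO.inspect(
  RestartLimit.supervisor_survives?([max_restarts: 10, max_seconds: 5], 5),
  label: "max_restarts: 10"
)
```

Raising the limit only hides the symptom if children keep crashing, so it is probably best combined with a suitable restart value for the children.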
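Roughly, the idea looks like the sketch below. This is not the code from the fix branch, only an illustration under some assumptions: the dynamic supervisor is registered under a name (Remonitor.DynSup here), it sits under a parent supervisor that restarts it, and the manager keeps the list of child arguments in its state. The module names and the 50 ms retry delay are arbitrary.

```elixir
defmodule Remonitor.Worker do
  # Hypothetical child process.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule Remonitor.ChildMgr do
  # Hypothetical manager: remembers which children should exist and
  # re-creates them whenever the dynamic supervisor comes back up.
  use GenServer

  @sup Remonitor.DynSup

  def start_link(args), do: GenServer.start_link(__MODULE__, args, name: __MODULE__)

  @impl true
  def init(args) do
    send(self(), :attach)
    {:ok, %{args: args}}
  end

  @impl true
  def handle_info(:attach, state) do
    case Process.whereis(@sup) do
      nil ->
        # Supervisor not registered (yet): try again shortly.
        Process.send_after(self(), :attach, 50)

      pid ->
        Process.monitor(pid)
        Enum.each(state.args, &DynamicSupervisor.start_child(@sup, {Remonitor.Worker, &1}))
    end

    {:noreply, state}
  end

  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    # The dynamic supervisor died, taking its children with it.
    send(self(), :attach)
    {:noreply, state}
  end
end

# Both processes live under one parent, which restarts the dynamic supervisor.
children = [
  {DynamicSupervisor, name: Remonitor.DynSup, strategy: :one_for_one},
  {Remonitor.ChildMgr, [1, 2, 3]}
]

{:ok, _top} = Supervisor.start_link(children, strategy: :one_for_one)
```

There is a small window in which the supervisor can die between the whereis lookup and start_child; in that case the manager itself crashes and is restarted by the parent, which is acceptable for a sketch but worth handling explicitly in real code.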

Again thanks for your time !!

Fred