Should I be concerned about the Registry crashing?

chulkilee · July 3, 2019, 3:38am

I’m using DynamicSupervisor with Registry like this:

MyApp.Supervisor # one_for_one
  MyApp.Registry
  MyApp.WorkerDynamicSupervisor # DynamicSupervisor, one_for_one
    MyApp.WorkerServer
    ...

If I need to handle the Registry process crashing, I need to put the registry process and the dynamic supervisor tree under one supervisor and use all_for_one, since they should be restarted together.

MyApp.Supervisor # one_for_one
  MyApp.WorkerSupervisor # all_for_one
    MyApp.WorkerRegistry
    MyApp.WorkerDynamicSupervisor # DynamicSupervisor, one_for_one
      MyApp.WorkerServer
      ...
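
For illustration, the all_for_one layout above could be written roughly like this. It is only a sketch: the module names come from the tree above, while the `keys: :unique` option and the `start_link/1` signature are assumptions.

```elixir
defmodule MyApp.WorkerSupervisor do
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      # The Registry starts first, so it is available before any worker registers in it.
      # keys: :unique is an assumption (one process per key).
      {Registry, keys: :unique, name: MyApp.WorkerRegistry},
      {DynamicSupervisor, name: MyApp.WorkerDynamicSupervisor, strategy: :one_for_one}
    ]

    # :all_for_one restarts both children when either of them terminates.
    Supervisor.init(children, strategy: :all_for_one)
  end
end
```

With this in place, the top-level MyApp.Supervisor only needs `MyApp.WorkerSupervisor` in its child list.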

If I add another dynamic supervisor, it then requires maintaining one registry per supervision tree:

MyApp.Supervisor # one_for_one
  MyApp.WorkerSupervisor # all_for_one
    MyApp.WorkerRegistry
    MyApp.WorkerDynamicSupervisor # DynamicSupervisor, one_for_one
      MyApp.WorkerServer
      ...
  MyApp.AnotherSupervisor
    MyApp.AnotherRegistry
    MyApp.AnotherDynamicSupervisor

From the thread “Process communication in a dynamic supervision setup”, one Registry per application may be enough, but should I care about the registry process crashing?

Also, under what circumstances does a process crash even if its code does not have a “bug”?

Fl4m3Ph03n1x · July 3, 2019, 2:44pm

I understand from your post that you are using Elixir’s Registry and that you want to know whether you should worry about the process holding your Registry crashing.

IMO, if you want to build a reliable application that can endure the horrible outside world without people calling you at 3 AM, you should at least consider the possibility of that process crashing and account for it. Good Elixir applications are not about making your code impervious to crashes; they are about being able to resurface in a stable manner once said crashes occur.

This said, assuming your code has no bugs whatsoever, a lot of things can make a process crash. Maybe you reached the maximum number of atoms, your ETS tables ate your entire RAM, a process message queue got flooded and ate all the available RAM, the machine’s disk got full, or the machine simply errored out due to a power surge, resulting in corrupted data that causes the BEAM to behave erratically, etc.

Worst case scenario, a meteor falls on your company and pulverizes half the servers. There are many reasons for a process to crash; you should instead focus on how you can recover cleanly once that happens.

Hope it helps!

dimitarvp · July 3, 2019, 2:48pm

That really depends on what you use Registry for. If it’s a transient cache then you should have code in place that detects data disappearing from it and makes sure to re-fill it. If you want to use it like any persistent K/V store (a la Redis, BoltDB, LevelDB, Cassandra etc.) then I don’t think Registry is a good fit.
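
One way to read "detect data disappearing and re-fill it" for a Registry is a lookup-or-start helper, sketched below. `MyApp.Workers` and `whereis_or_start/1` are hypothetical names, and the minimal `MyApp.WorkerServer` is only there to keep the example self-contained. It assumes a unique-keys Registry named `MyApp.WorkerRegistry` and a DynamicSupervisor named `MyApp.WorkerDynamicSupervisor` are already running.

```elixir
defmodule MyApp.WorkerServer do
  use GenServer

  # Registers the process in the Registry under `id` via a :via tuple.
  def start_link(id) do
    GenServer.start_link(__MODULE__, id, name: {:via, Registry, {MyApp.WorkerRegistry, id}})
  end

  @impl true
  def init(id), do: {:ok, id}
end

defmodule MyApp.Workers do
  # Hypothetical helper: returns the registered pid, starting the worker if the
  # entry is missing (for example after the Registry and workers were restarted).
  def whereis_or_start(id) do
    case Registry.lookup(MyApp.WorkerRegistry, id) do
      [{pid, _value}] -> {:ok, pid}
      [] -> start(id)
    end
  end

  defp start(id) do
    case DynamicSupervisor.start_child(MyApp.WorkerDynamicSupervisor, {MyApp.WorkerServer, id}) do
      {:ok, pid} -> {:ok, pid}
      # Another caller won the race and registered the same id first.
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end
end
```

Callers then go through `MyApp.Workers.whereis_or_start/1` instead of assuming the entry is still there.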

Maybe I misunderstand your post, but it seems like you are worried about whether OTP works? Trust me, it does.

Fl4m3Ph03n1x · July 3, 2019, 2:52pm

When I use ETS tables and Registry, I usually have a process that is responsible for holding them. Should such a process crash, the Application Supervisor (which is the parent of the ets/registry holder process) simply creates a new process and refills the table / registry.
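
A minimal sketch of such a holder process for an ETS table, assuming the data can be rebuilt at startup. `MyApp.TableOwner`, the table name, and `load_entries/0` are hypothetical and stand in for whatever source the data really comes from.

```elixir
defmodule MyApp.TableOwner do
  use GenServer

  # Hypothetical table name.
  @table :my_app_cache

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Reads go straight to ETS, so they do not pass through the owner process.
  def lookup(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(_opts) do
    # The table dies with its owner; every (re)start creates and refills it.
    table = :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    :ets.insert(table, load_entries())
    {:ok, table}
  end

  # Placeholder for the real data source (database, config, another service).
  defp load_entries, do: [{:example_key, "example value"}]
end
```

Placed in the application supervisor's child list, a crash of this process leads to a fresh process whose `init/1` recreates and refills the table.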

There are many ways to make sure it is very hard to lose the data on your ETS / Registry, as shown in the article below, but from personal experience, I have never seen such processes crash and I have never seen my Supervisors having to re-populate said tables and registries. We get thousands of requests per second and I have never seen a log error from those. If anything, you can be sure OTP will do its job.

al2o3cr · July 3, 2019, 3:36pm

One possible situation: supervisors have built-in limits for how often processes they manage can restart (configured via max_restarts and max_seconds). If those limits are exceeded by a child process - for instance, if code in MyApp.WorkerServer has a bug that causes repeated crashes - then the supervisor itself will shut down, taking out all the supervised processes.
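
As a rough sketch, these limits can be set where the supervisor is initialised. The values below are the documented defaults (3 restarts within 5 seconds), written out explicitly; the module-based `MyApp.WorkerDynamicSupervisor` is an assumption about how the supervisor is defined.

```elixir
defmodule MyApp.WorkerDynamicSupervisor do
  use DynamicSupervisor

  def start_link(opts \\ []) do
    DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # More than 3 restarts within 5 seconds makes this supervisor shut down,
    # which its own parent then handles according to its strategy.
    DynamicSupervisor.init(strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end
```

Raising `max_restarts` or lowering `max_seconds` makes the supervisor more tolerant of bursts of crashes, at the cost of hiding a crash loop for longer.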

chulkilee · July 3, 2019, 4:56pm

Thanks for the comment, but I’d say that’s terminating or killing a process, not a process “crashing”.

But good point - that’s why we have to get the supervision tree right to isolate crashes.

chulkilee · July 3, 2019, 5:05pm

Thanks @Fl4m3Ph03n1x, @dimitarvp and @al2o3cr for your comments!

I trust OTP will work as expected - but the question is whether I need to take “more care” over a crash of the process running a “core” Registry or not.

My take is - to isolate the crash of anything, or to handle a crash more nicely, it seems better to put the Registry under the same supervision tree where the registered processes will run.

However, it can be tricky if an application wants to use a single registry for different supervision trees. Running a process is very cheap, so why don’t we run a registry process per supervision tree?

BTW could a moderator fix the typo in the title?

LostKobrakai · July 3, 2019, 5:58pm

I remember @michalmuskala mentioning on Slack that a single registry process is usually enough even for a bigger application, but if you like to have more structure I don’t think there is much that speaks against using multiple ones.