Supervisor Behavior in Elixir Umbrella Apps: Managing Failures Across Applications

In Elixir umbrella apps, when an error occurs in one application (e.g., appA) and its supervisor determines that the error cannot be recovered, the supervisor might terminate the entire umbrella application, affecting not only appA but also appB. I want to know how supervisors work within umbrella apps, their fault-tolerance strategies, and how to handle such scenarios effectively.

I checked the supervision tree in the wxWidgets GUI, and there are two separate supervisors. So why does it behave as if both apps are being killed in my case?

Any link or explanation would be appreciated.

By default, if an OTP application crashes, the BEAM will exit. A crash here means that your application's supervision tree has crashed or failed to start.

By default there is no restarting at the application level, so restarts typically need to be handled outside of the VM. However, you can look at Shoehorn from the Nerves project, which provides application crash recovery within the BEAM using a different boot setup and an application crash handler module. It also lets you specify the startup order, so you can prioritise certain apps and make them available sooner.


As mentioned, there is no supervision for or of applications. If the root process of an application terminates, the application terminates. No attempt is made to restart the application or the root process (the root process is the pid returned from the start/2 callback of the application module).
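To make "root process" concrete, here is a minimal sketch of an application callback module. The names `AppA.Application` and `AppA.Supervisor` are hypothetical, and the child list is left empty on purpose. The supervisor started in `start/2` is the root process: if its children crash more often than its restart intensity allows (by default 3 restarts within 5 seconds), the supervisor itself exits, and the application terminates with it.

```elixir
defmodule AppA.Application do
  # Hypothetical callback module for one app inside the umbrella.
  use Application

  @impl true
  def start(_type, _args) do
    # The app's own workers and supervisors would be listed here.
    children = []

    # The pid returned from this function is the application's root process.
    # Nothing supervises it: if it exits, the application terminates.
    Supervisor.start_link(children, strategy: :one_for_one, name: AppA.Supervisor)
  end
end
```

This would explain seeing two separate supervisors in the tree and still losing both apps: each supervisor only protects its own children, and what happens after one of them gives up is decided by the application's restart type.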

What happens as a result of the application terminating depends on its restart type. With :permanent, the whole VM shuts down. With :transient, the same happens only if the exit reason of the root process wasn't :normal; otherwise it behaves like :temporary. With :temporary, the application termination is reported, but nothing else happens. See the Application module documentation (Elixir v1.15.5).
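The rules above can be restated as a small decision function. This is only a reading aid: `RestartTypeSketch` and `on_termination/2` are hypothetical names, not part of Elixir or OTP, and the real logic lives in the Erlang application controller.

```elixir
defmodule RestartTypeSketch do
  # Hypothetical model of what happens when an application's root
  # process exits, given the application's restart type.

  # :permanent - any termination takes the whole node down.
  def on_termination(:permanent, _reason), do: :node_shuts_down

  # :transient - a :normal exit is only reported; anything else
  # takes the node down.
  def on_termination(:transient, :normal), do: :reported_only
  def on_termination(:transient, _reason), do: :node_shuts_down

  # :temporary - the termination is reported and nothing else happens.
  def on_termination(:temporary, _reason), do: :reported_only
end
```

For example, `RestartTypeSketch.on_termination(:permanent, :shutdown)` returns `:node_shuts_down`, which matches the behaviour described in the question.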

The default for your dependencies and all your apps in an umbrella is :permanent, but you can customize the type in the release settings.
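As a rough sketch of that customization, the umbrella root's mix.exs could mark one app as :temporary so that its termination is only reported. The project, release, and app names here (`MyUmbrella`, `my_release`, `:app_a`, `:app_b`) are hypothetical; only the `releases` and `applications` keys come from Mix itself.

```elixir
# mix.exs in the umbrella root (all names below are placeholders)
defmodule MyUmbrella.MixProject do
  use Mix.Project

  def project do
    [
      apps_path: "apps",
      deps: [],
      releases: [
        my_release: [
          version: "0.0.1",
          # app_a is essential: if it terminates, the node goes down.
          # app_b is optional: its termination is only reported.
          applications: [app_a: :permanent, app_b: :temporary]
        ]
      ]
    ]
  end
end
```

Note that a :temporary application is not restarted automatically; it simply stays down until something starts it again.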

Shoehorn builds on top of those things by defaulting all applications to :temporary unless explicitly mentioned as an :init application, making them permanent and ordering them to start as early as possible.

It also integrates with the application termination reporting to allow you to restart applications. I’d however strongly suggest to exhaust other solutions before going with that one.


Thanks to your help, I found this passage in the Elixir documentation:

As a starting point, let’s define a release that includes both :kv_server and :kv applications. We will also add a version to it. Open up the mix.exs in the umbrella root and add inside def project:

releases: [
  foo: [
    version: "0.0.1",
    applications: [kv_server: :permanent, kv: :permanent]
  ]
]

That defines a release named foo with both kv_server and kv applications. Their mode is set to :permanent, which means that, if those applications crash, the whole node terminates. That’s reasonable since those applications are essential to our system.

I think it is important to note that an umbrella is a project layout solution and unrelated to how the program actually runs.
