Supervisors in a container/pod world

The product I work on leans heavily on Kubernetes. We typically expect OS processes to crash, causing pods to be restarted, which hooks into all sorts of monitoring and alerting.

Our Elixir app starts a lot of supervised processes. If one of those processes crashes repeatedly, the supervisor just gives up on trying to restart it, but the OS process stays up. Thus Kubernetes won’t restart the pod.

I know there is a way to tell a supervisor to try to restart the process indefinitely, but is there a way to say “if this process can’t be restarted, exit the entire OS process?”

Or is there a better way to think about this problem? OTP is great for a lot of things, but I feel like modern container orchestration frameworks kinda supersede some features.

Thanks for the help!

In your Application.start/2 callback you usually return {:ok, pid}. If that pid stops running (and your application is declared permanent, which is the default), then the application will shut down and issue a shutdown for the whole VM.

So if you're not seeing failed recovery of supervisors bubble up to that root pid (and stop the VM), you probably haven't set up your supervision tree correctly. You'd want all processes to be children of the root supervisor, or of nested supervisors beneath it, if errors are supposed to bubble up to the application level.
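
In other words, something along these lines (a minimal sketch; MyApp.Worker is a hypothetical stand-in for your real children):

defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Everything should hang off this tree, directly or via nested supervisors.
      MyApp.Worker
    ]

    # The pid returned here is the one the application is tied to: if this
    # supervisor exits, the (permanent) application shuts down the VM.
    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end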

Hmm, good hint. Now that you’ve said that, I kinda feel the culprit is that our pods basically run something like mix run --no-halt… ?

That would be it, it seems.

Maybe superfluous, but another way to handle this on Kubernetes could be liveness probes. See something like this: GitHub - joeapearson/elixir-k8s-probe: Provides configurable HTTP liveness and readiness endpoints for Elixir applications, intended to be used to support Kubernetes probes.

You define in code what needs to be running; otherwise you return an error status and Kubernetes will restart your pod(s). The upside of this method is that you have more control (from the Elixir side) over when you let it restart.
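
For illustration, a rough sketch of what such a probe endpoint could look like (assuming plug_cowboy as a dependency; MyApp.CriticalProcess is a hypothetical registered name to check):

defmodule MyApp.HealthPlug do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # Report healthy only while the critical process is alive.
    if Process.whereis(MyApp.CriticalProcess) do
      send_resp(conn, 200, "ok")
    else
      send_resp(conn, 503, "unhealthy")
    end
  end
end

# Started under the supervision tree, e.g.:
# {Plug.Cowboy, scheme: :http, plug: MyApp.HealthPlug, options: [port: 8080]}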

I see the k8s orchestration as an extension (and mostly last resort) to what OTP gives you if everything else within your app fails to recover.


Oofa, so is there any way to do this?

The liveness probes don't work in our case because there is no "web" aspect to it. The process in question is just a Postgres CDC processor (kinda like a stream processor).

Our specific problem is that we process large JSON payloads (up to 800 MB). Processing that much data takes a long time: several minutes pass before each crash, so the crashes never trip the supervisor's restart-intensity window and it happily restarts the process ad infinitum.

Currently, I made a Prometheus metric for "lag", i.e. how far behind we are in processing the stream, and then we have Grafana alerts on that.

But like I said before, we’re deep in Kubernetes and already have monitoring and alerts for pod restarts, and it would be nice to just leverage that.

So really I wanna be able to mark some process as critical and if it crashes, just exit the whole OS process. Is this possible??

You could tinker with :max_restarts/:max_seconds on supervisors, but depending on the tree depth that might not be the most efficient way to go about this. You could also put a process elsewhere in the system which monitors your "critical" process and, if it receives an exit message, does System.stop(…).
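
For the second idea, a minimal sketch of such a watchdog (MyApp.CriticalProcess is a hypothetical registered name; the exit status of 1 is my assumption for what your monitoring should treat as a failure):

defmodule MyApp.Watchdog do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # Monitor by registered name; delivers a :DOWN message if the process
    # dies (or immediately with :noproc if it isn't running at all).
    ref = Process.monitor(MyApp.CriticalProcess)
    {:ok, ref}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, ref) do
    # Exit the whole OS process with a non-zero status so k8s restarts the pod.
    System.stop(1)
    {:noreply, ref}
  end
end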


If your task is some kind of batch processing job, and you want to leverage k8s to control it, why do you need an OTP supervisor at all?

Consider the following:

  1. Run your task as a dumb top-level process without a supervisor. The process either succeeds or fails.

  2. Orchestrate the task execution using a k8s job. This allows you to control the maximum number of retries and monitor the task execution at the k8s level.

We want things like Postgres connection pools (Ecto repos), Xandra connection pools, Redis connection pools, etc. to be started under a supervisor. That's the kinda stuff we want restarted if there is a transient blip in service to any of those things.

So I’m not sure how to say “start up most of our app under the normal supervision tree, but this one process is special and needs to bring down the whole OS process if it fails.”

  1. Run your task as a dumb top-level process without a supervisor. The process either succeeds or fails.

Yes, we want to do what you presented as option 1! But how do we start up the rest of the app in that case? I don't know how to do that; all I know is how to start processes as part of my application's supervision tree, i.e. MyNeatProduct.Application.start/2.

Thanks again.

My 2 points above weren't meant to be 2 exclusive options, but 2 things to do together; apologies for the confusion.

I think I understand your problem better now.

What about starting your application like this:

defmodule MyApp.Application do
  use Application

  def start(_type, _arg) do
    # Services that should be restarted automatically on transient failures.
    services = [
      MyApp.Repo,
      MyApp.Redix
      # ... other services
    ]

    children = [
      %{
        id: Supervisor,
        start:
          {Supervisor, :start_link,
           [services, [strategy: :one_for_one, name: Services.Supervisor]]},
        type: :supervisor
      },
      %{
        id: BatchProcessor,
        start: {BatchProcessor, :start, []},
        type: :worker
      }
    ]

    # max_restarts: 0 means any child exit brings the whole tree down.
    Supervisor.start_link(children,
      strategy: :one_for_one,
      name: MyApp.Supervisor,
      max_restarts: 0
    )
  end
end

You have a top-level supervisor (MyApp.Supervisor) with 2 children: a supervisor (Services.Supervisor) responsible for starting and managing all the services you want to have automatically restarted upon failure, and a BatchProcessor worker which implements your batch processing job.

Since MyApp.Supervisor is started with max_restarts: 0, as soon as your BatchProcessor exits for any reason, the whole application will exit with reason :shutdown.


Looks like it's maybe time to switch to a NIF-based JSON parser and/or use a streaming one?

IMO when you get to the point of a worker timing out because it has to process huge input, a streaming parser is your only viable option. But maybe jiffy or jsonrs would do the job, because they should parse several times faster.

I am not sure OTP can help you a lot here; or, if you really want that and don't want to switch JSON parsing libraries, then maybe extract the JSON parsing code into a separate app.
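
For the streaming route, a rough sketch of the idea using Jaxon (the API here is taken from its README, so treat it as an assumption; "items" and process_item/1 are hypothetical, query for whatever repeats inside your payload):

# Read the file in 64 KB chunks and pull matching values out of the
# stream instead of building the whole decoded term in memory at once.
"huge_payload.json"
|> File.stream!([], 65_536)
|> Jaxon.Stream.from_enumerable()
|> Jaxon.Stream.query([:root, "items", :all])
|> Stream.each(&process_item/1)
|> Stream.run()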


While there is an application supervision tree, it likely works differently than you expect. The pid returned from the Application.start/2 callback is never restarted. When it crashes, the application is considered failed and stopped. Restarts essentially only happen to children of supervisors you control; there's no implicit supervisor restarting things at the application level.

So you could probably do a setup like this:

        ┌───────────┐
        │  RootSup  │ Allow no restarts
        └───────────┘
              │
      ┌───────┴──────┐
      │              │
      ▼              ▼
┌ ─ ─ ─ ─ ─ ┐  ┌───────────┐
   ImpProc     │  ApplSup  │ Default restart limits
└ ─ ─ ─ ─ ─ ┘  └───────────┘
                     │
               ┌─────┴────────┐
               │              │
               ▼              ▼
         ┌ ─ ─ ─ ─ ─ ┐  ┌ ─ ─ ─ ─ ─ ┐
             Rest           Rest
         └ ─ ─ ─ ─ ─ ┘  └ ─ ─ ─ ─ ─ ┘

There's also a :significant child setting in OTP, which I'm not sure you can use from Elixir yet:
https://erlang.org/doc/man/supervisor.html#significant_child
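
If it is available to you, the shape would be roughly this (a sketch assuming OTP 24+ and an Elixir version whose Supervisor passes the :significant and :auto_shutdown options through; BatchProcessor is hypothetical):

children = [
  MyApp.Repo,
  %{
    id: BatchProcessor,
    start: {BatchProcessor, :start_link, []},
    # :significant requires a :transient or :temporary restart strategy
    restart: :transient,
    significant: true
  }
]

# When the significant child terminates without being restarted, the
# supervisor shuts itself down, taking the rest of the tree with it.
Supervisor.start_link(children,
  strategy: :one_for_one,
  auto_shutdown: :any_significant,
  name: MyApp.Supervisor
)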


This is exactly what I was suggesting above.


Here is a newer Rust-based NIF for JSON parsing in Elixir: GitHub - benhaney/Jsonrs: Rust powered JSON library for Elixir

I haven't tested it yet, but it claims to be a drop-in replacement for Jason or Poison while being much faster and using less memory. It looks like a simple bridge to serde_json.
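
If the drop-in claim holds, usage should mirror Jason, e.g. (untested, going purely by the README's claim):

# Assuming the Jason-style API the README advertises:
{:ok, map} = Jsonrs.decode(~s({"status": "ok"}))
json = Jsonrs.encode!(%{hello: "world"})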

Edit: Apologies @dimitarvp I just noticed you mentioned jsonrs but I didn’t see the link.


It’s fine, at least you added the link. I didn’t. :smiley:

Thank you both for this info. I think this solves my specific problem and I also learned more about OTP and supervisors… :slight_smile:

We already use Jsonrs :slight_smile: In fact, that combined with increasing the pod's CPU resources got us past that 850 MB transaction. But who knows what the ceiling is. Would a 1 GB transaction crash it? I would guess yes, looking at our pod CPU/memory graphs (they're close to maxing out already)… :grimacing:

The streaming JSON lib idea is interesting. It would def help with memory usage, but wouldn’t really help in responding to heartbeats without some major changes to how the CDC library works, which currently just gives you a callback to process messages (the JSON payloads representing a transaction).