Readiness/liveness issues

I have a catch-22 issue that I’m struggling somewhat with and could use some input/experiences from others.

I’m developing a stock-level tracking service that uses Mnesia and gets deployed to a Kubernetes cluster. My issue is that a node starting up will not get connected to the already clustered nodes if I use the Kubernetes readiness probe. However, as part of starting up I want the node to get a copy of the Mnesia tables (RAM copies only), and I don’t mark it ready until it has connected to the clustered nodes and received its copy.

Does anyone have experience using either libcluster or peerage (which is what I’m using currently) and connecting nodes before Kubernetes sees the new node as ready?

Hey @Fake51, here’s what I do.

Here is my health controller:

defmodule Sensetra.Web.HealthController do
  use Phoenix.Controller, log: false

  # Liveness check: succeeds as long as the endpoint is serving requests.
  def alive(conn, _params) do
    json(conn, %{alive: true})
  end

  # Readiness check: succeeds only once the :ready flag has been flipped
  # at the end of application startup (see below).
  def ready(conn, _params) do
    if Application.get_env(:sensetra, :ready) do
      json(conn, %{ready: true})
    else
      send_resp(conn, 503, "")
    end
  end
end

Notice how the alive endpoint always returns true, but the ready endpoint is conditional on an application environment setting. In my config, that value defaults to false. Then I just have a tiny GenServer as the very last entry in my supervision tree that sets the value to true. This ensures that the application doesn’t indicate that it is ready until the entire supervision tree has booted.
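
Roughly, a minimal sketch of such a GenServer (the module name and app name here are assumptions chosen to match the controller above, not the actual code):

# Hypothetical "readiness flag" worker, added as the last child in the
# supervision tree so it only starts once everything before it has booted.
defmodule Sensetra.ReadinessFlag do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    # By the time this runs, every earlier child has started successfully,
    # so flip the flag that the ready/2 action reads.
    Application.put_env(:sensetra, :ready, true)
    {:ok, :ready}
  end
end

In the application’s start/2 it would simply be the final entry in the children list, e.g. children = [..., Sensetra.ReadinessFlag].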


Thanks for answering 🙂 How are you using this in terms of deployment? Are you deploying to Kubernetes and running an HTTP check against the alive/ready endpoints as readiness or liveness probes?

My aim is to allow a load balancer to check the health of the node before sending traffic to it, but it looks like I can’t get the node into the cluster without also allowing traffic to it - i.e. Kubernetes won’t let it join the cluster before it reports ready, but it won’t be ready until it joins.

Ah yeah, so that depends a bit on how you have your ingress and services configured. I have a liveness and a readiness check both configured on the deployment, pointing at the /live and /ready routes specifically. I’m on AWS using AWS ALB ingresses and had to use the node port target type instead of IP for the reasons you outlined. IIRC there was work being done to ensure a smoother rollover with the IP mode, but I haven’t checked on that in a while.
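
As a rough sketch of that probe setup (the paths match the routes above; the container port and timing values are placeholders, not the real config):

# Hypothetical probe section of the Deployment's pod spec; port and
# timings are placeholder values.
livenessProbe:
  httpGet:
    path: /live
    port: 4000
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 4000
  initialDelaySeconds: 5
  periodSeconds: 5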

I found a solution to my issue, it seems. It’s documented, but poorly.

My setup uses a headless service in Kubernetes - that handles the DNS, so the pods get each other’s IPs via DNS lookups. That’s bog standard and easy. However, the headless service doesn’t publish the IPs of pods that don’t show as ready unless you specifically add publishNotReadyAddresses to the service config. Set that to true and it works.
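
For anyone hitting the same thing, the Service ends up looking roughly like this (the names, selector, and port are placeholders; publishNotReadyAddresses is the field that matters):

# Hypothetical headless Service; publishNotReadyAddresses makes the DNS
# records include pod IPs even before the pods pass their readiness probe.
apiVersion: v1
kind: Service
metadata:
  name: stocklevel-headless
spec:
  clusterIP: None              # headless: DNS resolves to the pod IPs directly
  publishNotReadyAddresses: true
  selector:
    app: stocklevel
  ports:
    - name: epmd
      port: 4369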

Thanks for the help, benwilson512!
