How to connect nodes in Kubernetes before application startup?

Is it possible to use distributed Erlang plus libcluster in Kubernetes? :thinking:

I tried it with the following setup:

StatefulSet (with 3 replicas)

  • example@example-0.example-headless.default.svc.cluster.local
  • example@example-1.example-headless.default.svc.cluster.local
  • example@example-2.example-headless.default.svc.cluster.local

Cluster.Strategy.Kubernetes

  • kubernetes_ip_lookup_mode: :pods
  • mode: :hostname
  • kubernetes_namespace: default
  • kubernetes_selector: app=app
  • kubernetes_service_name: example-headless
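
For context, these strategy options roughly translate into a libcluster topology like the following (just a sketch; the topology name :example is arbitrary):

# config/runtime.exs: libcluster topology matching the options listed above
config :libcluster,
  topologies: [
    example: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        mode: :hostname,
        kubernetes_ip_lookup_mode: :pods,
        kubernetes_namespace: "default",
        kubernetes_selector: "app=app",
        kubernetes_service_name: "example-headless"
      ]
    ]
  ]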

config/runtime.exs

config :kernel,
  sync_nodes_optional: [
    :"example@example@example-0.app-headless.default.svc.cluster.local",
    :"example@example@example-1.app-headless.default.svc.cluster.local",
    :"example@example@example-2.app-headless.default.svc.cluster.local"
  ],
  sync_nodes_timeout: 5000

mix.exs

def releases do
  [
    example: [
      reboot_system_after_config: true,
      # ...
    ]
  ]
end

My problem with this setup is that the nodes cannot connect to each other before the application starts, because their DNS records only become available once the pods are Ready, which requires the readinessProbe (GET /health/readyz) to succeed, but the readinessProbe in turn requires the application to already be running :thinking:

So it's a chicken-and-egg problem, but maybe I am just doing something wrong :eyes:

Currently, once all pods have started and the sync_nodes_timeout has elapsed, the application does run in a cluster, but I would like the nodes to be connected to each other before the application is started.

I did not check it in Kubernetes, but my Cloister library does essentially this in AWS ECS.

The idea behind it is simple: the library declares an application that connects the nodes in its start_phase/3 callback.

You might do the same in your app’s start_phase/3 with libcluster, which would postpone the application startup from completing until the nodes are connected.
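
A minimal sketch of that approach, assuming the node names from above (the module names, the :connect_nodes phase name and the 30-second budget are made up for illustration):

# mix.exs: declare a start phase for the application
def application do
  [
    mod: {Example.Application, []},
    start_phases: [connect_nodes: []]
  ]
end

# lib/example/application.ex
defmodule Example.Application do
  use Application

  @expected_peers [
    :"example@example-0.example-headless.default.svc.cluster.local",
    :"example@example-1.example-headless.default.svc.cluster.local",
    :"example@example-2.example-headless.default.svc.cluster.local"
  ]

  @impl true
  def start(_type, _args) do
    children = [
      # {Cluster.Supervisor, [Application.get_env(:libcluster, :topologies), [name: Example.ClusterSupervisor]]},
      # ... the rest of the supervision tree
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Example.Supervisor)
  end

  # Runs after start/2 returns; the application only counts as started once
  # every declared start phase has returned :ok.
  @impl true
  def start_phase(:connect_nodes, _start_type, _args) do
    wait_for_peers(@expected_peers -- [Node.self()], 30)
  end

  defp wait_for_peers(_peers, 0), do: {:error, :nodes_not_connected}

  defp wait_for_peers(peers, retries_left) do
    if Enum.all?(peers, &(&1 in Node.list())) do
      :ok
    else
      Process.sleep(1_000)
      wait_for_peers(peers, retries_left - 1)
    end
  end
end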

@mudasobwa Thanks for your answer, but I think I found the issue with my setup :slight_smile:


TL;DR

  • changed from Cluster.Strategy.Kubernetes to Cluster.Strategy.Kubernetes.DNSSRV
  • set publishNotReadyAddresses: true for the headless service
  • set podManagementPolicy: Parallel for the StatefulSet

$ kubectl explain service.spec.publishNotReadyAddresses
KIND:     Service
VERSION:  v1

FIELD:    publishNotReadyAddresses <boolean>

DESCRIPTION:
     publishNotReadyAddresses indicates that any agent which deals with
     endpoints for this Service should disregard any indications of
     ready/not-ready. The primary use case for setting this field is for a
     StatefulSet's Headless Service to propagate SRV DNS records for its Pods
     for the purpose of peer discovery. The Kubernetes controllers that generate
     Endpoints and EndpointSlice resources for Services interpret this to mean
     that all endpoints are considered "ready" even if the Pods themselves are
     not. Agents which consume only Kubernetes generated endpoints through the
     Endpoints or EndpointSlice resources can safely assume this behavior.

Setting publishNotReadyAddresses: true allows the pods to find each other via the headless service before they are Ready.

As mentioned in the explanation of the publishNotReadyAddresses option, I also switched from Cluster.Strategy.Kubernetes to Cluster.Strategy.Kubernetes.DNSSRV.
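
For reference, roughly what the topology looks like after the switch (a sketch; the polling_interval value is arbitrary, and application_name should match the node basename, which is example in my case):

# config/runtime.exs: DNSSRV-based topology used instead of Cluster.Strategy.Kubernetes
config :libcluster,
  topologies: [
    example: [
      strategy: Cluster.Strategy.Kubernetes.DNSSRV,
      config: [
        service: "example-headless",
        application_name: "example",
        namespace: "default",
        polling_interval: 5_000
      ]
    ]
  ]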

Additionally, I set podManagementPolicy to Parallel, which allows the StatefulSet to schedule all pods in parallel. The default is OrderedReady, which requires one pod to be Ready before Kubernetes schedules the next, so the pods are only started one after the other.


Now my application is clustered and started within 15 seconds :partying_face:
