In my current PR, I’d like to allow for better control over the runner pod manifest. The current approach offers two ways of controlling it.
In the simpler case you just define env vars and resource requests/limits for the runner pods. The FLAME backend then creates the runner pod with these values set.
If you need more advanced features like pod affinity (e.g. running on GPU nodes), volumes, etc., you can implement a callback in which you build the runner pod manifest in your application and return it to the FLAME backend. The backend then adds some required env variables, sets/overwrites a few values like the pod name and container image, and finally applies the manifest to the cluster to create the runner pod.
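Roughly, the two styles could look like this. Note that the option names (:env, :resources, :runner_pod_manifest_fn) are illustrative only, not necessarily the library’s actual API:

# Hypothetical pool configuration; the backend option names shown here
# are illustrative, not necessarily the library's actual API.
children = [
  {FLAME.Pool,
   name: MyApp.Runner,
   min: 0,
   max: 5,
   backend:
     {FLAMEK8sBackend,
      # Simple case: only env vars and resource requests/limits.
      env: %{"LOG_LEVEL" => "debug"},
      resources: %{
        requests: %{cpu: "500m", memory: "512Mi"},
        limits: %{cpu: "1", memory: "1Gi"}
      },
      # Advanced case: build the full runner pod manifest yourself; the
      # backend then injects required env vars and overwrites fields
      # like the pod name and container image before applying it.
      runner_pod_manifest_fn: fn base ->
        put_in(base, ["spec", "nodeSelector"], %{"gpu" => "true"})
      end}}
]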
If the URL in the SA token is an IP address (not a FQDN), hostname verification fails with :verify_peer, as there is no way to verify the hostname against the cert. This is the case e.g. on my local Kind cluster…
So… I’ve created a PR that removes the insecure_skip_tls_verify option in favour of setting server_name_indication to :disable if KUBERNETES_SERVICE_HOST is an IP address (instead of a FQDN).
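The detection itself is straightforward with :inet.parse_address/1; a minimal sketch of the idea (not the PR’s literal code):

# Minimal sketch: disable SNI only when KUBERNETES_SERVICE_HOST is a
# literal IP address. Not the PR's literal code, just the idea.
host = System.fetch_env!("KUBERNETES_SERVICE_HOST")

sni_opts =
  case :inet.parse_address(String.to_charlist(host)) do
    # Literal IP: there is no hostname that SNI or hostname
    # verification could meaningfully use.
    {:ok, _ip} -> [server_name_indication: :disable]
    # FQDN: leave SNI and hostname verification enabled.
    {:error, _} -> []
  end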
However, I’d really like a “security audit” on this. I think this is as safe as it can be. I mean… no SNI, no hostname check. So we might as well disable it automatically, no?
Then again, I was surprised to see even AKS (Azure) setting KUBERNETES_SERVICE_HOST to a FQDN if and only if you add an annotation to your pod!
Maybe I should do something like Erlang does for verify_none: keep the option in place, but if it is not set and I’m defaulting server_name_indication to :disable, print a warning.
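Sketched out, that warning could look something like this (the option handling is illustrative; insecure_skip_tls_verify is the existing option name):

require Logger

# Illustrative sketch: keep insecure_skip_tls_verify around, but warn
# when we silently fall back to disabling SNI without the user having
# made an explicit choice.
if is_nil(opts[:insecure_skip_tls_verify]) do
  Logger.warning(
    "KUBERNETES_SERVICE_HOST is an IP address; disabling SNI and " <>
      "hostname verification for the API server connection"
  )
end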
Setting server_name_indication: :disable not only drops the SNI extension from the ClientHello message sent to the server, it also disables hostname verification altogether. So while the client still checks that the server presents a certificate issued by a trusted CA, it does not check whether we have reached the server we intended to reach. That’s arguably better than verify: :verify_none, but I think we can do better still?
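In :ssl option terms the difference looks like this (the CA file path is the standard in-pod service account location):

# verify_none: neither the certificate chain nor the hostname is
# checked - any certificate is accepted.
[verify: :verify_none]

# verify_peer with SNI disabled: the chain must still be signed by the
# trusted CA, but the client no longer checks which server the
# certificate was actually issued for.
[
  verify: :verify_peer,
  cacertfile: ~c"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
  server_name_indication: :disable
]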
What identities does Kubernetes put in the certificate that the server presents, in the Common Name field of the Subject and in the subjectAltName extension? If the IP address appears anywhere and you connect with an IP address in the URL, then the default behavior of :ssl (without the :server_name_indication option) should be to try and match that IP.
One way to see which identities are being compared would be to pass the following :ssl option: customize_hostname_check: [match_fun: fn a, b -> IO.inspect({a, b}); :default end]
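Spelled out as a full option list, the debug hook looks like this:

# Debug sketch: print every {reference, presented} identity pair the
# hostname check sees, then fall back to the default matching rules.
ssl_opts = [
  verify: :verify_peer,
  cacertfile: ~c"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
  customize_hostname_check: [
    match_fun: fn reference, presented ->
      IO.inspect({reference, presented}, label: "hostname check")
      :default
    end
  ]
]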
Now the IP Address seems to be a binary. But looking at the Erlang code, I think it’s expecting a charlist, no? length() and list_to_tuple() are list operations, no?
This has been bugging me for so long now (I’m also maintaining the k8s library). If this could be fixed, it would be awesome. WDYT @voltone? I can also open an Erlang issue for this.
That’s the correct encoding of an IPv4 address according to the X.509 spec. It gets decoded to a 4-tuple elsewhere during hostname verification.
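For example, 10.0.0.1 is stored in the certificate’s iPAddress field as its four raw bytes and only becomes a tuple during verification:

# As stored in the certificate: the raw bytes of the address.
raw = <<10, 0, 0, 1>>

# As used during hostname verification: decoded to a 4-tuple.
{10, 0, 0, 1} = raw |> :binary.bin_to_list() |> List.to_tuple()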
So it seems :ssl treats a string/charlist value in the first argument of :ssl.connect/3 as a hostname and tries to match it against the hostnames in the certificate. So unless the IP address also appears as dNSName: ~c"10.0.0.1" it is not going to match. If you call :ssl.connect/3 with a tuple as the first argument (e.g. {10, 0, 0, 1}) everything works as expected.
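Concretely (with a placeholder CA file, assuming a server certificate that only carries an iPAddress entry):

# Placeholder CA file; assumes the server cert only has an iPAddress entry.
opts = [verify: :verify_peer, cacertfile: ~c"/path/to/ca.crt"]

# Charlist host: treated as a hostname and matched against dNSName
# entries, so verification fails against an IP-only certificate.
{:error, _reason} = :ssl.connect(~c"10.0.0.1", 443, opts)

# Tuple host: matched against the iPAddress entries, as intended.
{:ok, _socket} = :ssl.connect({10, 0, 0, 1}, 443, opts)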
Now, you are not calling :ssl.connect/3 directly, your HTTP client library parses the URL and handles the connection establishment, so you can’t pass a tuple. Unless you want to propose upstream changes to the way the TLS connection is established when a URL has an IP address instead of a hostname, you could add the mapping to the hostname verification:
def custom_hostname_check({:dns_id, hostname}, {:iPAddress, ip}) do
case :inet.parse_address(hostname) do
{:ok, ^ip} -> true
_ -> :default
end
end
def custom_hostname_check(_, _), do: :default
And then select this function by passing customize_hostname_check: [match_fun: &custom_hostname_check/2].
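With the two-clause function above, the complete option list becomes (the CA path is again the in-pod service account location):

# Complete option list using the two-clause check from above. With
# Mint, for example, this could be passed through :transport_opts.
ssl_opts = [
  verify: :verify_peer,
  cacertfile: ~c"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
  customize_hostname_check: [match_fun: &custom_hostname_check/2]
]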
If you call :ssl.connect/3 with a tuple as the first argument (e.g. {10, 0, 0, 1}) everything works as expected.
True. That works.
you could add the mapping to the hostname verification
I see what you mean. Although this function won’t work, as :inet.parse_address(hostname) will return {10, 0, 0, 1} while ^ip is [10, 0, 0, 1]. But that’s solvable, e.g. like this:
def custom_hostname_check(ref_id, pres_id) do
  with {{:dns_id, hostname}, {:iPAddress, ip}} <- {ref_id, pres_id},
       {:ok, ip_tuple} <- :inet.parse_address(hostname),
       ^ip <- Tuple.to_list(ip_tuple) do
    true
  else
    _ -> :default
  end
end
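A quick IEx check confirms this version behaves as intended (identity values as observed above):

# Reference hostname that parses to the presented IP: match.
true = custom_hostname_check({:dns_id, ~c"10.0.0.1"}, {:iPAddress, [10, 0, 0, 1]})

# Anything else falls through to the default matching rules.
:default = custom_hostname_check({:dns_id, ~c"example.com"}, {:iPAddress, [10, 0, 0, 1]})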
However, I still think this should be “fixed” at some lower abstraction level. Maybe in Mint? Or even lower? I mean… OTP does accept IP addresses as charlists after all, no?
You could try and open an issue against OTP, arguing that ssl:connect/3 should recognise a binary/string representation of an IP address and handle it the same way as a tuple.
If that gets rejected, you could try Mint instead. In that case I will probably be asked to review the issue.
I have the change working on a local branch. If the OTP PR gets rejected or… ignored… I will push that. In any case, I’ll link this discussion and vice versa.
I have implemented the workaround in flame_k8s_backend for now until this issue is fixed in one of the lower layers. GH issues are open on OTP and Mint. Thanks a lot @voltone for your help and let’s take this discussion to GitHub now.
In version 0.4.1, released yesterday, I removed Req, the last dependency besides FLAME itself. Now FLAME can safely be used in Livebooks running on Kubernetes.
I have a question related to the startup times of new pods. I was reading the original FLAME article, and starting up and connecting a new machine takes about 3 seconds on Fly.io infrastructure.
Has anybody tried FLAME with the K8s backend on an AWS EKS cluster? How fast is the process of starting and connecting a new node?
I don’t think there are hard guarantees on what kind of hardware/orchestration they use behind the scenes, so it might be unpredictable depending on the region, hardware, and software versions they use.