Flame_k8s_backend - a FLAME Backend for Kubernetes

I’m working on the FLAME Backend for Kubernetes.

In my current PR, I’d like to allow for better control over the runner pod manifest. The current approach basically offers 2 ways of controlling the runner pod manifest.

In the simpler case you can just define env vars and resource requests/limits for the runner pods. The FLAME backend then creates the runner pod with these values set.
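To make the simpler case concrete, here is a minimal sketch of how env vars and resource requests/limits could be turned into a runner pod manifest. The module `RunnerPodSketch`, its `build/2` function and the `generateName` prefix are hypothetical names chosen for illustration; they are not the backend's actual API.

```elixir
defmodule RunnerPodSketch do
  # Hypothetical helper (not part of flame_k8s_backend): builds a minimal
  # pod manifest from a map of env vars and a map of resource settings.
  def build(env, resources) do
    %{
      "apiVersion" => "v1",
      "kind" => "Pod",
      # Assumed name prefix, for illustration only.
      "metadata" => %{"generateName" => "flame-runner-"},
      "spec" => %{
        "restartPolicy" => "Never",
        "containers" => [
          %{
            "name" => "runner",
            # Kubernetes expects env vars as a list of name/value maps.
            "env" =>
              Enum.map(env, fn {name, value} ->
                %{"name" => name, "value" => value}
              end),
            "resources" => resources
          }
        ]
      }
    }
  end
end

manifest =
  RunnerPodSketch.build(
    %{"LOG_LEVEL" => "info"},
    %{
      "requests" => %{"cpu" => "500m", "memory" => "256Mi"},
      "limits" => %{"memory" => "512Mi"}
    }
  )
```

The point of the simple mode is that the application only supplies the two maps and never sees the rest of the manifest.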

If you need more advanced features like pod affinity (e.g. running on GPU nodes), volumes, etc., you can implement a callback in which you build the runner pod manifest in your application and return it to the FLAME backend. The backend then adds some required env variables, sets/overwrites a few values like the pod name, container image, etc., and finally applies it to the cluster to create the runner pod.
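A rough sketch of that callback flow, under stated assumptions: `ManifestOverridesSketch`, `runner_pod_manifest/0`, `apply_overrides/4`, the node label and the env var name are all hypothetical, and a plain `nodeSelector` stands in for a full pod affinity rule to keep the example short.

```elixir
defmodule ManifestOverridesSketch do
  # Hypothetical application callback: returns a pod manifest that pins the
  # runner to GPU nodes. The label "example.com/gpu" is made up.
  def runner_pod_manifest do
    %{
      "apiVersion" => "v1",
      "kind" => "Pod",
      "metadata" => %{"name" => "will-be-overwritten"},
      "spec" => %{
        "nodeSelector" => %{"example.com/gpu" => "true"},
        "containers" => [
          %{"name" => "runner", "image" => "will-be-overwritten", "env" => []}
        ]
      }
    }
  end

  # Hypothetical backend step: overwrite the pod name and the image of the
  # first container, and append the env vars the backend requires.
  def apply_overrides(manifest, pod_name, image, required_env) do
    manifest
    |> put_in(["metadata", "name"], pod_name)
    |> update_in(["spec", "containers"], fn [first | rest] ->
      first =
        first
        |> Map.put("image", image)
        |> Map.update("env", required_env, &(&1 ++ required_env))

      [first | rest]
    end)
  end
end

pod =
  ManifestOverridesSketch.runner_pod_manifest()
  |> ManifestOverridesSketch.apply_overrides(
    "runner-abc",
    "registry.example.com/app:1.0",
    [%{"name" => "EXAMPLE_REQUIRED_VAR", "value" => "example"}]
  )
```

Everything the application set that the backend does not need to control (here the `nodeSelector`) passes through untouched.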

Inputs anyone?

No feedback at this point, but just wanted to say thank you! I was hoping this would happen quickly!

Very cool :blush:

Have you thought about using the cluster certificate to verify the tls connections?

Libcluster does this and it should be relatively simple to adapt: https://github.com/bitwalker/libcluster/blob/3f1afbdb9ec0929ed99d35e5f875e55f8cbdd851/lib/strategy/kubernetes.ex#L318

I am doing that in the connect function: flame_k8s_backend/lib/flame_k8s_backend/k8s_client.ex at main · mruoss/flame_k8s_backend · GitHub

Or is it something else you’re referring to?

Oh, I take it back. I misread the code. Why are you even offering insecure_skip_tls_verify then?

If the URL in the SA token is an IP (not a FQDN), hostname verification fails with :verify_peer as there is no way to verify the hostname in the cert. This is the case e.g. on my local Kind cluster…

I am very open to better ways of dealing with this, though.

So… I’ve created a PR that removes the insecure_skip_tls_verify option in favour of setting server_name_indication to :disable if KUBERNETES_SERVICE_HOST is an IP address (instead of a FQDN).
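As a sketch of the idea (not the code from the PR), the decision could look roughly like this. `SniSketch.ssl_opts_for/1` is a hypothetical helper; in a pod the host would come from `System.get_env("KUBERNETES_SERVICE_HOST")`.

```elixir
defmodule SniSketch do
  # Hypothetical helper: picks :ssl options depending on whether the API
  # server host is an IP address or a hostname.
  def ssl_opts_for(host) when is_binary(host) do
    case :inet.parse_address(String.to_charlist(host)) do
      # Host is an IP address: disable SNI (and with it the hostname check).
      {:ok, _ip} -> [verify: :verify_peer, server_name_indication: :disable]
      # Host is a name: keep the default SNI and hostname verification.
      {:error, _reason} -> [verify: :verify_peer]
    end
  end
end

ip_opts = SniSketch.ssl_opts_for("10.0.0.1")
name_opts = SniSketch.ssl_opts_for("kubernetes.default.svc")
```

In both branches the certificate chain is still verified against the CA; only the hostname check differs.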

However, I’d really like a “security audit” on this. I think this is as safe as it can be. I mean… no SNI, no hostname check. So we might as well disable it automatically, no?

Then again, I was surprised to see even AKS (Azure) setting KUBERNETES_SERVICE_HOST to a FQDN if and only if you add an annotation to your pod!

Maybe I should do something like Erlang does for verify_none: Keep the option in place, but if it is not set and I’m setting server_name_indication to :disable, print a warning.

Opinions anybody?

Here’s the PR:

I’m not sure about the perfect solution here either. I’ll ask @voltone in the ErlEF security WG whether he has any suggestions.

1 Like

Setting server_name_indication: :disable not only drops the SNI extension from the Client Hello message sent to the server, it also disables hostname verification altogether. So while the client still checks if the server is presenting a certificate that was issued by a trusted CA, it does not check if we have reached the server we intended to reach. That’s arguably better than verify: :verify_none, but I think we can do better still?

What identities does Kubernetes put in the certificate that the server presents, in the Common Name field of the Subject and in the SubjectAltNames extension? If the IP address appears anywhere and you connect with an IP address in the URL, then the default behavior of ssl (without :server_name_indication option) should be to try and match that IP.

One way to check what identities are being checked would be to pass the following :ssl option:
customize_hostname_check: [match_fun: fn a, b -> IO.inspect({a, b}); :default end]
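Spelled out as a module, that debugging aid might look like the sketch below. `HostnameCheckDebug` and its function names are hypothetical; the `:ssl` options themselves (`verify`, `cacertfile`, `customize_hostname_check`) are standard.

```elixir
defmodule HostnameCheckDebug do
  # Prints the reference ID (what the client asked for) and the presented ID
  # (one entry from the certificate), then defers to the default matching.
  def inspect_ids(reference_id, presented_id) do
    IO.inspect({reference_id, presented_id}, label: "hostname check")
    :default
  end

  # Builds the :ssl options with the debugging match_fun plugged in.
  def ssl_opts(cacertfile) do
    [
      verify: :verify_peer,
      cacertfile: cacertfile,
      customize_hostname_check: [match_fun: &__MODULE__.inspect_ids/2]
    ]
  end
end

# Usage (needs a reachable API server, so it is left commented out):
#
#   opts = HostnameCheckDebug.ssl_opts("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
#   :ssl.connect(~c"10.0.0.1", 443, opts)
```

Because the function always returns `:default`, it only observes the check and does not change its outcome.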

Unfortunately I’m not fluent in Erlang. But I think this is actually a bug in Erlang’s public_key:pkix_verify_hostname/N function.

The certificate presented by the Kubernetes API Server contains the IP address (see the note in the Kubernetes docs).

I can verify that, looking at the id-ce-subjectAltName extension in the certificate:

{
  :OTPCertificate,
  {
    :OTPTBSCertificate,
    #…,
    [
      #…,
      {
        :Extension,
        {2, 5, 29, 17},
        false,
        [
          dNSName: ~c"localhost",
          dNSName: ~c"hcp-kubernetes",
          dNSName: ~c"kubernetes",
          dNSName: ~c"kubernetes.default",
          dNSName: ~c"kubernetes.default.svc",
          dNSName: ~c"kubernetes.default.svc.cluster.local",
          iPAddress: <<10, 0, 0, 1>>
        ]
      }
    ]
  }
}

Now the IP Address seems to be a binary. But looking at the Erlang code, I think it’s expecting a charlist, no? length() and list_to_tuple() are list operations, no?

This has been bugging me for so long now (I’m also maintaining the k8s library). If this could be fixed, it would be awesome. WDYT @voltone? I can also open an Erlang issue for this.

EDIT: Opened an issue: `public_key:pkix_verify_hostname/N` returns `{:bad_cert, :hostname_check_failed}` when connecting to IP addresses · Issue #7968 · erlang/otp · GitHub

That’s the correct encoding of an IPv4 address according to the X.509 spec. It gets decoded to a 4-tuple elsewhere during hostname verification.

So it seems :ssl treats a string/charlist value in the first argument of :ssl.connect/3 as a hostname and tries to match it against the hostnames in the certificate. So unless the IP address also appears as dNSName: ~c"10.0.0.1" it is not going to match. If you call :ssl.connect/3 with a tuple as the first argument (e.g. {10, 0, 0, 1}) everything works as expected.
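To illustrate the difference between the two forms, here is a small sketch. `HostSketch.to_ssl_host/1` is a hypothetical helper showing what a caller of :ssl.connect/3 could do with a host taken from a URL.

```elixir
# The iPAddress entry holds the raw 4 bytes of the address; as a tuple this
# is the same value that :inet.parse_address/1 produces from the text form.
san_tuple = <<10, 0, 0, 1>> |> :binary.bin_to_list() |> List.to_tuple()
{:ok, host_tuple} = :inet.parse_address(~c"10.0.0.1")
^host_tuple = san_tuple

defmodule HostSketch do
  # Hypothetical helper: returns an IP tuple when the host is an IP address,
  # and a charlist hostname otherwise. Both are accepted by :ssl.connect/3.
  def to_ssl_host(host) when is_binary(host) do
    charlist = String.to_charlist(host)

    case :inet.parse_address(charlist) do
      {:ok, ip} -> ip
      {:error, _reason} -> charlist
    end
  end
end

ip_host = HostSketch.to_ssl_host("10.0.0.1")
dns_host = HostSketch.to_ssl_host("kubernetes.default.svc")
```

With the tuple form, :ssl knows it is dealing with an IP address and compares it against the iPAddress entries rather than the dNSName entries.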

Now, you are not calling :ssl.connect/3 directly, your HTTP client library parses the URL and handles the connection establishment, so you can’t pass a tuple. Unless you want to propose upstream changes to the way the TLS connection is established when a URL has an IP address instead of a hostname, you could add the mapping to the hostname verification:

  def custom_hostname_check({:dns_id, hostname}, {:iPAddress, ip}) do
    case :inet.parse_address(hostname) do
      {:ok, ^ip} -> true
      _ -> :default
    end
  end

  def custom_hostname_check(_, _), do: :default

And then select this function by passing customize_hostname_check: [match_fun: &custom_hostname_check/2].

What rabbit hole did I get into?! :smiley:

> If you call :ssl.connect/3 with a tuple as the first argument (e.g. {10, 0, 0, 1} ) everything works as expected.

True. That works.

> you could add the mapping to the hostname verification

I see what you mean. Although this function won’t work as :inet.parse_address(hostname) will return {10, 0, 0, 1} and ^ip is [10, 0, 0, 1]. But that’s solvable, e.g. like this:

def custom_hostname_check(ref_id, pres_id) do
  with {{:dns_id, hostname}, {:iPAddress, ip}} <- {ref_id, pres_id},
       {:ok, ip_tuple} <- :inet.parse_address(hostname),
       ^ip <- Tuple.to_list(ip_tuple) do
    true
  else
    _ ->
      :default
  end
end
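For completeness, a sketch of how this function could be wired into the :ssl options and exercised by hand. The module name `HostnameCheckSketch` and `ssl_opts/0` are hypothetical, and the sample arguments assume (as described above) that the presented iPAddress arrives as a list of bytes. IPv6 addresses would presumably need separate handling, since an IPv6 tuple has 8 elements rather than one per byte.

```elixir
defmodule HostnameCheckSketch do
  # Accepts the certificate when the reference ID is an IP address in text
  # form that equals the presented iPAddress; otherwise uses the default check.
  def custom_hostname_check(ref_id, pres_id) do
    with {{:dns_id, hostname}, {:iPAddress, ip}} <- {ref_id, pres_id},
         {:ok, ip_tuple} <- :inet.parse_address(hostname),
         ^ip <- Tuple.to_list(ip_tuple) do
      true
    else
      _ -> :default
    end
  end

  # Hypothetical helper: :ssl options that enable the custom check.
  def ssl_opts do
    [
      verify: :verify_peer,
      customize_hostname_check: [match_fun: &__MODULE__.custom_hostname_check/2]
    ]
  end
end

match =
  HostnameCheckSketch.custom_hostname_check(
    {:dns_id, ~c"10.0.0.1"},
    {:iPAddress, [10, 0, 0, 1]}
  )

mismatch =
  HostnameCheckSketch.custom_hostname_check(
    {:dns_id, ~c"10.0.0.2"},
    {:iPAddress, [10, 0, 0, 1]}
  )
```

A mismatch returns `:default` rather than `false`, so the decision falls back to the standard matching for every case the workaround does not cover.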

However, I still think this should be “fixed” at some lower level of abstraction. Maybe in Mint, or even lower? I mean… OTP does accept IP addresses as charlists after all, no?

You could try and open an issue against OTP, arguing that ssl:connect/3 should recognise a binary/string representation of an IP address and handle it the same way as a tuple.

If that gets rejected you could try Mint instead. In that case I will probably be asked to review the issue :slight_smile:

Done. See `public_key:pkix_verify_hostname/N` returns `{:bad_cert, :hostname_check_failed}` when connecting to IP addresses · Issue #7968 · erlang/otp · GitHub

I have the change working on a local branch. If the OTP issue gets rejected or… ignored… I will push that. In any case, I’ll link this discussion there and vice versa. :slight_smile:

I have implemented the workaround in flame_k8s_backend for now until this issue is fixed in one of the lower layers. GH issues are open on OTP and Mint. Thanks a lot @voltone for your help and let’s take this discussion to GitHub now.

And thanks @maennchen for raising this! :smiley:

I just wanted to say thanks for creating this backend. I’ve been playing with it for a few days and it has been working great.
