Ecto SSL verification with Google SQL - 'connection requires a valid client certificate'

Hi,

I’m trying to set up client SSL validation with Google Cloud SQL, and things are not going well for multiple reasons.

First off, I was unable to get verify_peer mode working.

2022-08-02T18:42:38.029 app[7047257c] lhr [info] 18:42:38.028 [notice] TLS :client: In state :wait_cert at ssl_handshake.erl:2075 generated CLIENT ALERT: Fatal - Handshake Failure
2022-08-02T18:42:38.029 app[7047257c] lhr [info] - {:bad_cert, :hostname_check_failed}

I tried adding a custom verify_fun: &:ssl_verify_hostname.verify_fun/3, from the {:ssl_verify_fun, "~> 1.1"} package, but it fails with a badarity error:


2022-08-02T18:39:18.364 app[400cc581] lhr [info] 2022/08/02 18:39:18 listening on [fdaa:0:6924:a7b:276d:1:5d29:2]:22 (DNS: [fdaa::3]:53)
2022-08-02T18:39:20.347 app[400cc581] lhr [info] Reaped child process with pid: 567, exit code: 0
2022-08-02T18:39:23.215 app[400cc581] lhr [info] 18:39:23.214 [notice] TLS :client: In state :wait_cert at ssl_handshake.erl:362 generated CLIENT ALERT: Fatal - Internal Error
2022-08-02T18:39:23.215 app[400cc581] lhr [info] - {:unexpected_error,
2022-08-02T18:39:23.215 app[400cc581] lhr [info] {:badarity,
2022-08-02T18:39:23.215 app[400cc581] lhr [info] {&:ssl_verify_hostname.verify_fun/3, [[bad_cert: :hostname_check_failed]]}}}
2022-08-02T18:39:23.216 app[400cc581] lhr [info] 18:39:23.214 [notice] TLS :client: In state :wait_cert at ssl_handshake.erl:362 generated CLIENT ALERT: Fatal - Internal Error

So I ended up with verify: :verify_none, which works fine on OTP 24.

Next, upgrading to OTP 25 breaks things even further: with that, the server rejects the connection as if Ecto were not presenting a client certificate.

2022-08-02T19:03:30.027 app[f5a13d3f] lhr [info] 19:03:30.027 [error] Postgrex.Protocol (#PID<0.1965.0>) failed to connect: ** (Postgrex.Error) FATAL 28000 (invalid_authorization_specification) connection requires a valid client certificate

My config:

    config :my_app, MyApp.Repo,
      ssl: true,
      ssl_opts: [
        verify: :verify_none,
        cacertfile: "/.../server-ca.pem",
        keyfile: "/.../client-key.pem",
        certfile: "/.../client-cert.pem"
      ]

In a similar security posture, post a credit card number here to get faster debugging help :stuck_out_tongue:

The symptoms you’re describing make me wonder if Erlang has the right certs in its trust root.

I can drop a Monero wallet address if that counts :stuck_out_tongue_winking_eye:

Hm, maybe you’re right

[Screenshot 2022-08-02 at 20.30.16]

I had assumed that a self-signed certificate would be handled by the cacertfile option.

I have the same issue when running with verify: :verify_peer and cacertfile set to CockroachDB’s CA cert. I have yet to determine the actual solution. I suspect we share the same problem and are looking for the same solution.

There is some benefit to using TLS even with verify: :verify_none: a passive attacker who can only monitor, but not modify, network traffic will not be able to decode the traffic exchanged with the server.

Verification of the server certificate is necessary to protect against an active attacker. In a closed network environment like GCP some might argue that the risk of an active attacker is somewhat reduced, compared to connections that traverse the public internet.

If the server has a self-signed certificate, the certificate itself can’t be used to establish trust, in the same way that a CA-issued certificate can. But as long as the server continues to present the same certificate, or use the same key pair, you can ‘pin’ the certificate/public key and abort the connection if a different cert/key is used (which then requires manual verification to see if the change is legit or someone is trying to interfere).

The ssl_verify_fun package README has instructions on how to enable certificate or public key pinning. In Elixir that would look something like:

:ssl.connect('self-signed.badssl.com', 443,
  verify_fun: {&:ssl_verify_fingerprint.verify_fun/3, [check_fingerprint: {:sha, "36F81BA2C9B18032D8B7BC61B26F22F6086DAA95"}]},
  verify: :verify_none,
  reuse_sessions: false
)

(It doesn’t really matter here whether you pass verify: :verify_none or verify: :verify_peer: the verify_fun option overrides the default behavior selected by that option anyway, but in recent OTP versions you’ll get a warning if you don’t set the :verify option at all.)

(Also note that you probably wouldn’t want to use reuse_sessions: false in production, but it is necessary while testing otherwise a full handshake may not be performed and your changes to the connection options won’t take effect, leading to surprising results)
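
To pin a certificate you first need its fingerprint. Here is a minimal sketch of computing one in Elixir, assuming the string expected by check_fingerprint: {:sha, ...} is the uppercase hex SHA-1 digest of the DER-encoded certificate, as the example above suggests. The module name CertFingerprint and its function names are hypothetical, and the file path is whatever PEM file you have for the server certificate:

```elixir
defmodule CertFingerprint do
  # SHA-1 over the DER-encoded certificate bytes, as an uppercase hex string.
  def from_der(der) when is_binary(der) do
    :sha |> :crypto.hash(der) |> Base.encode16()
  end

  # Reads the first entry of a PEM file, expects it to be a certificate,
  # and returns its fingerprint.
  def from_pem_file(path) do
    [{:Certificate, der, :not_encrypted} | _] =
      path |> File.read!() |> :public_key.pem_decode()

    from_der(der)
  end
end
```

The result should match what openssl x509 -noout -fingerprint -sha1 prints for the same file, once the colons are removed. In a Mix project, :crypto and :public_key may need to be listed under extra_applications.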

Thanks, @voltone. I’ll try this. Great blog you’ve got.

I’ve been able to fix this error, which occurred when connecting to a Google Cloud SQL instance, by restricting the versions option in ssl_opts to [:"tlsv1.2"].
While debugging I noticed an initial handshake attempt over TLS v1.3, followed by a downgrade to a handshake over TLS v1.2, and then the error being returned. After restricting to TLS v1.2 only, the initial handshake attempt succeeds and a connection is established afterwards.
NB: Google Cloud SQL supports only TLS versions 1.0, 1.1 and 1.2.
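
For reference, a sketch of what that looks like in the Repo config from the first post. It assumes config/runtime.exs and the ssl_opts key used in this thread; the certificate locations are read from environment variables, of which only DB_SSL_CACERTFILE appears elsewhere in the thread and the other two names are hypothetical:

```elixir
# config/runtime.exs
import Config

config :my_app, MyApp.Repo,
  ssl: true,
  ssl_opts: [
    # Offer only TLS 1.2, so the handshake does not start on TLS 1.3.
    versions: [:"tlsv1.2"],
    verify: :verify_none,
    cacertfile: System.fetch_env!("DB_SSL_CACERTFILE"),
    # Hypothetical variable names for the client key and certificate.
    keyfile: System.fetch_env!("DB_SSL_KEYFILE"),
    certfile: System.fetch_env!("DB_SSL_CERTFILE")
  ]
```

Note that verify: :verify_none still leaves the server certificate unchecked; this only addresses the client certificate error.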

Amazing, thanks for the shout out.

Are you using verify: :verify_peer with that?

Hitting the same problem with Amazon RDS. Will try your fix too.

Nope, I was able to make it work with verify: :verify_none, and with :keyfile, :cacertfile and :certfile properly set.

Are you on an OTP version less than 25?

In my libs (using the advice from @voltone) I’m configuring TLS 1.2 and 1.3 on OTP 25 and only 1.2 on earlier OTP versions. That seems to have stabilised this particular issue (not just on GCP, but also on GitHub Actions).
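
Roughly, the version selection could be sketched like this. The module name TlsVersions and its functions are made up for illustration, not what my libs actually contain:

```elixir
defmodule TlsVersions do
  # TLS versions to offer for a given OTP major release:
  # 1.3 and 1.2 on OTP 25 and later, only 1.2 before that.
  def for_release(otp_release) when is_integer(otp_release) and otp_release >= 25 do
    [:"tlsv1.3", :"tlsv1.2"]
  end

  def for_release(otp_release) when is_integer(otp_release) do
    [:"tlsv1.2"]
  end

  # :erlang.system_info(:otp_release) returns a charlist such as '25'.
  def current do
    :otp_release
    |> :erlang.system_info()
    |> List.to_integer()
    |> for_release()
  end
end
```

The result of TlsVersions.current() can then be passed as the versions: option alongside the other TLS options.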

So I managed to get it working. First off, looking at the Erlang (OTP 25) code, this is wrong:

verify_fun: &:ssl_verify_hostname.verify_fun/3,

This works:

verify_fun: {&:ssl_verify_hostname.verify_fun/3, []},

I still have hostname validation errors, but in my case I don’t really care about that. So I added customize_hostname_check to make the (working) configuration look like:

[
    verify: :verify_peer,
    cacertfile: System.get_env("DB_SSL_CACERTFILE"),
    customize_hostname_check: [match_fun: fn(_, _) -> true end],
    verify_fun: {&:ssl_verify_hostname.verify_fun/3, []}
]

Without hostname verification, the only requirement for certificate verification to succeed is that the server presents a certificate that was issued by a trusted CA. Any certificate will do. So if your CA trust store is a common list of public CAs, any certificate issued by e.g. Let’s Encrypt for a random hostname will do. So in terms of stopping an active attacker from impersonating your DB or performing a MitM attack, this is not really any better than verify: :verify_none.

You may have to explicitly set the hostname that :ssl_verify_hostname should expect to find in the certificate: depending on the DB driver and wire protocol, the TLS handshake may be started on an already established TCP socket, in which case the :ssl module doesn’t know the hostname that was used to establish that connection. Try setting verify_fun: {&:ssl_verify_hostname.verify_fun/3, [check_hostname: 'db.host.name']}

Using Elixir 1.13.4 (image) which in turn relies on Erlang 25.0.4 (image).

Ended up with:

    customize_hostname_check: [match_fun: fn(_ip, {_, dns_name}) -> dns_name == to_charlist(db_hostname) end],
    verify_fun: {&:ssl_verify_hostname.verify_fun/3, [check_hostname: to_charlist(db_hostname)]}
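
For context, a sketch of how those two options might sit in a full Repo config. This assumes config/runtime.exs, the ssl_opts key used earlier in the thread, and the :ssl_verify_fun dependency; DB_HOSTNAME is a hypothetical environment variable standing in for however db_hostname is defined:

```elixir
# config/runtime.exs
import Config

# Hypothetical variable name; any source for the DB hostname will do.
db_hostname = System.fetch_env!("DB_HOSTNAME")

config :my_app, MyApp.Repo,
  hostname: db_hostname,
  ssl: true,
  ssl_opts: [
    verify: :verify_peer,
    cacertfile: System.fetch_env!("DB_SSL_CACERTFILE"),
    # Accept only a certificate DNS name equal to the configured hostname.
    customize_hostname_check: [
      match_fun: fn _ip, {_, dns_name} -> dns_name == to_charlist(db_hostname) end
    ],
    # Tell :ssl_verify_hostname which name to expect in the certificate.
    verify_fun:
      {&:ssl_verify_hostname.verify_fun/3, [check_hostname: to_charlist(db_hostname)]}
  ]
```

The anonymous function is the reason for putting this in runtime.exs: as far as I know, functions in compile-time config cannot be written into a release's sys.config.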

Perhaps relevant for future readers of this thread, I have released a package to help enable TLS and server certificate verification with AWS RDS: Aws_rds_castore - Certificate validation for AWS RDS DBs
