Site Encrypt SSL in staging/prod failing to generate certificate

Hi,

I’m facing an issue generating an SSL certificate in my project using site_encrypt.
With CERT_MODE=local everything works, but if I switch to staging or production I only see the following log:

Creating new account (CA acme-staging-v02.api.letsencrypt.org)
Ordering a new certificate for domain ubiquitous-funicular.ch (CA acme-staging-v02.api.letsencrypt.org)
[…]
[error] Task #PID<0.3036.0> started from #PID<0.2817.0> terminating
** (MatchError) no match of right hand side value: {:error, #SiteEncrypt.Acme.Client.API.Session<https://acme-staging-v02.api.letsencrypt.org/directory>}
(site_encrypt 0.6.0) lib/site_encrypt/acme/client.ex:74: SiteEncrypt.Acme.Client.process_new_order/3
(site_encrypt 0.6.0) lib/site_encrypt/acme/client.ex:45: SiteEncrypt.Acme.Client.create_certificate/2
(site_encrypt 0.6.0) lib/site_encrypt/certification/native.ex:52: SiteEncrypt.Certification.Native.create_certificate/2
(site_encrypt 0.6.0) lib/site_encrypt/certification/job.ex:15: SiteEncrypt.Certification.Job.certify/1
(site_encrypt 0.6.0) lib/site_encrypt/certification/job.ex:26: SiteEncrypt.Certification.Job.certify_and_apply/1
(elixir 1.17.3) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: #Function<0.66470241/0 in SiteEncrypt.Certification.Job.child_spec/1>
Args: []

On the other hand, from my instance I get a successful HTTP 200 response when running:

curl -v https://acme-v02.api.letsencrypt.org/directory
# or
curl -v https://acme-staging-v02.api.letsencrypt.org/directory

Any idea what the problem could be here?

Can you confirm that your DNS/public interface is configured properly?

Even in staging certificate mode, letsencrypt (by default) will try to reach your server at its specific challenge endpoint, so if that is not reachable for one reason or another, it will fail.

DNS and reverse DNS look fine:

$ nslookup ubiquitous-funicular.ch
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   ubiquitous-funicular.ch
Address: 194.182.188.115
$ dig ubiquitous-funicular.ch

; <<>> DiG 9.20.0-2ubuntu3-Ubuntu <<>> ubiquitous-funicular.ch
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21625
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;ubiquitous-funicular.ch.       IN      A

;; ANSWER SECTION:
ubiquitous-funicular.ch. 2818   IN      A       194.182.188.115

;; AUTHORITY SECTION:
ubiquitous-funicular.ch. 2818   IN      NS      ns12.infomaniak.ch.
ubiquitous-funicular.ch. 2818   IN      NS      ns11.infomaniak.ch.

;; Query time: 1 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Jan 09 12:43:13 UTC 2025
;; MSG SIZE  rcvd: 117
$ dig -x 194.182.188.115

; <<>> DiG 9.20.0-2ubuntu3-Ubuntu <<>> -x 194.182.188.115
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48459
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;115.188.182.194.in-addr.arpa.  IN      PTR

;; ANSWER SECTION:
115.188.182.194.in-addr.arpa. 3600 IN   PTR     ubiquitous-funicular.ch.

;; AUTHORITY SECTION:
188.182.194.in-addr.arpa. 28346 IN      NS      ns2.exoscale.net.
188.182.194.in-addr.arpa. 28346 IN      NS      ns3.exoscale.net.

;; Query time: 6 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Jan 09 12:43:29 UTC 2025
;; MSG SIZE  rcvd: 142

My elastic IP: 194.182.188.115
My instance IP: 159.100.252.34

TCP ports 80, 443 and 4000 are open for ingress from all IPs.

Is this endpoint reachable?

http://ubiquitous-funicular.ch/.well-known/acme-challenge/<TOKEN>

This is the endpoint letsencrypt will try to call if you use the default challenge type.
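A quick way to test this from a machine outside your network is to request a made-up token with a hard timeout, so that a filtered port shows up as a timeout instead of an endless hang. This is only a sketch: the helper names acme_challenge_url and probe_challenge and the token test-token are made up for illustration. An error status for an unknown token is fine here, since it still proves the request reached the app.

```shell
#!/bin/sh
# Build the HTTP-01 challenge URL for a domain ($1) and token ($2).
acme_challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Request the URL with connect/overall timeouts and print only the HTTP status.
# curl prints 000 when no HTTP response was received (timeout, refused, ...).
probe_challenge() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    --connect-timeout 5 --max-time 10 \
    "$(acme_challenge_url "$1" "$2")"
}

# Example (run from outside the server's network):
#   probe_challenge ubiquitous-funicular.ch test-token
```

If this prints 000 from outside but a real status code when run on the server itself, the request is being dropped somewhere between the internet and the app.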

I’ve tried to curl that endpoint and for some reason it hangs (no response, no timeout). Are you getting any error in the console when doing that?

I get the same result calling this endpoint: it hangs and I don’t see any logs about the call.
I didn’t declare any endpoint for this or do any configuration, as it’s not described in the site_encrypt documentation.

This is something the library should do automatically, abstracted away from the user. You can read more about how letsencrypt issues challenges here: Challenge Types - Let's Encrypt

A few more checks:

  1. Are you running any reverse proxies before your server such as nginx?
  2. Is your server running on port 80? As a way to test, try the same curl request from the machine where this is deployed, using the local IP address. If that works, it’s a network configuration issue (maybe a firewall).

No proxies (Nginx, Apache, …)
Server is running HTTP on 80 and HTTPS on 443

curl -v http://localhost/.well-known/acme-challenge/wwww
* Host localhost:80 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:80...
* connect to ::1 port 80 from ::1 port 53606 failed: Connection refused
*   Trying 127.0.0.1:80...
* Connected to localhost (127.0.0.1) port 80
> GET /.well-known/acme-challenge/wwww HTTP/1.1
> Host: localhost
> User-Agent: curl/8.9.1
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 500 Internal Server Error
< cache-control: max-age=0, private, must-revalidate
< content-length: 21
< content-type: text/html; charset=utf-8
< date: Thu, 09 Jan 2025 14:31:54 GMT
< server: Cowboy
< 
* Connection #0 to host localhost left intact

with app logs:

[debug] Converted error RuntimeError to 500 response
[error] #PID<0.4351.0> running UbiquitousFunicularWeb.Endpoint (connection #PID<0.4350.0>, stream id 1) terminated
Server: localhost:80 (http)
Request: GET /.well-known/acme-challenge/wwww
** (exit) an exception was raised:
** (RuntimeError) unknown challenge
(site_encrypt 0.6.0) lib/site_encrypt/certification/native.ex:30: SiteEncrypt.Certification.Native.full_challenge/2
(site_encrypt 0.6.0) lib/site_encrypt/acme_challenge.ex:12: SiteEncrypt.AcmeChallenge.call/2
(ubiquitous_funicular_web 0.1.0) lib/ubiquitous_funicular_web/endpoint.ex:1: UbiquitousFunicularWeb.Endpoint.plug_builder_call/2
(ubiquitous_funicular_web 0.1.0) lib/ubiquitous_funicular_web/endpoint.ex:1: UbiquitousFunicularWeb.Endpoint.call/2
(plug_cowboy 2.7.2) lib/plug/cowboy/handler.ex:11: Plug.Cowboy.Handler.init/2
(cowboy 2.12.0) /home/runner/work/ubiquitous-funicular/ubiquitous-funicular/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2
(cowboy 2.12.0) /home/runner/work/ubiquitous-funicular/ubiquitous-funicular/deps/cowboy/src/cowboy_stream_h.erl:306: :cowboy_stream_h.execute/3
(cowboy 2.12.0) /home/runner/work/ubiquitous-funicular/ubiquitous-funicular/deps/cowboy/src/cowboy_stream_h.erl:295: :cowboy_stream_h.request_process/3

I’m not sure about my SiteEncrypt integration in endpoint.ex:

plug SiteEncrypt.AcmeChallenge, __MODULE__
plug Plug.SSL, exclude: ["/.well-known/acme-challenge"], rewrite_on: [:x_forwarded_proto]

[…]

@impl SiteEncrypt
def certification do
  SiteEncrypt.configure(
    client: :native,
    domains: ["ubiquitous-funicular.ch"],
    emails: ["ubiquitousfunicular@gmail.com"],
    db_folder: System.get_env("SITE_ENCRYPT_DB", Path.join("tmp", "site_encrypt_db")),
    log_level: :debug,
    directory_url:
      case System.get_env("CERT_MODE", "local") do
        "local" -> {:internal, port: 4002}
        "staging" -> "https://acme-staging-v02.api.letsencrypt.org/directory"
        "production" -> "https://acme-v02.api.letsencrypt.org/directory"
      end
  )
end

Not sure if it’s a cause or a consequence of the issue, but the health check of my elastic IP regularly pings my server app and generates these logs:

- :no_suitable_cipher
[notice] TLS :server: In state :start at tls_server_connection_1_3.erl:894 generated SERVER ALERT: Fatal - Insufficient Security

The 500 response is a great sign and the error seems to be sane:

(RuntimeError) unknown challenge

The last thing that might be the culprit, if you are running your server directly on the machine, might be the ip config. I completely forgot about this caveat as I’ve never run a phoenix server without a reverse proxy in prod.

Try to update your endpoint configuration to the following (make sure you do this for the environment you actually run, which is prod by default for releases):

config :my_app, MyApp.Endpoint,
  http: [
    ip: {0, 0, 0, 0},
    ...
  ]

More information about this in the Plug.Cowboy v2.7.2 documentation.
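To confirm which address the listener actually ends up bound to after a restart, you can filter the output of ss. A minimal sketch, assuming ss from iproute2 is available on the instance; the helper name listeners_on_port is made up:

```shell
#!/bin/sh
# Print the local address:port of every TCP listener on the given port ($1).
# Reads `ss -tln` output on stdin, so it also works on saved output.
listeners_on_port() {
  awk -v suffix=":$1" '
    # Field 4 of `ss -tln` is "Local Address:Port"; keep rows ending in :PORT.
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { print $4 }
  '
}

# Example:
#   ss -tln | listeners_on_port 80
# 0.0.0.0:80 means the listener accepts connections on every IPv4 interface;
# 127.0.0.1:80 would only be reachable from the machine itself.
```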

I added ip: {0, 0, 0, 0} for http in my runtime.exs (prod only) but I get the same logs when starting the app.

It would be nice to have more logs of the HTTP requests being made by site_encrypt. I’m starting to think about forking it to add some more logs…

If your server is still unreachable from the internet, then that is the problem you need to solve. Ensure that you can actually curl your endpoints using the DNS name/public IP.

Curling an endpoint via the public IP hangs before failing, without any server app logs,
whereas curling the instance IP fails instantly, with these server app logs:

Jan 09 18:06:02 VM-33a59199-01a2-43bd-87ed-6cb479b5e59a ubiquitous_funicular_web[361751]: [notice] TLS :server: In state :start at tls_server_connection_1_3.erl:894 generated SERVER ALERT: Fatal - Insufficient Security
Jan 09 18:06:02 VM-33a59199-01a2-43bd-87ed-6cb479b5e59a ubiquitous_funicular_web[361751]:  - :no_suitable_cipher
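Since the instance IP reaches the app and the elastic IP does not, one thing worth ruling out is whether the elastic IP is attached to a network interface on the instance at all. This is an assumption, not something confirmed in this thread, and whether it is required depends on the cloud provider. A minimal sketch; the helper name has_ipv4 is made up:

```shell
#!/bin/sh
# Succeed if the given IPv4 address ($1) is assigned to a local interface,
# according to `ip -4 -o addr show` output read from stdin.
has_ipv4() {
  # Each line of that output contains "inet <address>/<prefix-length>".
  grep -Fq "inet $1/"
}

# Example:
#   ip -4 -o addr show | has_ipv4 194.182.188.115 && echo attached || echo "not attached"
```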

I would recommend double-checking that the configuration is actually in place by running the following command from the remote console:

Application.get_env(:your_app, YourApp.Endpoint)

It should return the above-mentioned config, with the option ip: {0, 0, 0, 0} nested somewhere in it.

If that is the case, then you have some networking issue. Check in your cloud console how your instance is exposed to the internet. Maybe you have some kind of sandbox flag or cloud provider firewall turned on?

The endpoint config in the release seems fine:

iex(ubiquitous_funicular_web@VM-33a59199-01a2-43bd-87ed-6cb479b5e59a)1> Application.get_env(:ubiquitous_funicular_web, UbiquitousFunicularWeb.Endpoint)
[
  pubsub_server: :ubiquitous_funicular_pubsub,
  cache_static_manifest: "priv/static/cache_manifest.json",
  render_errors: [
    formats: [
      html: UbiquitousFunicularWeb.ErrorHTML,
      json: UbiquitousFunicularWeb.ErrorJSON
    ],
    layout: false
  ],
  cipher_suite: :strong,
  http: [port: 80, ip: {0, 0, 0, 0}],
  https: [
    port: 443,
    otp_app: :ubiquitous_funicular_web,
    transport_options: [socket_opts: [:inet6]],
    cipher_suite: :strong,
    versions: [:"tlsv1.3", :"tlsv1.2"],
    keyfile: "/var/lib/site_encrypt/db/certs/ubiquitous-funicular.ch/privkey.pem",
    certfile: "/var/lib/site_encrypt/db/certs/ubiquitous-funicular.ch/cert.pem",
    cacertfile: "/var/lib/site_encrypt/db/certs/ubiquitous-funicular.ch/chain.pem"
  ],
  server: true,
  secret_key_base: "hidden…",
  live_view: [signing_salt: "hidden…"],
  url: [host: "ubiquitous-funicular.ch", port: 443],
  force_ssl: [hsts: true, rewrite_on: [:x_forwarded_proto]]
]

Welp, I’m out of ideas. This config line looks suspicious though:

force_ssl: [hsts: true, rewrite_on: [:x_forwarded_proto]]

You don’t want to actually force https, as letsencrypt will try to reach your endpoint via http.

I agree with you, so I have entirely removed force_ssl and Plug.SSL from my project.

Unfortunately, I may have broken some config because I’m getting these logs now:

[info] Running UbiquitousFunicularWeb.Endpoint with cowboy 2.12.0 at 0.0.0.0:80 (http)
[error] Error starting the child :site: {:shutdown, {:failed_to_start_child, {:ranch_listener_sup, UbiquitousFunicularWeb.Endpoint.HTTPS}, {:shutdown, {:failed_to_start_child, :ranch_acceptors_sup, :badarg}}}}
{exit,terminating,[{application_controller,call,2,[{file,"application_controller.erl"},{line,511}]},{application,enqueue_or_start,6,[{file,"application.erl"},{line,380}]},{application,ensure_all_started,3,[{file,"application.erl"},{line,359}]},{elixir,start_cli,0,[{file,"src/elixir.erl"},{line,195}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}
Runtime terminating during boot (terminating)
[notice] Application ubiquitous_funicular_web exited: UbiquitousFunicularWeb.Application.start(:normal, []) returned an error: shutdown: failed to start child: SiteEncrypt.Phoenix.Endpoint
** (EXIT) :start_error
Crash dump is being written to: erl_crash.dump...
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
uf.service: Main process exited, code=exited, status=1/FAILURE
uf.service: Failed with result 'exit-code'.

I understand that the https configuration may be broken, but I cannot find the issue in my updated endpoint config:

prod.exs:

config :ubiquitous_funicular_web, UbiquitousFunicularWeb.Endpoint,
  url: [scheme: "https", host: "ubiquitous-funicular.ch", port: 443],
  cache_static_manifest: "priv/static/cache_manifest.json",
  render_errors: [
    formats: [
      html: UbiquitousFunicularWeb.ErrorHTML,
      json: UbiquitousFunicularWeb.ErrorJSON
    ],
    layout: false
  ]

runtime.exs:

config :ubiquitous_funicular_web, UbiquitousFunicularWeb.Endpoint,
    http: [ip: {0, 0, 0, 0}, port: 80],
    https: [
      ip: {0, 0, 0, 0},
      port: 443,
      otp_app: :ubiquitous_funicular_web,
      transport_options: [socket_opts: [:inet6, :inet]],
      cipher_suite: :strong,
      keyfile: System.get_env("SSL_KEY_PATH"),
      certfile: System.get_env("SSL_CERT_PATH"),
      cacertfile: System.get_env("SSL_CACERT_PATH")
    ],
    server: true,
    secret_key_base: secret_key_base,
    live_view: [signing_salt: signing_salt],
    url: [host: "ubiquitous-funicular.ch", port: 443]

I tried setting up certbot instead but came across the same issue:

Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
  Domain: ubiquitous-funicular.ch
  Type:   connection
  Detail: 194.182.188.115: Fetching http://ubiquitous-funicular.ch/.well-known/acme-challenge/tvj5NHp46CoEl0DlzeSjLlB6Q0EehKghvpO9K0M6tU4: Timeout during connect (likely firewall problem)

  Domain: www.ubiquitous-funicular.ch
  Type:   connection
  Detail: 194.182.188.115: Fetching http://www.ubiquitous-funicular.ch/.well-known/acme-challenge/VrZFcRuBRdPza3qIotvWk2LNIyb2npuuQ2miwhQUXDQ: Timeout during connect (likely firewall problem)

Hint: The Certificate Authority failed to verify the temporary nginx configuration changes made by Certbot. Ensure the listed domains point to this nginx server and that it is accessible from the internet.

I feel something is wrong with my public IP :weary:
Requesting anything on this public IP does not generate any logs in my server app.

Sadly you have to figure that out yourself, as we don’t have access to the infrastructure you deploy to.

To make debugging easier, I would recommend not using your application but instead creating a hello world app (or starting a hello world server in Python or whatever language is easiest these days) that serves traffic over http.

Disable the firewall and any other kind of third-party interference, and make sure that you can call your API from the internet.
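As a sketch of such a hello world server, the following uses Python's built-in static file server (python3 -m http.server, which needs Python 3.7+ for --directory). The helper names make_docroot and serve_docroot and the token test-token are made up for illustration, and the Phoenix app has to be stopped first so that port 80 is free:

```shell
#!/bin/sh
# Create a throwaway docroot containing a fake challenge file and print its path.
make_docroot() {
  root=$(mktemp -d) || return 1
  mkdir -p "$root/.well-known/acme-challenge"
  printf 'hello\n' > "$root/.well-known/acme-challenge/test-token"
  printf '%s\n' "$root"
}

# Serve the docroot ($1) over plain HTTP on all IPv4 interfaces.
# The port ($2) defaults to 80, which normally requires root.
serve_docroot() {
  python3 -m http.server "${2:-80}" --bind 0.0.0.0 --directory "$1"
}

# Example:
#   serve_docroot "$(make_docroot)" 80
# then, from another machine:
#   curl -m 10 http://ubiquitous-funicular.ch/.well-known/acme-challenge/test-token
```

If even this request times out from outside, the problem sits in the network path (firewall, security group, IP routing) and not in the application or site_encrypt.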

:wave:

In case you want some extra examples: I integrated site_encrypt into a project recently, in "Auto HTTPS in CE" (Pull Request #4491 in plausible/analytics on GitHub). I used two custom plugs for ACME challenges and forcing TLS, since I had trouble making the default ones work, and I used client: :native as it provided helpful error messages. I also found Pebble (letsencrypt/pebble on GitHub, a small RFC 8555 ACME test server not suited for a production certificate authority) very useful during local development.
