I’m working on a proof of concept application to introduce Elixir and Phoenix at my company, and I’m running into a problem with deploying it to our AWS environment.
I’m building the application into a Docker image that deploys to AWS Fargate. There are two instances running on Fargate, with an Application Load Balancer and Web Application Firewall (WAF) in front. The application is configured to use HTTPS all the way through. The image builds and runs as expected on my local machine.
When I deploy to AWS, a static page with no websockets works as expected. When I visit a page that does have websockets, the page cycles through the same series of events in an infinite loop.
- initial page loads with a 200
- css/js/images/fonts load with 200’s
- client calls the websocket endpoint with the phx_join message
- server responds to the websocket call with phx_reply, and the response is {response: {reason: "stale"}, status: "error"}
- page reloads and cycle begins again…
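To make the exchange above concrete, here is a minimal sketch of how the two frames look once decoded, assuming the default Phoenix V2 JSON serializer, where each frame is a five-element array of join_ref, ref, topic, event, and payload. The refs, the topic, and the FrameSketch module are hypothetical placeholders, not values taken from the logs; only the reply payload comes from what the browser shows.

```elixir
defmodule FrameSketch do
  # Hypothetical helper: classifies a decoded frame of the assumed shape
  # [join_ref, ref, topic, event, payload].
  def classify([_join_ref, _ref, _topic, "phx_reply", %{"status" => "error", "response" => %{"reason" => reason}}]),
    do: {:join_rejected, reason}

  def classify([_join_ref, _ref, _topic, "phx_reply", %{"status" => "ok"}]), do: :joined
  def classify(_frame), do: :other
end

# Placeholder refs and topic; the payload of `reply` matches the observed response.
join = ["1", "1", "topic:placeholder", "phx_join", %{}]

reply = [
  "1",
  "1",
  "topic:placeholder",
  "phx_reply",
  %{"status" => "error", "response" => %{"reason" => "stale"}}
]

IO.inspect(FrameSketch.classify(join), label: "join")
IO.inspect(FrameSketch.classify(reply), label: "reply")
```

The point of the sketch is only that the join itself is being rejected by the server, so the websocket connection and upgrade appear to succeed before the failure happens.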
This is my current endpoint config in runtime.exs:
config :application_name, ApplicationNameWeb.Endpoint,
  server: true,
  url: [host: host, port: port, scheme: scheme],
  http: [
    # Enable IPv6 and bind on all interfaces.
    # Set it to {0, 0, 0, 0, 0, 0, 0, 1} for local network only access.
    # See the documentation on https://hexdocs.pm/plug_cowboy/Plug.Cowboy.html
    # for details about using IPv6 vs IPv4 and loopback vs public addresses.
    ip: {0, 0, 0, 0, 0, 0, 0, 0}
  ],
  secret_key_base: secret_key_base,
  check_origin: [
    "//asi-app-name-dev-alb-719194575.us-east-1.elb.amazonaws.com",
    "//fargate.asi-dev.cld.company.com/context-path/",
    "//fargate.asi-dev.cld.company.com/"
  ]
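As a sanity check on the check_origin entries, here is a minimal sketch that uses only URI.parse/1 from the standard library to show which components each entry carries. The Origin value at the end is a hypothetical example of what a browser might send for the deployed URL; a browser's Origin header contains a scheme, host, and optional port, but never a path. Whether Phoenix ignores the path part of an entry when matching is an assumption here, not something confirmed against the Phoenix source.

```elixir
# The three entries exactly as they appear in the endpoint config.
entries = [
  "//asi-app-name-dev-alb-719194575.us-east-1.elb.amazonaws.com",
  "//fargate.asi-dev.cld.company.com/context-path/",
  "//fargate.asi-dev.cld.company.com/"
]

parsed = Enum.map(entries, &URI.parse/1)

# Print scheme, host, port, and path for each configured entry.
for {entry, uri} <- Enum.zip(entries, parsed) do
  IO.inspect({uri.scheme, uri.host, uri.port, uri.path}, label: entry)
end

# Hypothetical Origin header value, for comparison with the entries above.
origin = URI.parse("https://fargate.asi-dev.cld.company.com")
IO.inspect({origin.scheme, origin.host, origin.port, origin.path}, label: "Origin")
```

If the hosts printed for the entries match the host printed for the Origin value, the origin check is less likely to be the cause of the failed join.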
And in prod.exs:
config :application_name, ApplicationNameWeb.Endpoint,
  cache_static_manifest: "priv/static/cache_manifest.json",
  https: [
    port: 443,
    otp_app: :application_name,
    cipher_suite: :strong,
    keyfile: "priv/ssl/private/selfsigned.key",
    certfile: "priv/ssl/certs/selfsigned.crt",
    # Allow self-signed certificates
    verify_fun: {&CertUtil.verify_fun_selfsigned_cert/3, []}
  ],
  static_url: [path: "/context-path"],
  force_ssl: [hsts: true, host: nil]
Things I’ve tried so far:
- Dropping from two instances to one - no change
- Checking WAF logs - doesn’t look like any requests are getting caught there
- Talked to our DevOps team about using a Network Load Balancer instead of an Application Load Balancer - both support websockets
- Tweaked the values in check_origin and double and triple checked them against the ALB URL and deployed URL - everything seems right
- Added a function to enable self-signed certificates since we use one in the Docker image itself, following the instructions here - I think this resolved an earlier handshake error that was being logged on the server, although I was trying so many things that night that I’m not sure anymore. The error message was “TLS :server: In state :hello at tls_record.erl:558 generated SERVER ALERT: Fatal - Unexpected Message”, and it hasn’t shown up in the logs again in the last week.
The other strange thing in the logs is that I’m seeing this message over and over again:
TLS :server: In state :certify received CLIENT ALERT: Fatal - Unknown CA
I’m seeing it spamming the logs even after I’ve navigated away from the page, which makes me think it’s not related to this issue, but tossing it out there just in case.
I feel like I must be missing something small, but I’m not sure where to look next. I’ve really enjoyed studying Elixir/Phoenix and appreciated this forum as I’m learning. Any ideas would be welcome, as this deployment is a critical step for bringing Elixir into my company, and I will be so excited if that happens.
Thanks