Hi folks,
I have set up a POC running an Elixir application on Azure Container Apps (ACA), and it works. We currently need to run on Azure, and ACA seems to provide the right set of abstractions (it runs containers but abstracts away Kubernetes) to run applications with the least operational overhead.
Thinking beyond the POC and about Day 2 operations, I am wondering:
Is anyone else using ACA + Elixir in production?
Are there any gotchas to be aware of?
Are you running a clustered setup on ACA, or is it feasible to do so?
We’ve been running an Elixir system (a Phoenix LiveView app) in production on Azure for a few years, but not on the Azure container services. We started with Terraform but later moved to Bicep for our infrastructure as code. Happy to sync up and share experience if that is helpful. PM me.
Regarding the clustered setup, you might want to consider placing the Azure Container Apps environment in a dedicated VNET (see “Integrate a virtual network with an Azure Container Apps environment” on Microsoft Learn), so that the distributed Erlang traffic stays private. But I haven’t looked into that scenario yet, for example how to do node discovery. One option could be to give the nodes a managed identity that can read the environment via ARM.
Thanks, I am using a dedicated VNET and creating a subnet that’s just for the ACA environment.
But I haven’t looked into that scenario yet, for example how to do node discovery. One option could be to give the nodes a managed identity that can read the environment via ARM.
Ah, this is what I need to explore further. There doesn’t appear to be any DNS setup for querying the IPs of the instances from within an instance (maybe by design).
Is it documented somewhere what can be queried about the environment via ARM?
If you plan to run the BEAM using Erlang distribution, I don’t think Azure Container Apps is the ideal environment for that. If all BEAM instances are standalone (behind the load balancer / ingress), then all is good. But Erlang distribution is east-west traffic, cluster-internal gossip. For that to work, you need two things:
You want to be able to discover the IP addresses of other BEAM instances/nodes/pods.
You must be able to actually get the TCP traffic going across these nodes.
Discovery: I’m not aware of an API that a freshly booted node could query to enumerate the environment. It doesn’t seem to be exposed in ARM, nor in an environment variable. As a workaround, a fresh node could write its own IP to an external configuration store (Azure Storage, relational DB, etcd/consul/zookeeper) and use that store to discover other nodes. But that complicates things, because now you also depend on that external component.
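A minimal sketch of that workaround, under stated assumptions: an in-memory `Agent` stands in for the external store (Azure Storage, a database, etcd, ...), and the module name `PeerDiscovery` and all of its functions are hypothetical. Only `Agent` and `Node.connect/1` are real standard-library calls. A real version would also have to remove entries for nodes that have gone away.

```elixir
defmodule PeerDiscovery do
  @moduledoc """
  Sketch only: each node writes its own node name to a shared store and
  reads the store back to find its peers. The Agent below is a stand-in
  for an external store and is NOT shared across real machines.
  """

  # Start the stand-in store (hypothetical; a real store would be external).
  def start_store, do: Agent.start_link(fn -> MapSet.new() end)

  # Record a node name such as :"app@10.0.0.4" in the store.
  def register(store, node_name) when is_atom(node_name) do
    Agent.update(store, &MapSet.put(&1, node_name))
  end

  # Every registered node except the caller, in a stable order.
  def peers(store, self_name) do
    store
    |> Agent.get(& &1)
    |> MapSet.delete(self_name)
    |> Enum.sort()
  end

  # Attempt a connection to each peer and report the result per peer.
  # connect_fun defaults to Node.connect/1, which returns true, false,
  # or :ignored (when the local node is not running in distributed mode).
  def connect_all(store, self_name, connect_fun \\ &Node.connect/1) do
    for peer <- peers(store, self_name), into: %{} do
      {peer, connect_fun.(peer)}
    end
  end
end
```

On boot, a node would call `register/2` with its own name and then `connect_all/2`, probably repeating the second step on a timer so that nodes started later are picked up as well.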
East/West traffic: Assuming you have found all the VNET-internal IP addresses of the other nodes, your instance still needs to establish a TCP connection to each of them. Azure Container Apps effectively has an AKS cluster under the hood, with a certain CNI implementation. Suppose you successfully get your app to fly, but six months down the road the Container Apps team changes the CNI implementation and suddenly you can no longer talk directly across nodes. That would be an outage outside your sphere of influence.
I guess that’s a long-winded way of saying: if you want Erlang distribution, running on Azure Kubernetes Service (AKS) or on IaaS virtual machines is probably the most predictable option.
There might be some other way, but I’m not aware of one.
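To check the second requirement from inside a running instance, one could probe whether a plain TCP connection to a peer succeeds. The sketch below assumes epmd is listening on its default port 4369 on the peer; the module name `EastWestCheck` is hypothetical. A successful probe only shows that epmd is reachable: Erlang distribution additionally needs the node’s own distribution port, which is assigned dynamically unless it is pinned.

```elixir
defmodule EastWestCheck do
  # Default epmd port; pass another port to probe the distribution port itself.
  @epmd_port 4369

  # Returns true if a TCP connection to ip:port can be opened within timeout_ms.
  def reachable?(ip, port \\ @epmd_port, timeout_ms \\ 2_000) do
    opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(ip), port, opts, timeout_ms) do
      {:ok, socket} ->
        # We only care that the connection succeeded, so close it right away.
        :gen_tcp.close(socket)
        true

      {:error, _reason} ->
        false
    end
  end
end
```

For example, `EastWestCheck.reachable?("10.0.0.5")` run from another replica would show whether that replica’s epmd can be reached (the IP address here is made up).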
From my investigation (looking at the env variables passed to the instance in ACA), there isn’t a way of discovering the IPs of all instances.
After doing some manual setup (grabbing the hostname of each instance and populating each instance’s /etc/hosts), I was able to get 2 instances to talk to each other (east-west) using their private IPs. But you’re 100% right that this might work today with no guarantee that it continues to work.
Here’s a more formal statement: “Currently, we would not recommend this at all. East/West communication (between replicas) really is not a scenario we cover at all and any effort to make it work from the customer would be very brittle and could break at any point.”