Distributed Elixir in Amazon ECS

Has anyone successfully set up distributed Elixir on ECS? I am tackling this right now.

I am especially interested in how to connect the nodes. I was thinking it might make sense to write a custom libcluster strategy that fetches the current IPs connected to the Application Load Balancer. ECS also offers service discovery using DNS (this creates an A record for each node), so that might also be a solution, and libcluster already seems to ship a DNS strategy.

As far as I know these are the things to be addressed:

  • Setting the node name to <APPNAME>@<PRIVATE IP>
  • Exposing the Erlang Port Mapper Daemon (EPMD) port (4369) in the VPC and the Docker image
  • Exposing the inter-node Erlang distribution ports (configurable via inet_dist_listen_min / inet_dist_listen_max) in the VPC and the Docker image
  • Service discovery (setting up communication between nodes) using a libcluster strategy
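To cover the two "exposing ports in the Docker image" items above, a minimal Dockerfile sketch (the 9000-9010 range is an assumption; match whatever you configure via inet_dist_listen_min / inet_dist_listen_max):

```dockerfile
# EPMD (Erlang Port Mapper Daemon)
EXPOSE 4369
# Erlang distribution port range; keep in sync with
# inet_dist_listen_min / inet_dist_listen_max in vm.args
EXPOSE 9000-9010
```

Note that EXPOSE is documentation for the image; the security group rules in the VPC still have to allow these ports between the tasks.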

Setting the hostname to include the private IP can be done using curl; I'm not sure yet how best to inject this as an env variable.
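One way to sketch that injection (the metadata path is the standard EC2 IMDSv1 endpoint; `make_node_name` and its arguments are made-up names for illustration):

```shell
# Compose an Erlang node name as <appname>@<private ip>.
# If no IP is passed as the second argument, fall back to the EC2
# instance metadata endpoint (IMDSv1) to fetch the private IPv4.
make_node_name() {
  app_name="$1"
  ip="${2:-$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)}"
  echo "${app_name}@${ip}"
}
```

The result can then be exported (e.g. `export NODE_NAME=$(make_node_name myapp)`) before the release boots and referenced from vm.args.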

Perhaps someone has already figured some of this out on ECS/EC2? Help is really appreciated! I'm planning to document the results so it's easier for others to get this up and running. I already have some experience building a very small Docker image using releases and multi-stage builds.


You have hit on the major points. For the hostname, I can offer a concrete example. In the script that starts my application:

# Permit OS variable substitution for starting the VM
export REPLACE_OS_VARS=true

# Get the EC2 fully-qualified hostname for the node name
# (169.254.169.254 is the standard EC2 instance metadata endpoint)
export PUBLIC_HOSTNAME=`curl -s http://169.254.169.254/latest/meta-data/local-hostname`

Then in rel/vm.args you can set the node name using the env var:

-name <%= release_name %>@${PUBLIC_HOSTNAME}



We have a few apps running on ECS Fargate. Basically, we build bare images containing only the Erlang release built with Distillery.
In our latest setup we use a simple sh script to get the internal IP:

export NODE_NAME="$1"; \
  export NODE_IP; \
  NODE_IP=$(/sbin/ip route|awk '/scope/ { print $9 }'); \
  exec /opt/app/$1/bin/$1 foreground

and in vm.args we set

-name ${NODE_NAME}@${NODE_IP}
-setcookie ${COOKIE}

We query Route 53 SRV records to discover other nodes and connect them over the internal network (no need to expose EPMD ports publicly). We use the peerage library with a custom strategy for that.


Great! That is in line with what I wanted to do.

So you don’t need to expose the ports in the VPC if it’s internal communication? (AWS noob, I know.)

Also, do you expose the Erlang distribution ports in Docker? (I see you don’t specify the port range in vm.args.)

Btw, ECS now supports DNS service discovery out of the box.


I just used https://github.com/bitwalker/libcluster/blob/master/lib/strategy/dns_poll.ex and it works well.

My vm.args:

-name <%= release_name %>@${PUBLIC_HOSTNAME}
-setcookie <%= release.profile.cookie %>
-kernel inet_dist_listen_min 9000
-kernel inet_dist_listen_max 9010

My start.sh:


export REPLACE_OS_VARS=true
# 169.254.170.2/v2/metadata is the ECS task metadata endpoint (v2)
export PUBLIC_HOSTNAME=`curl -s http://169.254.170.2/v2/metadata | jq -r ".Containers[0].Networks[0].IPv4Addresses[0]"`

echo "Hostname: $PUBLIC_HOSTNAME"

REPLACE_OS_VARS=true /app/release/bin/start_server foreground

The DNS name can be taken from the service details page; usually it is my-service-name.local. Use it as the query for the Cluster.Strategy.DNSPoll strategy.
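A config sketch for that strategy (the service name and node basename are placeholders; the :query value is the DNS name from the service details page):

```elixir
# Placeholder names throughout; adjust to your release config.
config :libcluster,
  topologies: [
    ecs: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 5_000,
        # DNS name created by ECS service discovery
        query: "my-service-name.local",
        # must match the part before the @ in the node name
        node_basename: "my_app"
      ]
    ]
  ]
```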


Yep, that’s exactly what we are using now as well. BTW, the Erlang distribution port can be a single port.
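For example, pinning both kernel settings to the same (arbitrarily chosen) value in vm.args yields a single distribution port:

```
-kernel inet_dist_listen_min 9000
-kernel inet_dist_listen_max 9000
```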


Hey guys, thanks for all the info; it helped me a lot in setting this up myself.

You’re all running one node per instance, with a fixed port mapping from container to host, right?

AWS doesn’t allow A records in the service registry unless the container’s networkMode is awsvpc (where each container has its own elastic network interface and IP address), I suppose because otherwise there could be multiple instances per host, running on different ports. It has to be SRV, which is fine.

The DNSPoll strategy only polls A records by default, so I’m curious whether you had to use a custom resolver like I did, or if there’s a more straightforward way?

I use Fargate, so it uses the A record for service discovery.


Our tasks use awsvpc network mode. It has limits on the number of network interfaces depending on the EC2 instance type (https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/using-eni.html). With Fargate you don’t need to think about this limitation.


We use a custom Peerage.Provider implementation like this:

defmodule AwsServiceDiscovery do
  @behaviour Peerage.Provider

  @impl true
  def poll do
    dns_name = Application.fetch_env!(:peerage, :dns_name)
    app_name = Application.fetch_env!(:peerage, :app_name)

    # Resolve the SRV record, then resolve each target's A record
    # and build a node name from the resulting IP.
    :inet_res.lookup(String.to_charlist(dns_name), :in, :srv)
    |> Enum.flat_map(fn {_priority, _weight, _port, srv_dns_name} ->
      :inet_res.lookup(srv_dns_name, :in, :a)
      |> Enum.map(fn ip ->
        :"#{app_name}@#{ip |> :inet.ntoa() |> to_string()}"
      end)
    end)
  end
end

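The provider reads two :peerage application env keys; a matching config sketch (all values are placeholders):

```elixir
config :peerage,
  via: AwsServiceDiscovery,
  # SRV record created by ECS service discovery
  dns_name: "my-service-name.local",
  # must match the part before the @ in the node name
  app_name: "my_app"
```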
I am using the bridge network mode, with the custom peerage provider for SRV records.

 NAME                RESULT OF ATTEMPT
 api@…               true
 api@…               false
 api@…               (self)

Not really sure why it won’t connect. Any help is appreciated.