Distributed Elixir not connecting between AWS ECS Fargate tasks

Setup:

  • ECS tasks with Cloud Map service discovery
  • Tasks accessible via service-name.namespace-name.local
  • dns_cluster is set up to query for tasks under service-name.namespace-name.local
  • Security group allows port 4369 (EPMD) internally
  • Nodes (running on different ECS tasks) can’t connect to each other

What am I missing? Are there additional ports or configuration steps needed for distributed Elixir on AWS ECS?
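For context, the dns_cluster part of a setup like this usually looks roughly like the sketch below. MyApp, :my_app, and the DNS_CLUSTER_QUERY variable are placeholder names, not taken from the actual project. As far as I know, dns_cluster expects node names of the form basename@ip, so the release also has to start with long names that use the task IP as the host part.

```elixir
# config/runtime.exs (placeholder app name and env var)
import Config

# e.g. DNS_CLUSTER_QUERY=service-name.namespace-name.local
config :my_app, dns_cluster_query: System.get_env("DNS_CLUSTER_QUERY")

# lib/my_app/application.ex
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Periodically resolves the query name and tries to connect to
      # <basename>@<ip> for every address returned. :ignore disables it,
      # which is handy in dev where the env var is not set.
      {DNSCluster, query: Application.get_env(:my_app, :dns_cluster_query) || :ignore}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```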

:waving_hand:

  • Security group allows port 4369 (EPMD) internally

The distribution ports also need to be allowed, not only EPMD. By default each node listens for distribution traffic on a more or less randomly assigned port. The range can be pinned with the inet_dist_listen_min and inet_dist_listen_max kernel options. And don’t forget about cookies: they need to be the same for all nodes that attempt to connect. By default a random cookie is generated when the release is assembled by mix release, so nodes built from different container images won’t be able to connect. The cookie can be set with the RELEASE_COOKIE env var.
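To narrow down which of these is the culprit, something like the following can be run in a remote IEx session on one of the tasks (for a release, bin/my_app remote). This is only a rough checklist: the node name and IP are made-up examples, and the hostname is the one from the question.

```elixir
# Should be a long name with the task IP as host, e.g. :"my_app@10.0.1.23"
Node.self()

# Must return the same atom on every node that should join the cluster
Node.get_cookie()

# Asks the local EPMD which nodes are registered and on which
# distribution port, e.g. {:ok, [{~c"my_app", 9000}]}
:net_adm.names()

# nil means the option is not set and the port is picked at random
Application.get_env(:kernel, :inet_dist_listen_min)
Application.get_env(:kernel, :inet_dist_listen_max)

# Lists the A records Cloud Map currently returns for the service
:inet_res.lookup(~c"service-name.namespace-name.local", :in, :a)

# Manual connection attempt to another task (hypothetical IP).
# Returns true on success, false if the port or cookie is wrong.
Node.connect(:"my_app@10.0.2.45")
Node.list()
```

If Node.connect returns false while the DNS lookup and the cookie look fine, the security group rules for the distribution port are the next thing to check.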


@ruslandoga you were spot on. The rules for those kernel ports were part of a different CloudFormation stack and thus were not applied properly. Clustering worked as soon as the security group was updated to allow my port range, which I set to 9000-9100.

Oh, and for anyone else who might read this in the future: I also updated the vm.args.eex file to start the BEAM with the same kernel port range:

-kernel inet_dist_listen_min 9000 inet_dist_listen_max 9100

Hello @armanm

I’m also in the process of setting up an Elixir cluster on ECS Fargate.
Not related, but I would be curious to know:

  1. How many nodes do you have in your cluster?
  2. Is that a default Erlang cluster, or did you go for something more complex like Partisan?
  3. How many concurrent connections do you expect in your app?

If you wish, you can reply to this thread.

Thanks