ECS - service discovery

I am running Elixir on AWS ECS with service discovery. I am using Peerage to connect the nodes, but I am getting the following logs:

NAME                RESULT OF ATTEMPT
my_api@172.31.27.117 false     
my_api@172.31.28.164 true      

LIVE NODES
my_api@172.31.28.164 (self)

Any idea what could be causing this? Both tasks run on one instance, using the awsvpc networking mode.

A guess: could a security group deny traffic between the nodes?

I have these inbound rules. That should be enough, right?

You need both inbound and outbound rules that reference the same security group. Security groups sit on top of the VPC. If you are using CloudFormation, something like this should be in place (the port ranges are determined by your Erlang distribution setup):

  ApplicationSecurityGroupMutualIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref ApplicationSecurityGroup
      SourceSecurityGroupId: !Ref ApplicationSecurityGroup
      IpProtocol: tcp
      FromPort: _
      ToPort: _
  ApplicationSecurityGroupMutualEgress:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref ApplicationSecurityGroup
      DestinationSecurityGroupId: !Ref ApplicationSecurityGroup
      IpProtocol: tcp
      FromPort: _
      ToPort: _

I see you have two tasks on the same instance using the awsvpc network mode, so this is not a cross-AZ issue: the ENIs will be in the same subnet. However, if you want to spread the workload across multiple AZs later, it is worth checking that routes are in place.


I added the flow logs. I am getting some ACCEPT, and some REJECT.

${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}
2 xx eni-0cea8ccd7807fa468 91.213.50.137 172.31.11.34 40040 49208 6 1 40 1616962326 1616962340 REJECT OK
2 xxx eni-0cea8ccd7807fa468 91.213.50.134 172.31.11.34 59924 45771 6 1 40 1616962326 1616962340 REJECT OK
2 xxx eni-0cea8ccd7807fa468 172.31.11.34 52.95.125.134 44468 443 6 4 160 1616962326 1616962340 ACCEPT OK

What is up with these ports? 45771 and 49208?

Edit: Ok, I managed to fix this. It was indeed a security group issue. The flow logs helped. Not sure what ports I should keep open though?

You only need to focus on communication between your tasks; the rest is not relevant to the investigation. But yes, if you open a connection to port 443, your traffic probably leaves locally via an ephemeral port. See Ephemeral port - Wikipedia.

By default, Erlang distribution uses port 4369 for EPMD, plus a listening port for each node in the range between inet_dist_listen_min and inet_dist_listen_max. See also Erlang Distribution Without epmd.
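
As a rough sketch of which ports to keep open between the tasks, the ingress rules could look something like the fragment below. It assumes the release pins the distribution port range itself; the 9100-9155 range and the two resource names are illustrative choices, not something taken from this thread. ApplicationSecurityGroup is the group referenced in the template earlier and is assumed to be defined elsewhere.

```yaml
Resources:
  # Assumption: the node is started with a pinned distribution range, e.g. in vm.args:
  #   -kernel inet_dist_listen_min 9100 -kernel inet_dist_listen_max 9155
  # The 9100-9155 range is only an example; use whatever range the release sets.

  # EPMD (port mapper), default port 4369
  ApplicationSecurityGroupEpmdIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref ApplicationSecurityGroup
      SourceSecurityGroupId: !Ref ApplicationSecurityGroup
      IpProtocol: tcp
      FromPort: 4369
      ToPort: 4369

  # Erlang distribution listeners, limited to the pinned range
  ApplicationSecurityGroupDistIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref ApplicationSecurityGroup
      SourceSecurityGroupId: !Ref ApplicationSecurityGroup
      IpProtocol: tcp
      FromPort: 9100
      ToPort: 9155
```

If the group's outbound rules are restricted, matching AWS::EC2::SecurityGroupEgress rules with DestinationSecurityGroupId would mirror these two. Without a pinned range the distribution listener can land on any port the OS assigns, so the rule would have to cover a much wider range.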
