We have an Elixir Phoenix application currently deployed in a highly available configuration on AWS. We use an Application Load Balancer (layer 7) backed by 2+ individual EC2 instances (all members of an Auto Scaling group). Each instance runs a version of our code base that was built via `mix release`. All EC2 instances talk to the same Postgres database.
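For reference, a minimal sketch of how the release is defined in `mix.exs` (the `my_app` name and version are placeholders):

```elixir
# mix.exs — illustrative sketch; `my_app` and the version are placeholders.
def project do
  [
    app: :my_app,
    version: "1.0.0",
    releases: [
      my_app: [
        # Generate the Unix launch scripts (bin/my_app start|stop|remote, ...).
        include_executables_for: [:unix]
      ]
    ]
  ]
end
```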
We use AWS CodeDeploy to update each instance one at a time (in place, not blue/green) with the new release. CodeDeploy handles the logistics: pausing new traffic to an instance, taking it out of service, cleaning up and deploying the new release, testing the app after startup, and putting the EC2 instance back into service on the ALB.
We do NOT use sticky sessions, and we allow enough time between the ALB blocking traffic to a node and the actual app shutdown on that node for any in-flight API requests to complete. This way, nodes/EC2 instances can come and go from the target group with minimal interruption to the application.
With that said, right now each EC2 instance/node does NOT know about the others. There is some concern about Cachex, as well as things like Phoenix LiveDashboard not recognizing all nodes.
We are looking into using libcluster with the EC2 tag-based cluster strategy ([kyleaa/libcluster_ec2 on GitHub](https://github.com/kyleaa/libcluster_ec2)) to resolve those issues as well as form a true Elixir cluster.
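To make the question concrete, here is roughly the topology configuration we're planning, based on the libcluster_ec2 README (the tag name/value and app prefix are placeholders for our real ones):

```elixir
# config/runtime.exs — a sketch following the libcluster_ec2 README;
# the tag name/value and app_prefix are placeholders.
import Config

config :libcluster,
  topologies: [
    ec2: [
      strategy: ClusterEC2.Strategy.Tags,
      config: [
        ec2_tagname: "elixir-cluster",  # EC2 tag to match on
        ec2_tagvalue: "my-app",         # instances carrying this value join the cluster
        app_prefix: "my_app",           # node names should look like my_app@<private-ip>
        polling_interval: 10_000        # re-poll the EC2 API every 10s
      ]
    ]
  ]
```

And `Cluster.Supervisor` would go into our supervision tree:

```elixir
# lib/my_app/application.ex — illustrative placement of Cluster.Supervisor.
def start(_type, _args) do
  topologies = Application.get_env(:libcluster, :topologies, [])

  children = [
    {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
    # ...the rest of our tree (Repo, Endpoint, Cachex, etc.)
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
```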
My question: are there any additional deployment considerations with libcluster for this setup? For example, does a node somehow need to signal to the rest of the cluster that it's going away so that all of its tasks are dealt with gracefully? Or is the act of stopping the app itself (we use unit files to call the release stop command, i.e. `.../bin/MyApp stop`) enough?
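For context on that last point: our understanding is that `bin/MyApp stop` triggers `:init.stop/0` on the node, which stops applications in reverse start order, so any process that traps exits gets its `terminate/2` callback before the VM halts. Something like the following (`MyApp.DrainWorker` and `flush_pending/1` are made-up names) is the kind of thing we'd rely on to wrap up in-flight work:

```elixir
defmodule MyApp.DrainWorker do
  @moduledoc """
  Illustrative only: sketches a worker that finishes in-flight work during
  the graceful shutdown that `bin/MyApp stop` (:init.stop/0) triggers.
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # Trap exits so the supervisor's shutdown signal invokes terminate/2
    # instead of killing the process outright.
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl true
  def terminate(_reason, state) do
    # Finish or hand off any in-flight tasks before shutdown proceeds.
    flush_pending(state)
  end

  # Placeholder for whatever draining the real worker would do.
  defp flush_pending(_state), do: :ok
end
```

What we're unsure about is whether anything beyond this kind of local cleanup is needed once the nodes are clustered.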