Essentially I see multiple attempted requests to DynamoDB which hit a 20 second timeout and error out. During this period there is a spike in response time for Dynamo requests and the ssl_gen_statem processes jump to the top of the observer. Other calls made to DynamoDB also increase, for instance a put statement which is usually 200ms goes to 30 seconds.
Eventually the node is disconnected from the cluster.
Also DynamoDB is my only upstream service and the process state for the ssl_gen_statem process shows DynamoDB.
It’s always possible it could be something else but my app is very minimal, only running Phoenix channels to pass messages to clients and then using DynamoDB to save and retrieve messages.
Edit: There’s also a spike in elrang process memory usage which with ssl_gen_statem memory would make me think that it’s a likely cause. My schedulers utilization remains the same