I have an Elixir application that is ingesting messages from a queue (AWS SQS). It is based off this design that uses GenStage.
The design is as follows:
GenStage Producer that fetches messages from the queue
GenStage Consumer that subscribes to Producer, receives the queue message and deletes it from the queue
The application runs smoothly with the exception of a failure that occurs 1-2 times every hour.
It seems that the GenStage Consumer is being sent the following message (which is not accounted for and hence fails): {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.XXXX>, :tls_connection, :undefined}, #PID<0.XXXX.0>}}
I'm racking my brain trying to understand how this might be occurring. There are no explicit messages being sent to the consumer in my application code. The consumers are relatively simple as well - they're simply using ExAws to make requests to AWS through their API. I don't make any calls to :gen_tcp at any point.
I'm curious if anybody has encountered something similar or had any ideas about what might be going on.
It may be a library that you invoke from the consumer that is sending the consumer processes unwanted messages. For example, maybe you are calling ExAWS.foo(...) and that is storing the consumer process which eventually leaks a message. Not saying it is ExAWS though, just an example.
One idea is to match on the message and use Port.info on the port and Process.info on the pid and then log the results so you can read it later on. That should give you more hints about which process is leaking them.
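A minimal sketch of that diagnostic, assuming the stray message is either an :sslsocket tuple or the same tuple wrapped in :ssl_closed. LeakInspector and describe/1 are hypothetical names made up for illustration; only Port.info/1, Process.info/1 and Logger come from Elixir itself.

```elixir
defmodule LeakInspector do
  # Hypothetical helper, not part of GenStage or ExAws.
  require Logger

  # Unwrap the :ssl_closed envelope if the socket tuple arrives inside one.
  def describe({:ssl_closed, socket}), do: describe(socket)

  # Match the socket tuple and look up what the port and pid belong to.
  def describe({:sslsocket, {:gen_tcp, port, _transport, _opts}, pid} = socket)
      when is_port(port) and is_pid(pid) do
    info = %{
      socket: socket,
      # Port.info/1 returns nil once the port has been closed.
      port_info: Port.info(port),
      # Process.info/1 returns nil for a dead process and only accepts
      # local pids, hence the node check.
      process_info: if(node(pid) == node(), do: Process.info(pid))
    }

    Logger.info("stray ssl message: " <> inspect(info))
    info
  end

  # Anything else is returned untouched so the caller can decide what to do.
  def describe(other), do: %{unexpected: other}
end

# In the consumer this could be called from a handle_info clause, e.g.
#
#   def handle_info(msg, state) do
#     LeakInspector.describe(msg)
#     {:noreply, [], state}
#   end
```

If both lookups come back as nil, the port and the process were already gone by the time the message was handled, so the log alone will not identify the sender.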
Update on the investigation - it looks like calling Port.info and Process.info on the port and pid both return nil. It looks like the processes have died by the time the message is handled...
I believe the reason was the one mentioned above. Something is doing a request using SSL, then the connection is closed, and that leaks the ssl_closed message. The best option is to track whatever is leaking the message and fix the leakage but the leakage in itself shouldn't be harmful.
We noticed this behavior appearing after upgrading to OTP 21.2 (also using ExAWS with SQS).
Upgrading to 21.2.3+ reduced the occurrences but didn't fix it completely yet.
Getting the same issue using ExAWS and SQS. We just migrated our app from Heroku (Elixir 1.7.4 OTP 20.3) to AWS, upgrading at the same time the OTP version to 21.3 keeping the same Elixir version. We never saw this issue on Heroku.
I've added a catch-all handle_info to my GenServer which calls HTTPoison functions, as suggested in the previous message, but I'm still getting the error. Did you put this in your GenServer or somewhere else?
I'm not sure if I did it properly though because I don't really understand what's going on. What is a "message leak"?
@amarandon could you make sure you added the handle_info callback in the process where you call those libraries (and thus start a process from it)?
Process A calls HTTP lib func (e.g. HTTPoison or Tesla) synchronously
HTTP lib func behind the scene starts a process B to handle HTTP connection
When HTTP request is completed... HTTP lib func returns the result (in process A)
The problem is, for some reason process B (which should be hidden from process A point of view) sends those messages to process A - so it's "leaking".
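The sequence above can be reproduced with a small sketch that uses nothing outside the standard library. Everything here is made up for illustration: LeakDemo is a hypothetical module, fake_request/0 stands in for the HTTP lib func, and :fake_socket stands in for the real socket tuple. It is not how HTTPoison, Tesla or ExAws are implemented, only the same shape of problem.

```elixir
defmodule LeakDemo do
  use GenServer

  # Stand-in for the "HTTP lib func". It is called synchronously by process A,
  # starts process B behind the scenes, and returns B's reply to A.
  def fake_request do
    caller = self()
    ref = make_ref()

    spawn(fn ->
      # The reply that the library function waits for and returns.
      send(caller, {ref, :ok})
      # Later, B sends one more message that nobody is waiting for: the leak.
      Process.sleep(50)
      send(caller, {:ssl_closed, :fake_socket})
    end)

    receive do
      {^ref, result} -> result
    after
      1_000 -> {:error, :timeout}
    end
  end

  # Process A: a GenServer that calls the library function.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def request(server), do: GenServer.call(server, :request)
  def leaked(server), do: GenServer.call(server, :leaked)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call(:request, _from, leaked), do: {:reply, fake_request(), leaked}
  def handle_call(:leaked, _from, leaked), do: {:reply, Enum.reverse(leaked), leaked}

  # The leaked message lands in A's mailbox and is delivered to handle_info.
  # This clause records it; the catch-all below ignores anything else.
  @impl true
  def handle_info({:ssl_closed, _socket} = msg, leaked), do: {:noreply, [msg | leaked]}
  def handle_info(_other, leaked), do: {:noreply, leaked}
end
```

After {:ok, pid} = LeakDemo.start_link() and a LeakDemo.request(pid), waiting a moment and calling LeakDemo.leaked(pid) shows the stray message that arrived in process A. The handle_info clauses have to live in the process that actually makes the call, because that is the mailbox the leaked message is sent to.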