GenStage - Unexpected :ssl_closed message

Hi!

I have an Elixir application that is ingesting messages from a queue (AWS SQS). It is based off this design that uses GenStage.

The design is as follows:

  • GenStage Producer that fetches messages from the queue
  • GenStage Consumer that subscribes to Producer, receives the queue message and deletes it from the queue

The application runs smoothly with the exception of a failure that occurs 1-2 times every hour.
It seems that the GenStage Consumer is being sent the following message (which is not accounted for and hence fails):
{:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.XXXX>, :tls_connection, :undefined}, #PID<0.XXXX.0>}}

I'm racking my brain trying to understand how this might be occurring. There are no explicit messages being sent to the consumer in my application code. The consumers are relatively simple as well - they're simply using ExAws to make requests to AWS through their API. I don't make any calls to :gen_tcp at any point.

I'm curious if anybody has encountered something similar or has any ideas about what might be going on.

For context:

Elixir Version:
1.5

Dependencies:

  • configparser_ex
  • credo
  • dialyxir
  • distillery
  • ex_aws
  • excoveralls
  • gen_stage
  • hackney
  • httpoison
  • mock
  • poison
  • sentry
  • sshex
  • sweet_xml
  • timex

It may be a library that you invoke from the consumer that is sending the consumer processes unwanted messages. For example, maybe you are calling ExAWS.foo(...) and that is storing the consumer process which eventually leaks a message. Not saying it is ExAWS though, just an example.

One idea is to match on the message and use Port.info on the port and Process.info on the pid and then log the results so you can read it later on. That should give you more hints about which process is leaking them.
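A minimal sketch of that suggestion, written as a plain GenServer so it runs without extra dependencies. The module name LeakProbe and the describe/2 helper are hypothetical, and the socket tuple shape is taken from the message quoted in the first post; other OTP versions may use a different shape, in which case the catch-all clause logs the raw message instead.

```elixir
defmodule LeakProbe do
  use GenServer
  require Logger

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{}}

  # Matches the socket tuple shape quoted in the first post and logs
  # whatever the VM still knows about the port and the pid.
  @impl true
  def handle_info({:ssl_closed, {:sslsocket, {:gen_tcp, port, _mod, _}, pid}}, state)
      when is_port(port) and is_pid(pid) do
    Logger.info("leaked ssl_closed: " <> inspect(describe(port, pid)))
    {:noreply, state}
  end

  # Anything else that arrives unexpectedly is logged as-is.
  def handle_info(other, state) do
    Logger.info("unexpected message: " <> inspect(other))
    {:noreply, state}
  end

  # Both calls return nil once the port is closed or the process has exited.
  def describe(port, pid) do
    %{port: Port.info(port), process: Process.info(pid)}
  end
end
```

In a GenStage consumer the same clauses would return {:noreply, [], state}, since GenStage callbacks also carry a list of events.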

That makes sense. Thanks for the leads!

I'll continue investigating and will update with any progress made.

Update on the investigation - calling Port.info and Process.info on the port and pid both return nil. It looks like the port and the process are already gone by the time the message is handled...

What is your Erlang version? Erlang 19 has an issue with broken SSL. I've run into a similar issue, which was fixed when I pinned the proper TLS version :'tlsv1.2' (https://github.com/edgurgel/httpoison#note-about-broken-ssl-in-erlang-19)
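For reference, a small sketch of what pinning the TLS version looks like as request options. The TlsOptions module is a hypothetical helper; the ssl/versions option itself is the one described in the linked HTTPoison note. The HTTPoison call is left as a comment so the snippet runs without the dependency.

```elixir
defmodule TlsOptions do
  # Adds (or overrides) the :ssl option so that only TLS 1.2 is offered.
  def pinned(opts \\ []) do
    Keyword.put(opts, :ssl, versions: [:"tlsv1.2"])
  end
end

# With HTTPoison available, the options go in the third argument:
# HTTPoison.get("https://example.com/", [], TlsOptions.pinned())
```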

Unfortunately I'm on Erlang 20, so I don't know if that issue applies...

Did you find any solution to this issue? I've just encountered the same issue in a GenServer that consumes SQS messages using only ExAws.request()

I'm in the same boat as well, with SQS.

I believe the reason was the one mentioned above. Something is doing a request using SSL, then the connection is closed, and that leaks the ssl_closed message. The best option is to track down whatever is leaking the message and fix the leak, but the leak in itself shouldn't be harmful.

We noticed this behavior appearing after upgrading to OTP 21.2 (also using ExAWS with SQS).
Upgrading to 21.2.3+ reduced the occurrences but hasn't fixed it completely yet.

Some related pointers:

Getting the same issue using ExAWS and SQS. We just migrated our app from Heroku (Elixir 1.7.4, OTP 20.3) to AWS and upgraded OTP to 21.3 at the same time, keeping the same Elixir version. We never saw this issue on Heroku.

21.3 also had an SSL bug... please try the latest patch release, e.g. 21.3.8.3

You didn't see it on Heroku because you were using OTP 20.3 there.

Thank you for the answer @outlog. We have upgraded OTP to 21.3.8.3 but we're still seeing the issue.

@antoniobg seems like it's tracked here: https://github.com/benoitc/hackney/issues/464 - and there is a current workaround of implementing
def handle_info({:ssl_closed, _msg}, state), do: {:noreply, state}
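To show where that clause sits, here is a minimal sketch of a GenServer that does its normal work on :poll and silently drops a leaked :ssl_closed. QueueWorker and the :poll message are hypothetical stand-ins, not code from this thread.

```elixir
defmodule QueueWorker do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, 0, opts)

  @impl true
  def init(count), do: {:ok, count}

  # Stand-in for the real work (e.g. fetching from the queue).
  @impl true
  def handle_info(:poll, count), do: {:noreply, count + 1}

  # Workaround: ignore the leaked socket-close notification and keep the state.
  def handle_info({:ssl_closed, _socket}, count), do: {:noreply, count}
end
```

Note that the clause has to live in the process that actually receives the message. In a GenStage consumer the return value would be {:noreply, [], state} rather than {:noreply, state}.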

you could also consider using a different http client say HTTPotion:

https://hexdocs.pm/ex_aws/ExAws.Request.HttpClient.html

@adamkittelson @chulkilee

I've added a catch-all handle_info to my GenServer which calls HTTPoison functions, as suggested in the previous message, but I'm still getting the error. Did you put this in your GenServer or somewhere else?

I'm not sure if I did it properly though, because I don't really understand what's going on. What is a "message leak"?

@amarandon could you make sure you added handle_info callback where you call those libraries (and thus start a process from it)?

  • Process A calls HTTP lib func (e.g. HTTPoison or Tesla) synchronously
  • HTTP lib func behind the scenes starts a process B to handle the HTTP connection
  • When the HTTP request is completed, the HTTP lib func returns the result (in process A)

The problem is, for some reason process B (which should be hidden from process A's point of view) sends those messages to process A - so it's "leaking".
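The three steps above can be reproduced with plain processes. This is a simplified illustration only: LeakDemo.request/1 is a hypothetical stand-in for an HTTP library call, and :fake_socket replaces the real socket tuple.

```elixir
defmodule LeakDemo do
  # Called from process A. Spawns process B, waits for its reply, returns it.
  def request(delay_ms \\ 10) do
    caller = self()
    ref = make_ref()

    # Process B: handles the "connection" on behalf of the caller.
    spawn(fn ->
      send(caller, {ref, {:ok, 200}})
      Process.sleep(delay_ms)
      # The leak: a message the caller never asked for, sent after the reply.
      send(caller, {:ssl_closed, :fake_socket})
    end)

    # Process A only waits for the tagged reply, so the later message
    # stays in its mailbox until something else receives it.
    receive do
      {^ref, result} -> result
    end
  end
end
```

After LeakDemo.request/1 returns, the stray message is still delivered to the caller's mailbox. If the caller is a GenServer or GenStage process, it arrives in handle_info, which is why the clause has to be defined there.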
