Ranch connection silently becoming inactive?

I’ve got an app that connects to a number of TCP ports using :ranch_tcp.connect/3.

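I connect in passive mode and switch to active mode once I get {:ok, socket}. At that point I emit a Telemetry event:

    case :ranch_tcp.connect(...) do
      {:ok, socket} ->
        :ok = :ranch_tcp.setopts(socket, active: true)  # passive -> active
        :telemetry.execute([:some, :key], %{active: 1})

      {:error, reason} -> {:error, reason}
    end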

I call :telemetry.execute in both handle_info({:tcp_closed, _socket}, _state) and handle_info({:tcp_error, _socket, _reason}, _state), where I set the active measurement to 0.
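A minimal sketch of such a connection process, under a few stated assumptions: the module name `BackupConn` is hypothetical, plain `:gen_tcp`/`:inet` are used instead of `:ranch_tcp` (which wraps them) so it runs without dependencies, and the Telemetry call is injected as a `report` function (in the real app presumably `&:telemetry.execute/2`). Note that gen_tcp delivers errors as the 3-tuple `{:tcp_error, socket, reason}`, so a clause matching a 2-tuple would never fire.

```elixir
defmodule BackupConn do
  # Hypothetical sketch, not the poster's actual code.
  use GenServer

  # `report` stands in for &:telemetry.execute/2 (assumption) so the
  # sketch is self-contained.
  def start_link(host, port, report) do
    GenServer.start_link(__MODULE__, {host, port, report})
  end

  @impl true
  def init({host, port, report}) do
    # Connect in passive mode first, then switch to active once connected.
    {:ok, socket} = :gen_tcp.connect(host, port, [:binary, active: false])
    :ok = :inet.setopts(socket, active: true)
    report.([:some, :key], %{active: 1})
    {:ok, %{socket: socket, report: report}}
  end

  @impl true
  def handle_info({:tcp, _socket, _data}, state), do: {:noreply, state}

  # Peer closed the connection (FIN received by this host).
  def handle_info({:tcp_closed, _socket}, state) do
    state.report.([:some, :key], %{active: 0})
    {:stop, :normal, state}
  end

  # gen_tcp sends {:tcp_error, socket, reason}: three elements, not two.
  def handle_info({:tcp_error, _socket, reason}, state) do
    state.report.([:some, :key], %{active: 0})
    {:stop, reason, state}
  end
end
```

If the network path silently drops the connection, neither clause runs, which matches the symptom described here.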

One of the systems I connect to is a back-up system that is only used if we need a fail-over for some reason, so there is very little traffic on that IP/port. When such a fail-over occurred, I did not receive any messages from the back-up connection.
Yet Telemetry had never emitted an active: 0 event.

So my question is:
Is there a way for a TCP connection in Ranch/gen_tcp to become inactive without it being caught by {:tcp_error, _, _} or {:tcp_closed, _}?

Just a wild stab in the dark, but could there be some stateful network element (NAT, firewall) on the TCP path that dropped the connection for inactivity? In that case the socket may still have been in active mode in the BEAM, but the packets simply never made it through the network. I’ve been bitten by this.

If you can’t be sure the network path is fully transparent, keeping idle TCP connections open may require some keepalive, either at the kernel level (see the keepalive socket option accepted by gen_tcp) or at the application level (periodically send a newline, or something else that does not cause the other end to choke).
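A minimal sketch of the application-level option, assuming the peer tolerates a bare newline (check your protocol). The module name `Heartbeat` and the interval are hypothetical, and `:gen_tcp.send/2` is used directly rather than Ranch. A successful write usually only means the data reached the local kernel buffer; if the path is dead, retransmissions eventually fail and the socket's owner should then see `{:tcp_closed, _}` or `{:tcp_error, _, _}`.

```elixir
defmodule Heartbeat do
  # Hypothetical sketch: write "\n" every `interval_ms` so that stateful
  # middleboxes (NAT, firewalls) keep seeing traffic on an idle connection.
  use GenServer

  def start_link(socket, interval_ms) do
    GenServer.start_link(__MODULE__, {socket, interval_ms})
  end

  @impl true
  def init({socket, interval_ms}) do
    schedule(interval_ms)
    {:ok, %{socket: socket, interval: interval_ms}}
  end

  @impl true
  def handle_info(:heartbeat, state) do
    case :gen_tcp.send(state.socket, "\n") do
      :ok ->
        schedule(state.interval)
        {:noreply, state}

      # A failed write is an explicit sign the connection is gone.
      {:error, reason} ->
        {:stop, {:heartbeat_failed, reason}, state}
    end
  end

  defp schedule(ms), do: Process.send_after(self(), :heartbeat, ms)
end
```

In a real app the timer would more likely live in the same process that owns the socket; a separate process just keeps the sketch short.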


Thanks, I’ve been thinking along those lines.

I’ve added keepalive: true to the connection config. Let’s hope that does the trick.
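One caveat worth checking: with only keepalive: true, Linux waits 7200 seconds (2 hours) of idleness before the first probe by default, and a NAT or firewall idle timeout may be shorter. A sketch of connect options that also shorten the probe timing, assuming Linux: the option numbers (IPPROTO_TCP = 6, TCP_KEEPIDLE = 4, TCP_KEEPINTVL = 5, TCP_KEEPCNT = 6) are Linux-specific and the module name `KeepaliveOpts` is hypothetical.

```elixir
defmodule KeepaliveOpts do
  # Hypothetical sketch. The raw option numbers are Linux-specific
  # (assumption); omit the raw entries on other operating systems.
  @ipproto_tcp 6
  @tcp_keepidle 4
  @tcp_keepintvl 5
  @tcp_keepcnt 6

  # idle_s: idle seconds before the first probe; interval_s: seconds
  # between probes; count: failed probes before the kernel drops the socket.
  def connect_opts(idle_s, interval_s, count) do
    [
      :binary,
      {:active, false},
      {:keepalive, true},
      raw(@tcp_keepidle, idle_s),
      raw(@tcp_keepintvl, interval_s),
      raw(@tcp_keepcnt, count)
    ]
  end

  # Socket options are C ints in native byte order.
  defp raw(opt, value), do: {:raw, @ipproto_tcp, opt, <<value::native-32>>}
end
```

The same list could presumably be passed as the options argument of :ranch_tcp.connect/3, which forwards options to gen_tcp; verify this against your Ranch version.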

For posterity:

I searched the logs and indeed found some suspicious TCP FIN messages. Hopefully someone in the future will see this and save themselves some head scratching over silent TCP sockets.
