Why does a gen_tcp socket close in binary mode but stay open in list mode?

I wrote a GenServer that wraps a gen_tcp socket and performs some long-running tasks that read from and write to the socket.

```elixir
defmodule MyClient do
  use GenServer

  # @ip and @port are module attributes defined elsewhere (not shown in the question)

  @impl true
  def init(_) do
    {:ok, nil, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, _) do
    {:ok, socket} = :gen_tcp.connect(@ip, @port, [
      mode: :binary,  # <-- NOTICE THIS CONFIG
      packet: 0,
      keepalive: true,
      active: false,
      reuseaddr: true,
      send_timeout: 5000,
      send_timeout_close: true
    ])
    send(self(), :work)
    {:noreply, socket}
  end

  @impl true
  def handle_info(:work, socket) do
    {:ok, data} = :gen_tcp.recv(socket, 4)  # <-- The problematic line
    {:noreply, socket}
  end
end
```

When I set the mode to :list, it works fine.

When I set the mode to :binary, :gen_tcp.recv always returns {:error, :closed}. I wonder why. Did I mess up the configuration?
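For reference, a passive `recv` returns `{:error, :closed}` once the connection has been closed, for example when the peer hangs up. The minimal loopback reproduction below (the module name `ClosedDemo` is hypothetical) only shows what that error looks like; it does not explain why the real server closes the connection.

```elixir
defmodule ClosedDemo do
  # Hypothetical demo: the accepting side closes its end before sending
  # anything, and the client's passive recv then reports {:error, :closed}.
  def run do
    {:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
    {:ok, port} = :inet.port(listen)

    {:ok, client} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: 0, active: false])

    {:ok, server} = :gen_tcp.accept(listen)
    # The peer hangs up without writing any bytes
    :ok = :gen_tcp.close(server)

    # Waits for 4 bytes, but the connection is closed instead
    result = :gen_tcp.recv(client, 4, 5_000)

    :gen_tcp.close(client)
    :gen_tcp.close(listen)
    result
  end
end
```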


“Argument Length is only meaningful when the socket is in raw mode and denotes the number of bytes to read. If Length is 0, all available bytes are returned. If Length > 0…”

Your socket isn’t in raw mode! You may want to consider just passing length 0; otherwise, whatever comes in is going to cause a problem.
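For what it’s worth, the gen_tcp documentation lists `raw` and `0` as the same value of the `packet` option (no packaging), so `packet: 0` already selects raw mode. In that mode `recv(socket, n)` with `n > 0` waits for exactly `n` bytes, while `recv(socket, 0)` returns whatever is currently available. A small loopback sketch (`RawDemo` is a hypothetical name):

```elixir
defmodule RawDemo do
  # Hypothetical demo of a passive socket with packet: :raw (same as packet: 0)
  def run do
    {:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
    {:ok, port} = :inet.port(listen)

    {:ok, client} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: :raw, active: false])

    {:ok, server} = :gen_tcp.accept(listen)
    :ok = :gen_tcp.send(server, <<0x68, 0x16, 2, 0, "ab">>)

    # Length 2: blocks until exactly 2 bytes have arrived
    {:ok, magic} = :gen_tcp.recv(client, 2, 5_000)
    # Length 0: returns whatever bytes are available right now
    {:ok, rest} = :gen_tcp.recv(client, 0, 5_000)

    Enum.each([client, server, listen], &:gen_tcp.close/1)
    {magic, rest}
  end
end
```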

I’ve tried packet: :raw, but still no luck.

Are you trying to read 4 bytes that tell you how many more bytes are coming in the packet? There’s a special mode for that; let me see if I can find it. (I think you generally don’t want to be using raw mode.)
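The “special mode” is most likely gen_tcp’s `packet: 1 | 2 | 4` option, where the runtime adds and strips a big-endian length header that comes first in each message. It does not fit this protocol (magic bytes first, little-endian length), but as a sketch of how it behaves on a loopback connection (`PacketDemo` is hypothetical):

```elixir
defmodule PacketDemo do
  # Both ends use packet: 2, so :gen_tcp.send/2 prepends a 2-byte big-endian
  # length and each recv returns exactly one whole message.
  def run do
    {:ok, listen} =
      :gen_tcp.listen(0, [:binary, packet: 2, active: false, reuseaddr: true])

    {:ok, port} = :inet.port(listen)

    {:ok, client} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: 2, active: false])

    # The accepted socket inherits packet: 2 from the listen socket
    {:ok, server} = :gen_tcp.accept(listen)

    :ok = :gen_tcp.send(server, "hello")
    :ok = :gen_tcp.send(server, "world")

    # In packet mode the Length argument is 0; framing is done by the runtime
    {:ok, first} = :gen_tcp.recv(client, 0, 5_000)
    {:ok, second} = :gen_tcp.recv(client, 0, 5_000)

    Enum.each([client, server, listen], &:gen_tcp.close/1)
    [first, second]
  end
end
```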

Yes and no. The actual packet size is in the 3rd and 4th bytes, in little-endian order (I just don’t have the authority to change the protocol :cry:).

By the way, the first 2 bytes are a fixed magic number, <<0x68, 0x16>>.
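Given that layout (2 magic bytes, then a 16-bit little-endian length), one option on a passive raw socket is to read the 4-byte header first and then exactly `length` more bytes. A sketch, assuming the body is `length` bytes long; `FrameReader.read_frame/2` is a hypothetical helper:

```elixir
defmodule FrameReader do
  # Hypothetical sketch for a passive (active: false), packet: 0 socket:
  # read the 4-byte header, then exactly `length` more bytes.
  def read_frame(socket, timeout \\ 5_000) do
    with {:ok, <<0x68, 0x16, length::little-16>>} <- :gen_tcp.recv(socket, 4, timeout),
         {:ok, body} <- read_body(socket, length, timeout) do
      {:ok, body}
    else
      # 4 bytes arrived, but they did not start with the magic number
      {:ok, _other_header} -> {:error, :bad_magic}
      {:error, reason} -> {:error, reason}
    end
  end

  # recv(socket, 0) would mean "whatever is available", so a zero-length
  # body must not be passed through to recv.
  defp read_body(_socket, 0, _timeout), do: {:ok, <<>>}
  defp read_body(socket, length, timeout), do: :gen_tcp.recv(socket, length, timeout)
end
```

Because `recv` with a positive length waits for exactly that many bytes, this approach does not depend on how TCP happens to chunk the incoming data.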

I would say pass 0 and use pattern matching to trim the binary down to the size you care about. I don’t know what the BEAM does about malicious packets that are too long…

Sadly, there’s no EOF marker. Packet sizes vary, all multi-byte numbers are little-endian, and there’s a damn footer that contains a checksum!! I would definitely trash this stupid protocol and implement a standard one, if only I could. For now, my only hope is raw mode.

That’s okay: the BEAM will reconstitute the packets with the boundaries they were sent with over TCP, so unless the protocol is doing something really wild and smushing packets together in strange ways, your recv result should be matchable to {:ok, <<0x68, 0x16, length::16, result::binary-size(length)>>} (with a total length of length + 4).

Oh, you might need to account for endianness in the length match.
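To illustrate why endianness matters here: Elixir’s integer segments default to big-endian, so the same two length bytes decode differently depending on the modifier. For the bytes `<<5, 0>>`, the hypothetical helper `Endian.both/1` below returns `{1280, 5}`.

```elixir
defmodule Endian do
  # Hypothetical demo helper: decode the same 2 bytes both ways.
  # `::16` is big-endian by default; `::little-16` reads the low byte first.
  def both(<<big::16>> = bin) do
    <<little::little-16>> = bin
    {big, little}
  end
end
```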

That’s not an option either. The socket should always stay open (unless something unexpected happens), and over time there will be indefinitely many reads and writes on it.

Yeah, I did it. I just didn’t write it in this question.

I think you can do this:

<<0x68, 0x16, length::size(16)-unsigned-integer-little, result::binary-size(length), checksum::binary-size(@checksum_length)>>

The socket won’t close when you recv from it!

It sounds like for your use case you may also want to try active sockets: instead of you polling with recv(), incoming data is delivered as {:tcp, socket, data} messages sent directly to your GenServer. Keep in mind, though, that in extreme circumstances your GenServer might die and drop everything that was queued up, and backpressure is harder to implement.
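A common way to get active-mode delivery while keeping some backpressure is `active: :once`: the socket sends one `{:tcp, socket, data}` message and then waits until it is re-armed with `:inet.setopts/2`. A minimal sketch; `ActiveClient` and its `{host, port, parent}` argument are hypothetical, and each received chunk is simply forwarded to `parent`:

```elixir
defmodule ActiveClient do
  use GenServer

  # Hypothetical arguments: `parent` receives every chunk as {:chunk, data}
  def start_link({host, port, parent}) do
    GenServer.start_link(__MODULE__, {host, port, parent})
  end

  @impl true
  def init({host, port, parent}) do
    {:ok, socket} = :gen_tcp.connect(host, port, [:binary, packet: 0, active: :once])
    {:ok, %{socket: socket, parent: parent}}
  end

  @impl true
  def handle_info({:tcp, socket, data}, %{socket: socket} = state) do
    send(state.parent, {:chunk, data})
    # Re-arm: ask the socket for exactly one more {:tcp, ...} message
    :ok = :inet.setopts(socket, active: :once)
    {:noreply, state}
  end

  def handle_info({:tcp_closed, socket}, %{socket: socket} = state) do
    {:stop, {:shutdown, :tcp_closed}, state}
  end

  def handle_info({:tcp_error, socket, reason}, %{socket: socket} = state) do
    {:stop, {:shutdown, reason}, state}
  end
end
```

Each message carries whatever bytes arrived, not necessarily one protocol packet, so the chunks would still need a framing step.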

Thanks, I’ll try it immediately.


That doesn’t work. The length includes the 1-byte checksum, so I tried

{:ok, <<0x68, 0x16, length::little-16, result::binary-size(length - 1), checksum>>} = :gen_tcp.recv(socket, 0)

It does not compile.

I don’t think you can do math inside binary matches like that =( unfortunately. Otherwise Erlang would be crazy crazy crazy awesome! Maybe this: <<0x68, 0x16, length::little-16, result_and_checksum::binary-size(length)>>
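Following that suggestion, the payload and the trailing 1-byte checksum can then be split in a second match, with `length - 1` computed outside the binary pattern. The thread doesn’t say how the checksum is calculated, so this sketch (hypothetical `Frame.split/1`) returns it unverified:

```elixir
defmodule Frame do
  # Hypothetical helper: `length` counts the payload plus a 1-byte checksum,
  # so the payload is everything in the body except the last byte.
  def split(<<0x68, 0x16, length::little-16, body::binary-size(length)>>)
      when length >= 1 do
    payload_size = length - 1
    <<payload::binary-size(payload_size), checksum>> = body
    {:ok, payload, checksum}
  end

  # Wrong magic, truncated body, or trailing bytes
  def split(_other), do: {:error, :malformed}
end
```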

Don’t forget to pattern match on the :ok tuple as well.

Thank you, it worked. However, if 2 packets arrive at the same time, will :gen_tcp.recv(socket, 0) treat them as one packet and thus fail the pattern match?

OK, so in my experience that never happens with gen_tcp. TCP packets come with integrity and checking information, and the BEAM should respect that and chunk things correctly with recv. However, I have seen it happen with TLS >= 1.2, and it drove me mad. If you ever expect to upgrade to TLS, or are super paranoid (because I don’t know 100% what the BEAM guarantees), I recommend not matching at the receive point: just grab all of the binary data and send it to a helper function that recursively walks over the binary, something like this:

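```elixir
# Empty buffer: every packet has been handled
def helper(<<>>), do: :noop

# `length::..` stands for the length field's size and endianness spec
def helper(<<0x68, 0x16, length::.., result::binary-size(length)>> <> rest) do
  dispatch(result, length)
  # Continue with whatever bytes follow this packet
  helper(rest)
end

def helper(_malformed_binary), do: raise "naughty bytes!"
```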
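A slightly more defensive variant of that helper, assuming the `<<0x68, 0x16, length::little-16, ...>>` layout described earlier in the thread: since TCP itself delivers a byte stream, one read may hold several frames or stop in the middle of one, so this sketch returns the complete frames plus any incomplete tail to keep for the next `recv`. `FrameBuffer.parse/1` is a hypothetical name.

```elixir
defmodule FrameBuffer do
  # Hypothetical sketch: split a buffer into complete frame bodies and
  # return the unconsumed tail, or {:error, :bad_magic} on garbage.
  def parse(buffer), do: parse(buffer, [])

  # A complete frame is available: collect its body and keep going
  defp parse(<<0x68, 0x16, length::little-16, body::binary-size(length), rest::binary>>, acc),
    do: parse(rest, [body | acc])

  # Starts like a frame but is incomplete (short header or short body)
  defp parse(<<0x68, 0x16, _::binary>> = tail, acc), do: {:ok, Enum.reverse(acc), tail}
  defp parse(<<0x68>> = tail, acc), do: {:ok, Enum.reverse(acc), tail}
  defp parse(<<>>, acc), do: {:ok, Enum.reverse(acc), <<>>}

  # Anything else does not start with the magic number
  defp parse(_garbage, _acc), do: {:error, :bad_magic}
end
```

In a receive loop this would be called as `FrameBuffer.parse(tail <> new_data)`, carrying the returned tail forward to the next read.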

Haha, I feel like after this you’ll never want to handle network packets in any language outside the BEAM =D.

Exactly. My only wish is that the protocol made more “common sense” (big-endian, no magic number, the checksum moved to the header, stateless, Unix epoch timestamps… hell, this protocol even has a millennium bug!), so that I could use active mode to handle packets.