Keep getting {:error, :closed} when connecting concurrently to a simple echo server

I implemented a simple echo server in Elixir; the repo is here: GitHub - cj1128/echo_server.

When I test it using echo hello | nc localhost 4100, everything is fine.

But when I use a custom client that starts 10 connections concurrently, I get the error below, and I am confused.

13:25:12.299 [error] Task #PID<0.108.0> started from #PID<0.95.0> terminating
** (MatchError) no match of right hand side value: {:error, :closed}
    lib/client.exs:14: Client.run/0
    (elixir 1.14.3) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
    (elixir 1.14.3) lib/task/supervised.ex:34: Task.Supervised.reply/4
    (stdlib 4.2) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Function: #Function<1.124100197/0 in Client.run>
    Args: []

13:25:12.299 [error] Task #PID<0.107.0> started from #PID<0.95.0> terminating
** (MatchError) no match of right hand side value: {:error, :closed}
    lib/client.exs:14: Client.run/0
    (elixir 1.14.3) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
    (elixir 1.14.3) lib/task/supervised.ex:34: Task.Supervised.reply/4
    (stdlib 4.2) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Function: #Function<1.124100197/0 in Client.run>
    Args: []

I tried:

  • Running sysctl kern.ipc.somaxconn=1000 to increase the TCP queue size
  • Adding an accept pool, so that multiple processes are accepting connections

Neither worked; I still got those errors.

Env

OS: macOS 12.3 (ARM64 and x64)

$ elixir --version
Erlang/OTP 25 [erts-13.1.3] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit] [dtrace]

Elixir 1.14.3 (compiled with Erlang/OTP 25)

How to run

$ mix run --no-halt
# in another shell
$ elixir lib/client.exs

EchoServer calls accept in one process, then recv in a different, newly-spawned one.

The process that calls accept “owns” the connection (see also recent discussion here).

Hi, thanks for your reply.

My code is not the same as the linked one: the main process, which owns the connection, is always alive, so it should not cause the socket to be closed.

I modified the code to use :gen_tcp.controlling_process, and I still got the error.

defmodule EchoServer do
  require Logger

  def start do
    {:ok, socket} = :gen_tcp.listen(4100, [:binary, active: false, packet: :line])
    Logger.info("Starting echo server")
    accept(socket)
  end

  defp accept(socket) do
    {:ok, conn} = :gen_tcp.accept(socket)
    pid = spawn(fn -> recv(conn) end)
    :ok = :gen_tcp.controlling_process(conn, pid)
    send(pid, :start)
    accept(socket)
  end

  defp recv(conn) do
    receive do
      :start ->
        echo(conn)
    end
  end

  defp echo(conn) do
    case :gen_tcp.recv(conn, 0) do
      {:ok, data} ->
        :gen_tcp.send(conn, data)
        echo(conn)

      {:error, :closed} ->
        :ok
    end
  end
end

When I run the client, sometimes everything is fine, sometimes I get {:error, :closed}, and sometimes I get {:error, :econnreset}.

I also wrote a simple echo server in JS and tested it with the same client (elixir lib/client.exs). No errors showed up, so I am sure something is wrong with my server code, but I can't figure out what.

const net = require("net")

const server = net.createServer((socket) => {
  console.log("Client connected")

  socket.on("data", (data) => {
    console.log(`Received data: ${data}`)
    socket.write(data)
  })
})

server.listen(4100, () => {
  console.log("Server listening on port 4100")
})

This snippet works for me using telnet.
Could you reduce your code a bit more?

Hi, telnet is fine, but the server returns errors when it receives concurrent connections.

This is the reduced code.

defmodule EchoServer do
  require Logger

  def start do
    {:ok, socket} = :gen_tcp.listen(4100, [:binary, active: false, packet: :line])
    Logger.info("Starting echo server")
    accept(socket)
  end

  defp accept(socket) do
    {:ok, conn} = :gen_tcp.accept(socket)
    pid = spawn(fn -> recv(conn) end)
    :ok = :gen_tcp.controlling_process(conn, pid)
    send(pid, :start)
    accept(socket)
  end

  defp recv(conn) do
    receive do
      :start ->
        echo(conn)
    end
  end

  defp echo(conn) do
    case :gen_tcp.recv(conn, 0) do
      {:ok, data} ->
        :gen_tcp.send(conn, data)
        echo(conn)

      {:error, :closed} ->
        :ok
    end
  end
end

EchoServer.start()

defmodule Client do
  require Logger

  def start() do
    for _ <- 1..10 do
      Task.async(&run/0)
    end
    |> Task.await_many()
  end

  def run() do
    {:ok, socket} = :gen_tcp.connect(~c"localhost", 4100, [:binary, packet: :line, active: false])
    :ok = :gen_tcp.send(socket, "hello\n")

    case :gen_tcp.recv(socket, 0) do
      {:ok, "hello\n"} ->
        Logger.info("ok")

      other ->
        Logger.error("error: #{inspect(other)}")
    end
  end
end

Client.start()

This example also works fine for me :smiley:
I ran it in two separate iex sessions.

This is very weird. Can you try running the client multiple times? It doesn't error out every time.

If I run the client against the Node.js version of the server, it always succeeds.

Running multiple clients also works for me.
Maybe you can try to isolate your error even further. No Logger, no modules, etc.
As long as the behavior is inconsistent I don’t think we can help you that well.

Finally found the reason.

The default backlog option of :gen_tcp.listen is 5, which is too small here. When the TCP accept queue is full, the client gets errors like "error: {:error, :closed}".
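
For anyone landing here later, this is a minimal sketch of what raising the backlog could look like. The module name BacklogExample and the value 1024 are my own assumptions, not taken from the repo, and reuseaddr: true is added only so the port can be rebound quickly during testing. The other options are the ones from the server code above.

```elixir
defmodule BacklogExample do
  # Hypothetical helper, not part of the original echo_server repo.
  # `backlog` sets the requested length of the pending-connection queue;
  # :gen_tcp.listen defaults to 5 when the option is omitted.
  # 1024 is an arbitrary illustrative value.
  def listen(port, backlog \\ 1024) do
    :gen_tcp.listen(port, [
      :binary,
      active: false,
      packet: :line,
      reuseaddr: true,
      backlog: backlog
    ])
  end
end
```

In EchoServer.start/0 this would replace the listen call, e.g. {:ok, socket} = BacklogExample.listen(4100). Note that the OS may still cap the effective queue length (on macOS via the kern.ipc.somaxconn setting mentioned earlier), so both values probably need to be large enough.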