Sending a network socket to another node

umgefahren · May 18, 2022, 6:40pm

I'm currently trying to send a network connection to another distributed node. When I do, the socket closes.

The socket is created via :gen_tcp.

I know this is a hard problem, possibly an incredibly hard one to solve. I just figured it was a common task and expected an easy solution.

  @spec delegate_socket(socket :: :gen_tcp.socket()) :: :gen_tcp.socket()
  defp delegate_socket(socket) do
    delegate_node = Pool.get_node()

    IO.puts("Sending socket to delegate node #{delegate_node}")
    IO.puts("#{Node.ping(delegate_node)}")

    pid =
      Node.spawn(delegate_node, fn ->
        IO.puts("running on node")

        receive do
          socket ->
            Server.Tcp.Handler.handle(socket)
        end
      end)

    send(pid, socket)
    # :ok = :gen_tcp.controlling_process(socket, pid)
  end

The handle function:

defmodule Server.Tcp.Handler do
  def handle(socket) do
    IO.puts "got in loop"

    socket
    |> read_line()
    |> write_line(socket)

    handle(socket)
  end

  defp read_line(socket) do
    {:ok, data} = :gen_tcp.recv(socket, 0)
    data
  end

  defp write_line(data, socket) do
    :gen_tcp.send(socket, data)
  end
end

If that’s just impossible, which is very reasonable, I know of a workaround, but it could be very inefficient.
So how would the expert Elixir/Erlang programmer proceed here?

I can’t really imagine that someone would plug Nginx or Traefik in front of their Elixir nodes to do the load balancing.

lud · May 18, 2022, 7:51pm

I am not sure how gen_tcp works under the hood, but if that code works, I would bet that the connection is still maintained by the original node where it was opened. So basically all in/out messages will have to pass through that first node.

al2o3cr · May 18, 2022, 8:15pm

:gen_tcp can use a port under the hood, which you can’t send to another node.
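A quick way to check this yourself is sketched below. It assumes the default `inet` backend, where the socket is a port; with `inet_backend: :socket` the socket is a different kind of term and the check reports `:not_a_port`. The module name `SocketCheck` is made up for illustration.

```elixir
defmodule SocketCheck do
  # Hypothetical helper, not part of :gen_tcp.
  # Opens a throwaway listening socket on an OS-assigned port (port 0)
  # and reports what kind of term it is and which node owns it.
  def inspect_socket do
    {:ok, socket} = :gen_tcp.listen(0, [:binary, active: false])

    # node/1 only accepts pids, ports and references, so guard the call.
    owner =
      if is_port(socket), do: node(socket), else: :not_a_port

    :ok = :gen_tcp.close(socket)

    %{is_port: is_port(socket), owner_node: owner, local?: owner == node()}
  end
end

IO.inspect(SocketCheck.inspect_socket())
```

The port belongs to the node that opened it, which is why handing the term to a process on another node does not give that process a usable connection.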

Take a look at Cowboy for a good example of using an acceptor pool, but note that that runs on a single node. If you wanted to do the heavy work on another node, you could use a loop handler that starts the work and waits for a reply.
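As a rough sketch of that idea without Cowboy: the socket stays on the node that accepted it, and only the received data crosses the node boundary. `LocalHandler`, `process_remotely/2` and the use of `String.upcase/1` as the "heavy work" are placeholders invented for this example; `:erpc.call/4` needs OTP 23 or later.

```elixir
defmodule LocalHandler do
  # Hypothetical sketch: the socket never leaves the accepting node.
  # Only plain binaries are sent to the worker node and back.

  def handle(socket, worker_node) do
    case :gen_tcp.recv(socket, 0) do
      {:ok, data} ->
        reply = process_remotely(worker_node, data)
        :ok = :gen_tcp.send(socket, reply)
        handle(socket, worker_node)

      {:error, _reason} ->
        # The client closed the connection or the read failed: stop looping.
        :gen_tcp.close(socket)
    end
  end

  # Runs the work on `worker_node` and blocks until the result comes back.
  # String.upcase/1 stands in for whatever the real work is.
  def process_remotely(worker_node, data) do
    :erpc.call(worker_node, String, :upcase, [data])
  end
end
```

Calling `LocalHandler.handle(socket, Pool.get_node())` from the accepting process would replace the `Node.spawn` approach in the question. The cost is one extra hop per message through the accepting node.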

As for not wanting Nginx or Traefik in front of Elixir: why? Those are both useful tools, and unless you have a specific need they don’t address, you’re not going to gain much by trying to re-implement them by hand in the BEAM.

benwilson512 · May 18, 2022, 8:47pm

If you’re running multiple nodes and you want to load balance connections to them, you’re best off using a dedicated load balancer.

l3nz · May 19, 2022, 7:02am

I don’t think that you can send a socket, because that is backed by an underlying OS socket, and that cannot be sent over. But:

- If you control the protocol, you could implement a redirect, or just “hang up” from nodes that are too busy and wait for the client to reconnect.
- You usually receive data to do something with it. Reading a socket, doing basic framing/cleanup, and sending the result elsewhere for processing is very lightweight. So load balancing the connections may not matter, because your acceptor can be cheap enough to handle them all. Databases, transcoding, whatever, go to dedicated compute nodes.
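The redirect idea could look roughly like the sketch below. Everything protocol-related here is an assumption for illustration: the `OK` / `REDIRECT host:port` greeting lines, the connection limit of 100, and the `RedirectingAcceptor` module itself. The client would have to understand the greeting and reconnect on its own.

```elixir
defmodule RedirectingAcceptor do
  # Hypothetical line-based protocol: the first line the server sends is
  # either "OK\n" or "REDIRECT host:port\n".
  @max_connections 100

  # Decides what to tell a new client, based on the current load.
  def greeting(active_connections, {host, port})
      when active_connections >= @max_connections do
    {:redirect, "REDIRECT #{host}:#{port}\n"}
  end

  def greeting(_active_connections, _fallback), do: {:accept, "OK\n"}

  # Call right after :gen_tcp.accept/1 on the accepting node.
  def on_accept(socket, active_connections, fallback) do
    case greeting(active_connections, fallback) do
      {:redirect, line} ->
        # Too busy: point the client elsewhere and hang up.
        :gen_tcp.send(socket, line)
        :gen_tcp.close(socket)
        :redirected

      {:accept, line} ->
        :ok = :gen_tcp.send(socket, line)
        :accepted
    end
  end
end
```

This keeps every socket on the node that accepted it and moves the balancing decision into the protocol instead of into socket passing.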

It really depends on what you want/need to do.