Why is it important to limit the number of connections to a server process with process pooling?

ryanzidago · April 29, 2020, 7:14am

I am currently building a terminal based chat system where I use :gen_tcp.

I have a TcpServer that accepts connections, and a TcpClientPool, which is an Agent that stores each client in its own state. Basically, the TcpServer reads the TcpClientPool state to get all of the currently connected clients, and broadcasts each message from one client to all the other clients. The TcpClientPool accepts an unlimited number of clients.

However, in “Going low level with TCP sockets and :gen_tcp”, Orestis Markou mentions the need for implementing a pool of connections that accepts a limited number of clients. I also remember reading in the Elixir in Action book that limiting the number of connections for the to-do application was the way to go.

Why would one do that?

I have read the Elixir School’s post on Poolboy:

Let’s think of a specific example for a moment. You are tasked to build an application for saving user profile information to the database. If you’ve created a process for every user registration, you would create unbounded number of connections. At some point the number of those connections can exceed the capacity of your database server. Eventually your application can get timeouts and various exceptions.

But I still don’t understand:

I thought the BEAM could handle millions of concurrent processes, so I’m quite confused: if that is the case, then I don’t need to implement a pool of processes unless I plan to have millions of them running concurrently, right?
Or is it simply that my TcpServer process cannot handle that many concurrent connections?
If so, how can I find out the maximum number of connections that my TcpServer can handle?

LostKobrakai · April 29, 2020, 7:22am

Networking is not just “the BEAM” though. It’s also your host OS, host hardware and the network itself. Those might enforce real constraints on your architecture.
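As a rough starting point for the "how do I know the maximum" question, the VM's own ceilings can be read at runtime with `:erlang.system_info/1`. This is only a sketch of the BEAM side: the OS limits mentioned above (for example the number of open file descriptors allowed per process) are separate and are not shown here. The comment about sockets assumes the default `:gen_tcp` backend, where each socket is a port.

```elixir
# Hard ceilings of the running VM (adjustable with the +P and +Q emulator flags).
process_limit = :erlang.system_info(:process_limit)
port_limit = :erlang.system_info(:port_limit)

# Current usage. Assumption: with the default :gen_tcp backend,
# every open socket occupies one port.
process_count = :erlang.system_info(:process_count)
port_count = :erlang.system_info(:port_count)

IO.puts("processes: #{process_count}/#{process_limit}")
IO.puts("ports:     #{port_count}/#{port_limit}")
```

Whichever limit is lowest (VM, OS, hardware, or the remote side) is the one that actually bounds the number of connections.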

Nicd · April 29, 2020, 9:19am

The BEAM can handle a lot of concurrent processes and connections; it’s good at that. But in the example there is a database server, and that cannot. For example, PostgreSQL’s default configuration typically allows around 100 connections (the max_connections setting) and no more. This can be tuned, but it’s not infinite. With a pool, rather than having your processes try and fail to connect to the database, they send the request to the pool and wait for the pool to handle it.
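The "send the request to the pool and wait" idea can be sketched as a small GenServer. This is a minimal illustration, not how Poolboy or any database driver is implemented: the module name `BoundedPool` and its functions are hypothetical, the resources are plain terms standing in for real connections, and it does not clean up after callers that time out or crash while waiting.

```elixir
defmodule BoundedPool do
  @moduledoc "Hypothetical minimal pool: a fixed set of resources; extra callers wait."
  use GenServer

  def start_link(resources) when is_list(resources) do
    GenServer.start_link(__MODULE__, resources)
  end

  # Blocks until a resource is free or the timeout expires.
  def checkout(pool, timeout \\ 5_000), do: GenServer.call(pool, :checkout, timeout)

  # Hands a resource back so the next waiting caller (if any) can use it.
  def checkin(pool, resource), do: GenServer.cast(pool, {:checkin, resource})

  @impl true
  def init(resources), do: {:ok, %{free: resources, waiting: :queue.new()}}

  @impl true
  def handle_call(:checkout, _from, %{free: [resource | rest]} = state) do
    {:reply, resource, %{state | free: rest}}
  end

  def handle_call(:checkout, from, %{free: []} = state) do
    # Nothing available: park the caller instead of failing.
    {:noreply, %{state | waiting: :queue.in(from, state.waiting)}}
  end

  @impl true
  def handle_cast({:checkin, resource}, state) do
    case :queue.out(state.waiting) do
      {{:value, from}, waiting} ->
        # Pass the resource straight to the longest-waiting caller.
        GenServer.reply(from, resource)
        {:noreply, %{state | waiting: waiting}}

      {:empty, _} ->
        {:noreply, %{state | free: [resource | state.free]}}
    end
  end
end
```

A caller would do `conn = BoundedPool.checkout(pool)`, use the connection, then call `BoundedPool.checkin(pool, conn)`. No matter how many processes ask, the database never sees more connections than the list passed to `start_link/1`.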

lucaong · April 29, 2020, 9:29am

As others said, a connection can be an expensive resource, and how expensive it is does not depend on the BEAM. Remember that the OS kernel mediates all interaction with hardware, including the network devices. The example of PostgreSQL is a good one, because it shows the cost both on the client side and on the server side. Whenever you establish a connection to Postgres, these things have to happen:

The OS on the client machine has to create a TCP socket, allocate a port, etc., all of which uses resources
The Postgres server will create a new OS process to handle the connection, and each process will use resources on the database server side

These resources are more expensive to create and maintain than a BEAM process, therefore it’s a good idea to use a pool.

al2o3cr · April 29, 2020, 11:24am

The point is that there’s still a limit, and it’s shared amongst all the things running on the BEAM. You use a pool to provide a documented / monitorable / etc limit instead of an implicit one.
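For the chat server in the question, one way to make the limit explicit is to have the Agent refuse clients past a configured cap. The sketch below is an assumption about how that could look, not the original TcpClientPool: the module name `ClientRegistry`, its functions, and the cap passed to `start_link/1` are all hypothetical.

```elixir
defmodule ClientRegistry do
  @moduledoc "Hypothetical Agent that refuses new clients past an explicit cap."
  use Agent

  def start_link(max_clients) when is_integer(max_clients) and max_clients > 0 do
    Agent.start_link(fn -> %{max: max_clients, clients: MapSet.new()} end, name: __MODULE__)
  end

  # Returns :ok, or {:error, :full} once the cap is reached.
  def add(client) do
    Agent.get_and_update(__MODULE__, fn %{max: max, clients: clients} = state ->
      if MapSet.size(clients) >= max do
        {{:error, :full}, state}
      else
        {:ok, %{state | clients: MapSet.put(clients, client)}}
      end
    end)
  end

  def remove(client) do
    Agent.update(__MODULE__, fn state ->
      %{state | clients: MapSet.delete(state.clients, client)}
    end)
  end

  # Monitorable: current load against the documented limit.
  def usage do
    Agent.get(__MODULE__, fn %{max: max, clients: clients} -> {MapSet.size(clients), max} end)
  end
end
```

The accept loop would call `ClientRegistry.add(socket)` after `:gen_tcp.accept/1` and close the socket on `{:error, :full}`, so the server degrades in a known way instead of hitting whichever implicit limit comes first.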

ryanzidago · April 29, 2020, 4:50pm

Makes sense.
Thank you all for the answers. I understand it better now!