Ecto 2.0 has completely different pooling from Ecto 1.0. If you want to write a custom pool I would suggest going with Ecto 2.0 and trying to build a DBConnection.Pool. Writing a pool is not easy, and doing transactions with a pool is even harder.
DBConnection separates pooling from many of the aspects of handling database connections, including transactions, and has a strict integration test suite that you can run to test your own pool.
The main issue with the Ecto 1.0 pool approach was that state was held in many places. With the :poolboy pool, :poolboy holds state about client processes (processes that check out connections) and worker processes (processes that supervise a connection process). The client holds state about the worker process and the connection process. The worker process holds state about a client process and a connection process. If something goes wrong in a transaction then :poolboy, the client process and the worker process all hold state about the transaction. Unfortunately the only possible authority on the actual transaction state, the connection process, did not hold any state about it. It was possible for all three to have different views of the state depending on the timing of messages. The code is a bit of a nightmare, so hopefully I've put you off.
In DBConnection only one process has a view of the connection state at a time, and only the client process holds transaction state, which is completely independent of pooling. This works by checking out the socket and state, rather than a worker pid, from the pool, and using the socket and state directly in the client process. It also means a single connection can act as a pool with 1 socket: it creates the socket and state, and keeps a queue of client processes waiting to use the socket, effectively acting as a queue for a lock on the socket. This process is DBConnection.Connection and is the base for the other pools.
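To make this concrete, here is a sketch of starting a single-connection "pool" directly (the hostname, username and database are placeholders, and the options shown are the common Postgrex connection options rather than an exhaustive list):

```elixir
# DBConnection.Connection is a pool of 1: it owns the socket and state,
# and client processes queue for a lock on that single socket.
{:ok, conn} = Postgrex.start_link(
  pool: DBConnection.Connection,
  hostname: "localhost",
  username: "postgres",
  database: "my_db"
)
```

Because clients check out the socket itself rather than a worker pid, even this degenerate pool gives the same checkout semantics as the multi-connection pools.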
In order to provide multiple connections we need to have multiple DBConnection.Connection processes. The DBConnection.Poolboy pool works by pooling DBConnection.Connection processes. This means a client process first checks out a DBConnection.Connection from :poolboy and then checks out the socket from that DBConnection.Connection. So we are running a pool inside a pool. I would suggest following this approach because then it is just a case of pooling processes and there is no need to worry about handling connections; the DBConnection.Connection processes will handle that. Here is the code for the poolboy pool: https://github.com/fishcakez/db_connection/blob/ada921407e3133f099bc88b6940ab5e61b15ac70/lib/db_connection/poolboy.ex https://github.com/fishcakez/db_connection/blob/ada921407e3133f099bc88b6940ab5e61b15ac70/lib/db_connection/poolboy/worker.ex.
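In Ecto 2.0 this pool-inside-a-pool is selected through the Repo configuration; a sketch along these lines (the app name, Repo module and credentials are placeholders):

```elixir
# config/config.exs
config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  pool: DBConnection.Poolboy,  # :poolboy pooling DBConnection.Connection processes
  pool_size: 10,
  database: "my_db"
```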
The other two pools (DBConnection.Sojourn and DBConnection.Ownership) work slightly differently. These have a proxy process that checks out the socket from a DBConnection.Connection and then passes that socket to the client process. However in both cases the proxy process doesn't call any of the DBConnection callbacks and just passes the socket around.
Postgrex has a type cache per host/port/database/extension options combination. Ecto 2.0 has a query cache per Repo. An Ecto Repo's query cache stores Postgrex query structs that depend on a Postgrex type cache. If one of these query structs is used on a Postgrex socket that doesn't use the same Postgrex type cache, the query will fail. This means that to use the Ecto 2.0 query cache, every connection in an Ecto Repo must use the same host/port/database/extension options. In Ecto 2.0 all the cached queries are named, inserted when first used and garbage collected when the Repo is stopped.
Postgrex sockets store the query structs that have been prepared on their backend, so named queries are only prepared when they need to be. This means in Ecto 2.0 we can skip generating the SQL in Elixir, skip parsing the SQL in the PostgreSQL backend, and cache any query even if we don't have a schema. When using pgbouncer the query cache is still used but all the names are ignored (use the option prepare: :unnamed), i.e. every query must be prepared every time. However the type information for the cached query can still be reused and it's the same number of round trips to the database. This is a reasonably standard ORM approach.
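Behind pgbouncer the Repo could therefore be configured like this (a sketch; the app name, Repo module and database are placeholders):

```elixir
config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  prepare: :unnamed,  # prepare every query as the unnamed statement
  database: "my_db"
```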
The Postgrex type cache is created when the first connection connects to PostgreSQL for a host/port/database/extension combination. A query is run to fetch all the types on the server and this is compared to the types Postgrex knows how to encode/decode. Queries will then only succeed if Postgrex knows how to encode/decode all parameters/fields. This cache is garbage collected when there have been no connection processes for a combination for a period of time. This means if a database isn't used for 1 minute, the next connection will need to connect, authorise and bootstrap before it can serve a request; other connections to the database are blocked waiting for the bootstrap to complete.
Therefore you will want to have 1 Repo per database (tenant?) and at least 1 connection process per Repo per Elixir node. Both DBConnection.Poolboy and DBConnection.Sojourn can provide this with pool_size: 1, pool_overflow: n, where the maximum number of connections would be pool_size + pool_overflow. The resize strategies for the two pools are quite different though.
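A per-tenant Repo configuration along those lines might look like this (the Repo module and app name are placeholders):

```elixir
config :my_app, TenantRepo,
  adapter: Ecto.Adapters.Postgres,
  pool: DBConnection.Sojourn,
  pool_size: 1,      # always keep 1 connection per Repo per node
  pool_overflow: 9   # maximum connections = pool_size + pool_overflow = 10
```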
DBConnection.Poolboy will start a connection whenever it has overflow available and a client process requests a pooled DBConnection.Connection but none are waiting/idle. It closes overflow connections when overflow is in use and a pooled process is waiting idle. This means processes are started quickly and stopped rapidly, which can cause churn as connections are opened only to be closed and then immediately reopened.
DBConnection.Sojourn samples the queue speed at random intervals and decides whether to start a new connection based on these samples. The default algorithm requires pooled processes to be waiting less than 5ms for 100ms before it will start a connection; the interval then decreases for each consecutive period where the pooled processes are waiting less than the 5ms target, i.e. it goes approximately 100, 100, 70, 60, 50, 45, 40, 35. This tries to ensure that connections are started only when needed and throttles the maximum number of connections that can be started in a period of time. The pool shrinks when the oldest idle connection has been idle for more than 1 second; it then waits 100ms before dropping the next, then 100, 70, 60, 50, etc., while the oldest idle connection has been idle for more than 1 second. This ensures that idle connections are dropped soon, but not so quickly that they would be reopened immediately, and not all at once if the pool gets a burst and then goes idle. The queue is LIFO so the most recently used connection is used first, and the least recently used is likely to stay idle. LIFO connection queues also work better with pgbouncer because it favours the most recently used connections too.
Alternative growing and shrinking strategies can be used with this pool. For example, it is possible to always start a connection when the sampled queue speed is less than 5ms, rather than requiring it to stay below the 5ms target for a 100ms interval, by using a different regulator callback module from the default. It is even possible to write your own regulator that decides what to do based on the sampled values.
DBConnection.Sojourn will also set off an alarm on the GenEvent if connections can't connect to the database, and clear it when they can reconnect. An :ets based "protector" is on by default that will short circuit Repo requests when the database is down, preventing the pool process from getting overloaded with more requests than it can handle, and actively dropping requests to keep Repo queries feeling responsive to end users without dropping too many.