Info on implementing custom Ecto.Pool

Hi,

We have a multi-tenant application where, unfortunately, each tenant has its own Postgres database (an actual database, not a schema). It sucks, but I can’t change that.

With over 500 tenants, I don’t think it’s feasible to make a connection pool for each one, even with pgbouncer, because we have many servers: 500 tenants * 20 connections per pool * 100 servers = 1_000_000 connections. Also, the system needs to allow connecting to a newly created tenant’s database immediately.

So I’m wanting to implement a custom Ecto.Pool that acts more like an LRU cache than a normal connection pool.

I figured it would be as easy as implementing the Ecto.Pool behaviour, but I had some questions about the transaction-related callbacks. Then I cracked open the source for Ecto.Pools.Poolboy and Ecto.Pools.Poolboy.Worker… and that’s when I realized I was in over my head… :slight_smile:

Can anyone give any general info on implementing a custom Ecto.Pool? Looking at the Ecto.Pools.Poolboy source, I don’t understand its relationship to Ecto.Pools.Poolboy.Worker. Why does it sometimes check out via :poolboy, sometimes via the Worker, and sometimes both?

What’s the general idea for handling transactions?

Thanks for the help,
– C

4 Likes

You probably already saw, but there is a thread about lazy connection pooling for Ecto which has some good details on the topic.

2 Likes

Hi @cjbottaro.

Ecto 2.0 has completely different pooling than Ecto 1.0. If you want to write a custom pool I would suggest going with Ecto 2.0 and trying to build a DBConnection.Pool. Writing a pool is not easy and doing transactions with a pool is even harder. DBConnection separates pooling from many of the aspects of handling database connections, including transactions, and has a strict integration test suite that you can run to test your own pool.
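To make the shape of the task concrete, here is a rough skeleton of what such a pool module might look like. The callback names and arities are my reading of the DBConnection.Pool behaviour from that era, and MyApp.LRUPool is a made-up name — verify against the moduledoc of the db_connection version you are actually using.

```elixir
# Sketch of a custom db_connection 1.x pool, under the assumptions
# stated above. Bodies are left as TODOs.
defmodule MyApp.LRUPool do
  @behaviour DBConnection.Pool

  # Start any applications this pool depends on.
  def ensure_all_started(_opts, _type), do: {:ok, []}

  # mod is the connection module, e.g. Postgrex.Protocol.
  def start_link(mod, opts) do
    GenServer.start_link(__MODULE__, {mod, opts})
  end

  def child_spec(mod, opts, child_opts) do
    Supervisor.Spec.worker(__MODULE__, [mod, opts], child_opts)
  end

  # Hand the client a pool reference plus the connection module and its
  # state, so queries run directly in the client process.
  def checkout(_pool, _opts), do: raise("TODO: LRU checkout")

  # Return the (possibly updated) connection state to the pool.
  def checkin(_pool_ref, _state, _opts), do: raise("TODO")

  # The connection hit a recoverable error; reconnect it.
  def disconnect(_pool_ref, _err, _state, _opts), do: raise("TODO")

  # The connection is broken; stop (and let a supervisor restart) it.
  def stop(_pool_ref, _err, _state, _opts), do: raise("TODO")
end
```

Running DBConnection’s integration test suite against a module like this is how you would gain confidence in the transaction handling.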

The main issue with the Ecto 1.0 pool approach was that state was held in many places. With the Ecto.Pools.Poolboy pool, :poolboy holds state about client processes (processes that check out connections) and worker processes (processes that each supervise a connection process). The client holds state about the worker process and the connection process. The worker process holds state about a client process and a connection process. If something goes wrong in a transaction, then :poolboy, the client process, and the worker process all hold state about the transaction. Unfortunately the only possible authority on the actual transaction state, the connection process, did not hold any state about it. It was possible for all three to have different views of the state depending on the timing of messages. The code is a bit of a nightmare, so hopefully I’ve put you off.

With DBConnection, only one process has a view of the connection state at a time, and only the client process holds transaction state, completely independently of pooling. The way this works is by checking out the socket and state, rather than a worker pid, from the pool, and using the socket and state directly in the client process. It also means a single connection can act as a pool with one socket: in effect it creates the socket and state and keeps a queue of client processes waiting to use the socket, effectively a queue for a lock on that socket. This process is DBConnection.Connection and the base for the other pools.
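As a concrete illustration of a single connection acting as a pool of one (option names are from the DBConnection-era Postgrex as I recall them — double-check against your versions):

```elixir
# A single DBConnection.Connection acting as a "pool" of one socket.
# Clients queue for a lock on the socket; no separate worker process
# sits between the client and the connection state.
{:ok, conn} =
  Postgrex.start_link(
    hostname: "localhost",
    database: "tenant_1",
    pool: DBConnection.Connection  # the single-connection pool module
  )

# The query runs in the *client* process: it checks out the socket and
# state, encodes/decodes on the socket, then checks the state back in.
Postgrex.query!(conn, "SELECT 1", [])
```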

In order to provide multiple connections we need multiple DBConnection.Connection processes. The DBConnection.Poolboy pool works by pooling DBConnection.Connection processes. This means a client process first checks out a DBConnection.Connection from :poolboy and then checks out the socket from that DBConnection.Connection. So we are running a pool inside a pool. I would suggest following this approach, because then it is just a case of pooling processes and there is no need to worry about handling connections and the DBConnection callbacks; DBConnection.Connection will handle that. Here is the code for the poolboy pool: https://github.com/fishcakez/db_connection/blob/ada921407e3133f099bc88b6940ab5e61b15ac70/lib/db_connection/poolboy.ex https://github.com/fishcakez/db_connection/blob/ada921407e3133f099bc88b6940ab5e61b15ac70/lib/db_connection/poolboy/worker.ex.
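The two-level checkout can be sketched like this. This is an illustrative sketch only, not the real DBConnection.Poolboy internals — the inner checkout call and return shape are assumptions:

```elixir
# Illustrative "pool inside a pool" checkout (not the actual
# DBConnection.Poolboy source; the inner call is an assumption).
def checkout(pool, opts) do
  # Level 1: borrow a DBConnection.Connection process from :poolboy.
  worker = :poolboy.checkout(pool, true, opts[:timeout] || 5_000)

  # Level 2: ask that connection process for its socket + state. From
  # here on the client talks to the socket directly; the connection
  # process just waits for the checkin.
  {:ok, socket, conn_state} = DBConnection.Connection.checkout(worker, opts)

  {:ok, {pool, worker}, socket, conn_state}
end
```

A custom pool built this way only has to manage which DBConnection.Connection processes exist (e.g. in LRU order), never the sockets themselves.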

The other two pools (DBConnection.Sojourn and DBConnection.Ownership) work slightly differently. These have a proxy process that checks out the socket from DBConnection.Connection and then passes that socket to the client process. However in both cases the proxy process doesn’t call any of the DBConnection callbacks and just passes the socket around.

Postgrex has a type cache per host/port/database/extension options combination. Ecto 2.0 has a query cache per Repo. An Ecto Repo’s query cache stores Postgrex query structs that depend on a Postgrex type cache. If one of these query structs is used on a Postgrex socket that doesn’t use the same Postgrex type cache, the query will fail. This means that, to use the Ecto 2.0 query cache, every connection in an Ecto Repo must use the same host/port/database/extension options. In Ecto 2.0 all the cached queries are named, inserted when first used, and garbage collected when the Repo is stopped.

Postgrex sockets keep track of the query structs that have already been prepared on their backend, so named queries are only prepared when they need to be. This means in Ecto 2.0 we can skip generating the SQL in Elixir, skip parsing the SQL in the Postgresql backend, and cache any query even if we don’t have a schema. When using pgbouncer the query cache is still used but all the names are ignored (use the option prepare: :unnamed), i.e. every query must be prepared every time. However the type information for the cached query can still be reused, and it’s the same number of round trips to the database. Reasonably standard ORM approach.
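For example, an Ecto 2.0 Repo config behind pgbouncer might look like this (app/module/host names are placeholders; prepare: :unnamed is the option mentioned above):

```elixir
# config/config.exs — Repo behind pgbouncer. Named prepared statements
# don't survive pgbouncer reassigning backend connections, so force
# unnamed statements that are re-prepared on every query.
config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  hostname: "pgbouncer.internal",  # hypothetical host
  database: "tenant_1",
  prepare: :unnamed
```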

The Postgrex type cache is created when the first connection connects to postgresql for a host/port/database/extension combination. A query is done to fetch all the types on the server, and this is compared to the types Postgrex knows how to encode/decode. Queries will then only succeed if Postgrex knows how to encode/decode all parameters/fields. This cache is garbage collected when there are no connection processes for a combination during a period of time. This means that if a database isn’t used for 1 minute, the next connection will need to connect, authorise and bootstrap before it can serve a request, and other connections to the database are blocked waiting for the bootstrap to complete.

Therefore you will want 1 Repo per database (tenant?) and at least 1 connection process per Repo per Elixir node. Both DBConnection.Poolboy and DBConnection.Sojourn can provide this with pool_size: 1, pool_overflow: n, where the maximum would be pool_size + pool_overflow. The resize strategies for the two pools are quite different though.
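Concretely, a per-tenant Repo set up that way might look like this (module, app, and database names are placeholders; pool_size/pool_overflow are the options named above):

```elixir
# One Repo module per tenant database; Tenant42 is a made-up example.
defmodule MyApp.Repos.Tenant42 do
  use Ecto.Repo, otp_app: :my_app
end

# Keep 1 permanent connection; allow up to 1 + 9 = 10 under load.
config :my_app, MyApp.Repos.Tenant42,
  adapter: Ecto.Adapters.Postgres,
  pool: DBConnection.Poolboy,  # or DBConnection.Sojourn
  database: "tenant_42",
  pool_size: 1,
  pool_overflow: 9
```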

DBConnection.Poolboy will start a connection whenever it has overflow available and a client process requests a pooled DBConnection.Connection but none are waiting/idle. It closes overflow connections when overflow is in use and a pooled process is sitting idle. This means processes are started quickly and stopped rapidly, which can cause churn as connections get opened only to be closed and then immediately reopened.

DBConnection.Sojourn samples the queue speed at random intervals and decides whether to start a new connection based on these samples. The default algorithm requires pooled processes to be waiting less than 5ms for 100ms before it will start a connection; the interval then decreases for each consecutive period where the pooled processes are waiting less than the 5ms target, i.e. it goes approximately 100, 100, 70, 60, 50, 45, 40, 35. This tries to ensure that connections are started only when needed, and throttles the maximum number of connections that can be started in a period of time. The pool shrinks when the oldest idle connection has been idle for more than 1 second; it then waits 100ms before dropping the next, then 100, 70, 60, 50 etc., while the oldest idle connection remains idle for more than 1 second. This ensures that idle connections are dropped soon, but not so quickly that they would be reopened immediately, and not all at once if the pool gets a burst and then goes idle. The queue is LIFO, so the most recently used connection is used first and the least recently used is likely to stay idle. LIFO connection queues also work better with pgbouncer, because it favours the most recently used connections too.

Alternative growing and shrinking strategies can be used with this pool. For example it is possible to always start a connection when the sampled queue speed is less than 5ms, rather than requiring the below-5ms target to hold for a 100ms interval, by using a different regulator callback module from the default. It is even possible to write your own regulator that decides what to do based on the sampled values.

DBConnection.Sojourn will also set off an alarm on the alarm_handler GenEvent if connections can’t connect to the database, and clear it when they can reconnect. An :ets based “protector” is on by default that will short-circuit Repo requests when the database is down, preventing the pool process from getting overloaded with more requests than it can handle, and actively dropping requests to keep Repo queries feeling responsive to end users without dropping too many.

9 Likes

Wow, that is quite a lot to digest… :slight_smile: Thanks for the answers.

It’s a bit overwhelming… which leads to my next question. Is it possible to create repos dynamically? Something like this:

import Ecto.Query
repo = Repo.new(host: "10.0.0.1", adapter: "postgres", dbname: "blah", pool_size: 2)
query = from u in "users", where: u.age > 18, select: u.name
Repo.all(repo, query)
Repo.close(repo)

Thanks again!

P.S. I used your Connection library to make a Cassandra driver, very nice!

1 Like

Can you share your Cassandra driver? I’ve been trying to connect to Cassandra using cqex, but I could not connect via port 9042; I always get a timeout error…

1 Like

It was just for fun and learning. The code is completely disorganized and it doesn’t really have an API above basic CQL protocol stuff; it can make basic connections, construct CQL frames, and decode CQL response frames.

Also, I don’t really have a plan or strategy to do cluster management, so I just kinda gave up on it.

What I am looking for is a basic connection via username and password; direct CQL commands would be sufficient. If you can answer my question here https://github.com/matehat/cqex/issues/9 that would be appreciated, or share your ecto adapter to learn from… thanks