Ecto pool_size?

Can someone explain the settings of pool_size of Ecto in config file? and what is the recommend size?

Thanks

2 Likes

I usually start with as many connections as the number of cores on your machine. And then fiddle around with it and do some benchmarking to reach an optimal size.

3 Likes

The pool_size controls how many connections you want to the database. The advice of starting one per core is good for CPU work. Connections however do a lot of IO work so I would recommend a minimum of at least 10. To find the ideal number you need to monitor and measure how your queues are behaving in production. A higher number is not necessarily better, as it implies in less re-use between connections and database overload.

10 Likes

The answer to your first question can be found on this Wikipedia article on Connection pool. Basically it’s a pool of connections to the DB (e.g. Postgres) being kept around and reused because opening and closing connections are costly operations.

The second question needs a more thorough analysis on your machine spec and your app’s (projected) load. I don’t think I have the knowledge to properly answer that, but my initial hypothesis is that we may set it as high as the machine’s resources let us. I might be wrong, though.

Edit: I stand corrected by Jose’s answer :slight_smile:

3 Likes

The optimal number of connections varies based on the DB engine. For example, Postgres recommends the following:

A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. Benchmarks of WIP for version 9.2 suggest that this formula will need adjustment on that release. There hasn’t been any analysis so far regarding how well the formula works with SSDs.

You can find detailed explanations here: https://wiki.postgresql.org/wiki/Number_Of_Database_Connections

At the end it is like Jose mentioned start with 10 and experiment (although 10 is rather arbitrary number but the formula above is a better starting point). Also consider that you might be running multiple instances of the service that talk to one DB - the formula described above is per DB instance not per clients instances it serves. So, there is plenty to experiment with.

8 Likes

Thanks all for support, truly friendly community :slight_smile:

In my case, I am having multi-tenancy, as I am using Postgres Schemas in One DB, one schema per each tenancy, isn’t there a safe size that I can go with as a starting point?

What kind of benchmarking is needed?

2 Likes