I’ve been experiencing some query timeouts in production. Having gone through the docs and a few resources here, I came up with the configuration below. Kindly advise if it will suffice for production.
I would probably leave the default timeout alone, and only increase it for specific Repo calls. Regularly needing to support queries up to 60 seconds seems like an issue.
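For example, Ecto’s Repo query functions accept a per-call `:timeout` option, so the Repo-wide default can stay untouched while only the known-slow query gets more headroom. A minimal sketch — the `ReportRow` schema, `MyApp.Repo`, and the 60-second value are just placeholders:

```elixir
import Ecto.Query

# Only this known-slow report query gets a larger timeout;
# every other Repo call keeps the default.
query = from r in ReportRow, where: r.inserted_at > ago(30, "day")

MyApp.Repo.all(query, timeout: 60_000)
```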
@benwilson512, thanks a lot sir, you’ve been a great source of inspiration and a mentor. I’m most grateful.
How do I use queue_timeout and queue_target, and what’s the difference? I haven’t been able to find any material on queue_target. How do I use these with Ecto? Like the others, should they be configured on the Repo?
Thanks for the documentation. I’m trying to wrap my head around those parameters. How does DBConnection fit into the Ecto architecture? For queue_timeout and queue_target, I believe I have to set them alongside the Repo configuration in the config file, like above.
Thanks.
db_connection is what ecto_sql builds on. It is the lower-level library that Ecto’s DB handling code (namely the ecto_sql library) wraps in a more convenient package for us.
And yes, you simply have to add queue_timeout and queue_target to the Repo configuration. If you expect long transactions then it’s worth simply increasing queue_target to something larger and ignoring the other option. In my hobby and professional apps I usually bump it up all the way to 5000 (ms) to give the apps breathing room to wait for a connection from the pool in high-load conditions. Having Ecto error out in such a situation isn’t very useful to me, so I first increase queue_target and then start analysing why my transactions take so long.
Alternatively, you can just increase pool_size to a bigger number. I rarely accept the default of 10 and usually set it to 20.
EDIT: Here’s my Repo config in one of my hobbyist projects that does ingestion of public retail datasets:
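A config along these lines — the `:my_app` / `MyApp.Repo` names, the `DATABASE_URL` variable, and the exact values are illustrative, mirroring the queue_target of 5000 ms and pool_size of 20 mentioned above rather than an exact file:

```elixir
import Config

# Sketch of a Repo config with a roomier pool and queue tolerance.
config :my_app, MyApp.Repo,
  url: System.fetch_env!("DATABASE_URL"),
  pool_size: 20,          # default is 10; more connections for ingestion bursts
  queue_target: 5_000,    # tolerate up to ~5s of waiting for a pool connection
  queue_interval: 10_000, # window over which checkout waits are evaluated
  timeout: 30_000         # per-query timeout in ms (default is 15_000)
```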
It makes a lot of sense now. Thanks a lot for the awesome explanation as well as the configuration details. I will apply changes to my config file on production and monitor.
To post an update 5.5 years later, I use those a little differently now.
Revised understanding
queue_interval is the window during which DBConnection (and effectively Ecto) monitors whether every single checkout takes queue_target or more time. If so, queue_target is doubled and another queue_interval window is monitored. If every single checkout takes 2x queue_target or more during that next queue_interval, then DBConnection starts dropping messages at the Elixir level so as to protect the DB.
This is very rarely what I wanted, both in hobby and professional projects. Most users I spoke with said they prefer to occasionally have to twiddle their thumbs and wait 10-30 seconds for a page to load, as opposed to being hit in the face with a cryptic error message and a page full of stack traces (or the dreaded empty page with the single line of text “Sorry, something went wrong”) that stops them dead in their tracks.
Example
Setting queue_target to 200 and queue_interval to 10000 means that if for 10 seconds every single connection checkout takes 200ms or more, then the tolerance for waiting for a connection checkout gets raised to 400ms. If during the next 10 seconds every connection checkout takes 400ms or more, then DBConnection starts dropping messages (i.e. nothing goes to the DB).
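In Repo config terms, that scenario would look something like this (the app and Repo names are placeholders):

```elixir
import Config

config :my_app, MyApp.Repo,
  queue_target: 200,      # start counting once checkouts wait 200ms or more
  queue_interval: 10_000  # ...for an entire 10-second window before reacting
```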
If I am tragically misunderstanding this and misleading others then please, anyone who is informed better, correct me.
It’s fine for individual requests to be slow. If everything is slow you need to eventually shed load, or your queue(s) might grow faster than they can shrink, leading to other problematic effects. One can argue that the default limits might be too tight, but you want this behaviour in general.
I agree on “eventually” – that’s always the goal. And I agree that the default limits are too tight to be practically useful, because the alerts produced by them often lead to APM notification fatigue.
There’s hardly a good way to have a generally useful default. A db on the same host will always be faster than a db on another host, and even more so if both machines are not within the same DC. So you need to figure out what a realistic number for queue_target actually is. It furthermore depends on how long the avg query takes, because that’ll affect how long a query needs to wait once the queue is fully utilized. The latter will also affect how queue_interval should be sized.
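One way to ground that number is to watch the queue_time measurement that Ecto repos publish via telemetry on every query. A minimal sketch, assuming an app named :my_app with a MyApp.Repo (so the event name is [:my_app, :repo, :query]) and an arbitrary 100ms logging threshold:

```elixir
defmodule MyApp.QueueTimeLogger do
  @moduledoc "Logs how long queries waited for a pool connection."
  require Logger

  # Ecto repos emit [:my_app, :repo, :query] telemetry events whose
  # measurements include :queue_time in native time units.
  def attach do
    :telemetry.attach(
      "log-queue-time",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event(_event, measurements, _metadata, _config) do
    case measurements[:queue_time] do
      nil ->
        :ok

      native ->
        ms = System.convert_time_unit(native, :native, :millisecond)
        if ms > 100, do: Logger.info("checkout queued for #{ms}ms")
    end
  end
end
```

Attaching this handler at application start and watching the logged values under real load gives a concrete baseline for choosing queue_target (and, from how long the busy periods last, for sizing queue_interval).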