Ecto connection & supervision for unreliable DB

Hi all, is there a recommended way to set up and supervise an Ecto.Repo for an unreliable, non-critical database?

For additional context, I have an application that connects to PostgreSQL (DB A) using the standard Ecto.Repo supervision setup. This service now needs a second connection to another PostgreSQL Business Intelligence database (DB B) to implement a non-critical feature, but DB B is known to be unreliable and I don't want the application to crash when DB B failures escalate up the supervision tree. As an easy fix I could set the supervisor's max_restarts for DB B's Ecto.Repo to a very high number, but is there a better way to handle this situation?

You could also increase the connection timeout?

Yes, that could be one of the quick solutions. What worries me about an unreliable Ecto.Repo is that a supervisor seemingly should only supervise components that are critical to the application (see link), with the expectation that if a critical component fails repeatedly, the failure is escalated to the supervisor. That is not the behavior I want for DB B, since it's non-critical, yet Ecto.Repo requires supervision as part of its setup.
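For reference, this is the part that forces supervision: Ecto expects the repo to be started as a child in a supervision tree, typically in the application's `start/2` callback. A minimal sketch (module names like `MyApp.RepoB` are illustrative, not from the original post):

```elixir
# In MyApp.Application - both repos end up in the same tree by default,
# so repeated DB B crashes can take the whole application down.
def start(_type, _args) do
  children = [
    MyApp.Repo,   # critical DB A
    MyApp.RepoB   # unreliable, non-critical DB B
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
```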

Apart from increasing timeouts and wrapping your non-critical queries in proper error handling, you can also start your DB B repo under a separate supervisor and raise that supervisor's restart parameters alone, so it doesn't crash even if the DB B repo crashes often.
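A sketch of the error-handling part: wrap DB B queries so connection failures come back as an error tuple instead of raising into the caller. The module and repo names here are hypothetical, and the rescued exception assumes the usual `DBConnection.ConnectionError` raised when the pool can't reach the database:

```elixir
# Hypothetical wrapper for non-critical BI queries against DB B.
defmodule MyApp.BiQueries do
  # Returns {:ok, results} or {:error, exception} instead of crashing
  # the calling process when DB B is unreachable.
  def safe_all(queryable) do
    {:ok, MyApp.RepoB.all(queryable)}
  rescue
    e in DBConnection.ConnectionError -> {:error, e}
  end
end
```

Callers can then pattern-match and degrade gracefully, e.g. render the page without the BI widget on `{:error, _}`.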

Supervisor crash resiliency is expressed as a number of crashes within a given time frame (`max_restarts` within `max_seconds`). Set them high enough that the supervisor itself doesn't give up, e.g. allowing up to 100 restarts per second. Of course, you'll have to find suitable values yourself.
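Putting the two suggestions together, a minimal sketch of an isolating supervisor (names and the 100-per-second intensity are illustrative, to be tuned):

```elixir
# Hypothetical supervisor that isolates the unreliable DB B repo.
# If MyApp.RepoB exceeds the restart intensity, only this supervisor
# dies - and its own parent can be configured to tolerate that too.
defmodule MyApp.BiSupervisor do
  use Supervisor

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    children = [MyApp.RepoB]

    # Tolerate up to 100 repo crashes per second before giving up.
    Supervisor.init(children,
      strategy: :one_for_one,
      max_restarts: 100,
      max_seconds: 1
    )
  end
end
```

This supervisor is then listed alongside `MyApp.Repo` in the application's children, so DB A keeps its normal, strict supervision.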
