I’m running an application with a fairly simple Phoenix web UI, that enables registering requests which are then handled asynchronously by synchronizing status updates with another system.
However, when the PostgreSQL database is taken down temporarily for maintenance, the Ecto repo will of course be disconnected, causing multiple errors on incoming requests (as well as for the background status updates running in a GenServer). Eventually the application crashes entirely, after the supervisor restart limits are reached.
I’d like to avoid crashing, preferrably “pause” the background updates, and display some kind of “ongoing maintenance, please try later” page to the end users whenever the database is inaccessible.
Has anyone worked out a good approach for something like this?
Right, and for eg. REST API usage, where I’ve created my own client code, I’d have a pretty clear idea about where to put a circuit breaker… But I’m not quite sure what to do about Phoenix
Perhaps the Endpoint would be a decent place, though.
A super trivial example with ecto and pg would be:
defmodule DatabaseStatus do
def check_status do
case Ecto.Adapters.SQL.query(Repo, "SELECT 1") do
{:ok, _} ->
:ok
_ ->
:error
end
end
end
You can then use this in a plug that you add high in your pipeline:
defmodule DbPlug do
def init(options), do: options
def call(conn, _) do
case Database.check_status() do
:ok ->
# render normal stuff
conn
:error ->
# halt the conn and render the maintenance page or return a json payload or whatever.
end
end
end
This isn’t super efficient since you’re going to be making a lot of extra db calls. A more efficient way to do this would be to run a second process that functions as a watch dog. It can periodically call Database.check_status/0 and use aggregates to trigger the status back and forth so it doesn’t flap. For instance you might decide that 5 failing calls in 30 seconds means that the db is down. That status could be written into an ets table which could be read in parallel. This pattern would be much more efficient than pinging the db in each request. Having a secondary process also allows you to overwrite the status via an api call. For instance if you know you’re going to go into maintenance mode you could trigger the api as part of that process. But whatever optimizations you want to do the general idea still stands and should help get you started.
I have this exact issue at work when the oracle database goes down for its maintenance periods. I setup a failure handler in the fallback_controller to handle the general case that the oracle server is down, and for pages that don’t touch the oracle database they still work fine, and the ones that do but can handle it being down by doing something else just handle it their own way. I don’t touch the oracle database pre-controller in plugs at all so I don’t have worry about it there.