Immediately after redeploy, HTTP requests trigger Ecto Repo missing error

Hi. We are running a pretty standard Phoenix app with steady traffic in production. Whenever we redeploy the app we consistently get a few errors that look like this:

** (exit) an exception was raised:
    ** (ArgumentError) argument error
        (stdlib 3.13.2) :ets.lookup_element(Ecto.Repo.Registry, #PID<0.7221.0>, 3)
        (ecto 3.5.4) lib/ecto/repo/registry.ex:24: Ecto.Repo.Registry.lookup/1

** (exit) an exception was raised:
    ** (RuntimeError) could not lookup Ecto repo Core.Repo because it was not started or it does not exist
        (ecto 3.5.4) lib/ecto/repo/registry.ex:19: Ecto.Repo.Registry.lookup/1
        (ecto 3.5.4) lib/ecto/repo/queryable.ex:210: Ecto.Repo.Queryable.execute/4
        (ecto 3.5.4) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
        (ecto 3.5.4) lib/ecto/repo/queryable.ex:149: Ecto.Repo.Queryable.one/3

Then after a few of these nothing raises and it’s fine until the next deploy.

It appears that web requests and database calls are coming in before Repo is available or registered. How do we ensure Repo is fully available before any calls are made to it?

Can you please check your Application module? The repo needs to be started before the endpoint.

Yeah Repo is before Endpoint in the list of children.

Is there anything else that is trying to access the repo? Can you share the list of your workers?

This looks surprisingly similar to the thread "How to wait for the full init of Ecto.Repo before making requests".

Yeah, that was also the only result I could find that was remotely related to this. The list of workers is the same as a freshly generated Phoenix app.

Do the children get booted in order sequentially? Or is there some async race condition that’s possible?

The children are started synchronously in the order in which they are listed. It should just-work-TM.
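
To illustrate that ordering, here is a minimal sketch using only plain OTP, with no Phoenix or Ecto involved. The module names and the `:fake_repo` / `:fake_endpoint` process names are hypothetical stand-ins, not anything from the app in question. The second child checks, from inside its own `init/1`, whether the first child is already registered.

```elixir
defmodule StartOrderSketch.Child do
  use GenServer

  # `name` registers the process; `depends_on` is the earlier sibling to look for (or nil).
  def start_link({name, depends_on}) do
    GenServer.start_link(__MODULE__, depends_on, name: name)
  end

  def dependency_alive?(name), do: GenServer.call(name, :dependency_alive?)

  @impl true
  def init(depends_on) do
    # init/1 runs while the supervisor is still blocked starting this child,
    # so any earlier sibling has already finished its own init/1 by now.
    alive? = is_nil(depends_on) or is_pid(Process.whereis(depends_on))
    {:ok, alive?}
  end

  @impl true
  def handle_call(:dependency_alive?, _from, alive?), do: {:reply, alive?, alive?}
end

defmodule StartOrderSketch do
  alias StartOrderSketch.Child

  # Starts two children in list order and reports whether the second one
  # saw the first one registered during its own startup.
  def run do
    children = [
      Supervisor.child_spec({Child, {:fake_repo, nil}}, id: :fake_repo),
      Supervisor.child_spec({Child, {:fake_endpoint, :fake_repo}}, id: :fake_endpoint)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    result = Child.dependency_alive?(:fake_endpoint)
    Supervisor.stop(sup)
    result
  end
end
```

Calling `StartOrderSketch.run()` should report that the stand-in endpoint found the stand-in repo already running. This only covers startup order; it says nothing about processes spawned outside the supervision tree.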

Trying to guess here, but I’d look into the following:

  • Config: when/how are you setting the ecto_repos option? Do you use runtime config?
  • Can you reproduce it by running prod build/release locally?
  • Generate a fresh Phoenix app and run a diff to see where they diverged.
  • Try to reproduce the error on a freshly generated Phoenix app with your exact dependencies.
  • Try deploying a freshly generated Phoenix app to a staging environment (if you have one).

Just following up here with the solution we discovered.

Turned out we had a controller calling a function that was making Ecto queries inside an unsupervised Task.async. We switched this to the supervised version, Task.Supervisor.async_nolink, but still encountered similar errors.

We wound up refactoring this code to no longer perform Ecto queries inside an async Task and the errors have gone away.
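
A minimal sketch of what a refactor of that shape might look like, under stated assumptions: `load_user/1` is a hypothetical stand-in for a function that runs an Ecto query, and `render_summary/1` is a hypothetical stand-in for work that does not touch the database. Neither name comes from the actual codebase.

```elixir
defmodule AsyncSketch do
  # Hypothetical stand-in for a function that would run an Ecto query.
  def load_user(id), do: %{id: id, name: "user-#{id}"}

  # Hypothetical stand-in for work that needs no database access.
  def render_summary(user), do: "#{user.id}: #{user.name}"

  # Shape of the problematic code: the query runs inside the task process.
  def query_in_task(id) do
    task = Task.async(fn -> id |> load_user() |> render_summary() end)
    Task.await(task)
  end

  # Refactored shape: the query runs in the calling (request) process,
  # and the task only receives already-loaded data.
  def query_in_caller(id) do
    user = load_user(id)
    task = Task.async(fn -> render_summary(user) end)
    Task.await(task)
  end
end
```

Both functions return the same value; the difference is only which process touches the (stand-in) repo.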
