Sudden occurrence of Supervisor "failed to start child" without code change

I’ve been setting up my first Phoenix deployment on gigalixir. I’m deploying with mix, and I was having some trouble, so I did an iex -S mix to practice at home. That resulted in this:

[info] Application crit exited: Crit.Application.start(:normal, []) returned an error: shutdown: failed to start child: Crit.Setup.InstitutionSupervisor
    ** (EXIT) bad child specification, more than one child specification has the id: "critter4us".

mix phx.server also fails. I now get the same failure for a git revision from a week ago, well before I started to deploy. Moreover:

  • I did both a mix clean and removed the _build directory.
  • It happens with a new shell (so not an environment issue).
  • I hadn’t changed the version of Elixir or Erlang I was running. (Both are up-to-date.)
  • ps shows that there are no Erlang processes running.
  • Anyway, the app is not distributed. It’s a very basic Phoenix app.

For what it’s worth, the application.ex looks like this:

    # List all child processes to be supervised
    children = [
      # Start the Ecto repository
      Crit.Repo,
      # Start the endpoint when the application starts
      CritWeb.Endpoint,
      # Starts a worker by calling: Crit.Worker.start_link(arg)
      Crit.Audit.ToEcto.Server,
      Crit.Setup.InstitutionSupervisor,
      {ConCache, [name: Crit.Cache,
                  ttl_check_interval: :timer.hours(24),
                  global_ttl: :timer.hours(48)]}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Crit.Supervisor]
    Supervisor.start_link(children, opts)

The problem is definitely around InstitutionSupervisor. If I delete it from application.ex, mix phx.server starts.

An explicit :id doesn’t help. Replacing the InstitutionSupervisor entry with:

    Supervisor.child_spec(Crit.Setup.InstitutionSupervisor, id: :aeb7bcdf3a91),

… (as suggested in the error message) results in this:

Application crit exited: Crit.Application.start(:normal, []) returned an error: shutdown: failed to start child: :aeb7bcdf3a91
    ** (EXIT) bad child specification, more than one child specification has the id: "critter4us".

As far as I can read the error, a child of that supervisor has a duplicate ID. Can you please show the start function and the children of InstitutionSupervisor?

Most likely the problem is not with InstitutionSupervisor itself, but with one of its children having a duplicated id. Check how the id of its children is computed, and if there is the possibility that two children get assigned the same id. In case you cannot find it, as @NobbZ noted, you can post the children specification inside InstitutionSupervisor and someone here might be able to spot the problem.
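To make the failure mode concrete, here is a minimal sketch that should reproduce the same kind of error outside the app. The two Agent children are stand-ins (not the real institution workers), deliberately given the same id; the only assumption carried over from the thread is the id string "critter4us".

```elixir
# Trap exits so the failed (linked) supervisor does not take this process down.
Process.flag(:trap_exit, true)

# Two hypothetical children that share one id, as happens when the same
# institution name appears twice in the data the child specs are built from.
children = [
  Supervisor.child_spec({Agent, fn -> :first end}, id: "critter4us"),
  Supervisor.child_spec({Agent, fn -> :second end}, id: "critter4us")
]

# The supervisor validates the child specs before starting any child,
# so this returns an error instead of {:ok, pid}.
result = Supervisor.start_link(children, strategy: :one_for_one)
IO.inspect(result)
```

Changing the id of the supervisor itself in application.ex has no effect on this, because the clash is among the ids one level down.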

Argh!

There’s one process per “institution”, with the institution’s name as the process id*, and I somehow ended up running the seeding process twice, meaning copies of institutions.

Thank you! Time for a create unique_index.

* I know I should be using names less prone to duplication, like illinois_institution_process. Haven’t done that yet.
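For anyone landing here with the same problem, a unique index along those lines might look like the sketch below. It assumes Ecto is already in the project (the app starts Crit.Repo); the module name, the :institutions table, and the :short_name column are guesses for illustration, not taken from the actual schema.

```elixir
defmodule Crit.Repo.Migrations.AddUniqueIndexToInstitutions do
  use Ecto.Migration

  def change do
    # Hypothetical table and column names. With this index in place,
    # running the seeds twice fails at insert time instead of at boot.
    create unique_index(:institutions, [:short_name])
  end
end
```

The duplicate rows have to be deleted before running the migration, otherwise creating the index fails.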

Are you generating the names from a database?

You really should use :via tuples and Registry then instead of creating random atoms from the database…

In this particular case, there are currently two names, and I’d expect to add at most one name per year. And the names are used by a process that maintains its own map of names to worker processes.

  def handle_call(:timeslots, _from, state) do
    {:reply, state.institution.timeslots, state}
  end

I don’t know how Registry would help this code, but I am still very new at this.

You are generating atoms at runtime, and that is evil: atoms are never garbage collected, and the atom table has a fixed size. If a bug ever creates rows in that table and you get flooded with atoms that way, you have a problem. By using a Registry, you could use the string as the identifier, or even the numeric id.
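A rough sketch of what that could look like, assuming a worker per institution keyed by its name as a plain string. InstitutionWorker, InstitutionRegistry, and the shape of the state are made up for the example; only the :timeslots call mirrors the code above.

```elixir
defmodule InstitutionWorker do
  use GenServer

  # Register under a string key in the Registry; no atom is created.
  def start_link(short_name) when is_binary(short_name) do
    GenServer.start_link(__MODULE__, short_name, name: via(short_name))
  end

  # Callers address the worker by the same string.
  def timeslots(short_name), do: GenServer.call(via(short_name), :timeslots)

  defp via(short_name), do: {:via, Registry, {InstitutionRegistry, short_name}}

  @impl true
  def init(short_name), do: {:ok, %{short_name: short_name, timeslots: []}}

  @impl true
  def handle_call(:timeslots, _from, state) do
    {:reply, state.timeslots, state}
  end
end

# The Registry must be running before any worker registers with it.
{:ok, _registry} = Registry.start_link(keys: :unique, name: InstitutionRegistry)

{:ok, pid} = InstitutionWorker.start_link("critter4us")
slots = InstitutionWorker.timeslots("critter4us")

# A second worker under the same key is refused rather than duplicated.
second = InstitutionWorker.start_link("critter4us")
```

In a real application the Registry would sit in the supervision tree ahead of the workers, and the Registry would take over the job of the hand-maintained map of names to worker processes.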

When/if it happens that rows in that table are created other than manually, I’ll use a more robust solution. In the meantime, there’s more important work.