Can't start application after upgrading elixir/erlang and oban packages

Hello, we recently upgraded a few services:

  • elixir 1.15.8-otp-25 to 1.18.4-otp-27 (already deployed live)
  • oban ~> 2.18.1 to ~> 2.19
  • oban_pro ~> 1.4.11 to ~> 1.5.0
  • oban_web ~> 2.10.2 to ~> 2.11

In my current task, I'm upgrading the three Oban packages. Everything works fine locally, but the application fails to start in a remote (testing) environment.

I reverted the Oban versions and the app starts normally in the test env again. Removing the cache in the test env doesn't help. We're on PostgreSQL 14.17.

Any ideas on what's happening? Please let me know if you need anything, @sorentwo @sorenone. Thanks!

This error is logged on the remote server:

{"time":"2025-09-16T11:24:52.614Z","severity":"error",
"message":"GenServer {Oban.Registry, {Oban, {:producer, \"migration\"}}} terminating\n** (ArgumentError) errors were found at the given arguments:
 * 1st argument: the table identifier does not refer to an existing ETS table 
 (stdlib 7.0.2) :ets.select(:pro_ack_tab_0, [{{{:ack, \"Oban\", \"migration\", :_}, :_, :_, :_}, [], [:\"$_\"]}])
 (oban_pro 1.5.5) lib/oban/pro/engines/smart.ex:1012: Oban.Pro.Engines.Smart.get_acks/1
 (oban_pro 1.5.5) lib/oban/pro/engines/smart.ex:552: Oban.Pro.Engines.Smart.fetch_jobs/3
 (oban 2.19.4) lib/oban/engine.ex:252: anonymous fn/4 in Oban.Engine.fetch_jobs/3
 (oban 2.19.4) lib/oban/engine.ex:387: anonymous fn/3 in Oban.Engine.with_span/4
 (telemetry 1.3.0) /__w/balanced/balanced/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
 (oban 2.19.4) lib/oban/queue/producer.ex:253: Oban.Queue.Producer.start_jobs/1
 (oban 2.19.4) lib/oban/queue/producer.ex:244: anonymous fn/2 in Oban.Queue.Producer.dispatch/1 Last message: :dispatch
 State: %Oban.Queue.Producer{conf: %Oban.Config{dispatch_cooldown: 250, engine: Oban.Pro.Engines.Smart, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: \"pr-4147-84588b87fc-cbwqg\", notifier: {Oban.Notifiers.Postgres, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Pro.Plugins.DynamicPruner, [state_overrides: [cancelled: {:max_age, {1, :hour}}, completed: {:max_age, {1, :month}}, discarded: {:max_age, {3, :month}}]]}, {Oban.Pro.Plugins.DynamicLifeline, []}], prefix: \"public\", queues: [...truncated...], repo: Balanced.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, foreman: {:via, Registry, {Oban.Registry, {Oban, {:foreman, \"migration\"}}}}, meta: %Oban.Pro.Producer{__meta__: #Ecto.Schema.Metadata<:loaded, \"public\", \"oban_producers\">, uuid: \"01995245-2893-7c51-af4c-60a4400ca0a4\", name: \"Oban\", node: \"pr-4147-84588b87fc-cbwqg\", queue: \"migration\", started_at: ~U[2025-09-16 11:24:48.147420Z], updated_at: ~U[2025-09-16 11:24:48.147424Z], ack_async: true, ack_tab: :pro_ack_tab_0, refresh_interval: 30000, xact_delay: 1000, xact_retry: 5, xact_timeout: 30000, meta: %Oban.Pro.Producer.Meta{local_limit: 1, paused: false, shutdown_started_at: nil, global_limit: nil, rate_limit: nil}}, name: {:via, Registry, {Oban.Registry, {Oban, {:producer, \"migration\"}}}}, dispatch_timer: #Reference<0.534427290.921698305.255298>, refresh_timer: #Reference<0.534427290.921698305.253530>, dispatch_cooldown: 250, running: %{}}",
 "metadata":{"error":{"initial_call":null,"reason":"** (ArgumentError) errors were found at the given arguments:\n\n  * 1st argument: the table identifier does not refer to an existing ETS table\n\n    (stdlib 7.0.2) :ets.select(:pro_ack_tab_0, [{{{:ack, \"Oban\", \"migration\", :_}, :_, :_, :_}, [], [:\"$_\"]}])\n    (oban_pro 1.5.5) lib/oban/pro/engines/smart.ex:1012: Oban.Pro.Engines.Smart.get_acks/1\n    (oban_pro 1.5.5) lib/oban/pro/engines/smart.ex:552: Oban.Pro.Engines.Smart.fetch_jobs/3\n    (oban 2.19.4) lib/oban/engine.ex:252: anonymous fn/4 in Oban.Engine.fetch_jobs/3\n    (oban 2.19.4) lib/oban/engine.ex:387: anonymous fn/3 in Oban.Engine.with_span/4\n    (telemetry 1.3.0) /__w/balanced/balanced/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3\n    (oban 2.19.4) lib/oban/queue/producer.ex:253: Oban.Queue.Producer.start_jobs/1\n    (oban 2.19.4) lib/oban/queue/producer.ex:244: anonymous fn/2 in Oban.Queue.Producer.dispatch/1\n"},"error_logger":{"tag":"error","report_cb":"&:gen_server.format_log/1"},"function":"error_info/7","line":2785,"module":"gen_server","time":1758021892614425,"file":"gen_server.erl","domain":"[:otp]","erl_level":"error"}}

▸  Evaluation failed with: errors were found at the given arguments:
▸    * 1st argument: the table identifier does not refer to an existing ETS table

Our config looks like this:

config :myapp, Oban,
  repo: MyApp.Repo,
  engine: Oban.Pro.Engines.Smart,
  plugins: [
    Oban.Pro.Plugins.DynamicLifeline,
    {
      Oban.Pro.Plugins.DynamicPruner,
      state_overrides: [
        cancelled: {:max_age, {1, :hour}},
        completed: {:max_age, {1, :month}},
        discarded: {:max_age, {3, :month}}
      ]
    }
  ],
  queues: [
    default: 1,
    other_jobs: 1,
    ... truncated
  ],
  dispatch_cooldown: 250

And all children are started in the application file:

  def start(_type, _args) do
    maybe_install_ecto_dev_logger()

    Appsignal.Phoenix.LiveView.attach()

    children =
      maybe_cluster_supervisor() ++
        cache_supervisors() ++
        [
          MyAppWeb.Endpoint,
          MyApp.PromEx,
          MyApp.Repo,
          MyApp.Vault,
          MyAppWeb.Telemetry,
          {Phoenix.PubSub, [name: MyApp.PubSub]},
          {Task.Supervisor, name: MyApp.TaskSupervisor},
          {Oban, oban_config()}
        ]

    Oban.Telemetry.attach_default_logger()
    Oban.Web.Telemetry.attach_default_logger()

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp oban_config do
    config = Application.fetch_env!(:myapp, Oban)

    if config[:plugins] do
      Keyword.update!(config, :plugins, fn plugins ->
        Keyword.put(plugins, Oban.Pro.Plugins.DynamicCron, crontab: periodical_jobs(), timezone: "Etc/UTC")
      end)
    else
      config
    end
  end

How are you upgrading the system? Is there any chance you have old code running in that environment? The stack trace you shared indicates that an ETS table is missing, but those tables are started by the oban_pro application itself and can't just disappear without the whole application having an issue.

Also, since you rolled back, did you follow the upgrade guide for Pro v1.5, namely, did you run the migrations?
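For example, listing applied vs. pending migrations from a remote console is a quick way to verify (a sketch, assuming your repo module is MyApp.Repo):

# Returns one tuple per migration, e.g. {:up, 20250415122526, "remove_fb_cron_jobs"}
Ecto.Migrator.migrations(MyApp.Repo)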


Hey @sorentwo

We followed the upgrade guide for Pro v1.5 closely. But there were two issues while running the migrations locally, and we found workarounds for them. I'm not sure they're related to my original question, but I'll post them below.

First issue: one of our migrations (priv/repo/migrations/20250415122526_remove_fb_cron_jobs.exs) from back in April 2025 raises this error:

** (Postgrex.Error) ERROR 42P01 (undefined_table) relation "oban_crons" does not exist

The migration:

def up do
  execute("""
  DELETE FROM oban_crons
  WHERE name IN (
    'SyncNewTransactionsWorker',
    'SyncPendingTransactionsWorker'
  )
  """)
end

Our workaround was to comment out the execute/1 call.
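In hindsight, a guarded version like this sketch would have let the migration no-op when the table is missing (just a sketch, assuming Postgres, where to_regclass returns NULL for a nonexistent relation; not what we actually shipped):

def up do
  execute("""
  DO $$
  BEGIN
    -- only delete when the oban_crons table actually exists
    IF to_regclass('public.oban_crons') IS NOT NULL THEN
      DELETE FROM oban_crons
      WHERE name IN (
        'SyncNewTransactionsWorker',
        'SyncPendingTransactionsWorker'
      );
    END IF;
  END
  $$;
  """)
end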

Second issue, also local:

15:17:27.849 [info] create index if not exists public.oban_jobs_unique_index
  ** (Postgrex.Error) ERROR 23505 (unique_violation) could not create unique index "oban_jobs_unique_index"
    table: oban_jobs
    constraint: oban_jobs_unique_index
  Key (uniq_key)=(aoPUIkpxeBl1gCo1UKPzmYERP3oNKxp+BnbF5bObe9M) is duplicated.
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:1096: Ecto.Adapters.SQL.raise_sql_call_error/1

After investigation, we manually pruned the completed jobs that caused the constraint violation with another migration (below) that runs before the upgrade_oban_pro migration. Note: we found our unique-job config was wrong and also fixed it to prevent future unique-index violations.

def change do
  query = """
  DELETE FROM oban_jobs
  WHERE worker = 'SyncNewTransactionsWorker'
  AND args->>'init' != 'true'
  AND meta->>'uniq' = 'true'
  AND state = 'completed'
  """

  execute(query)
end
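For anyone hitting the same error, a rough way to inspect the offending duplicates first (a sketch, run from iex; it assumes the uniq_key column added by the Pro v1.5 migration is already present):

# list uniq_key values that appear on more than one row of oban_jobs
MyApp.Repo.query!("""
SELECT uniq_key, count(*) AS dupes
FROM oban_jobs
WHERE uniq_key IS NOT NULL
GROUP BY uniq_key
HAVING count(*) > 1
""")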

After resolving those two issues locally, upgrade_oban_pro ran successfully with mix ecto.migrate. Then we hit the issue from the original post on the remote server.

Is there any chance you have old code running in that environment?

We can start a test environment per PR, and the server works fine for every PR except this one with the Oban upgrades. As mentioned, we also removed the CI/CD caching, but it didn't help.

At this point, we couldn't find anyone reporting a similar issue. Maybe there's something specific to our data? Or an issue related to the wrong unique setup? But we still can't see how the ETS table could be missing :thinking:

Based on the issues you shared, no, it doesn't look like they'd be related to the problem you're seeing. I wanted to confirm that the proper tables were there and that there wasn't a chance the oban_pro application had shut down.

Sorry, this is the first we’ve heard of this issue. It definitely wouldn’t have anything to do with your unique setup, or anything related to the actual jobs.

Will you try confirming a few things? First, that the Pro application is running, with Application.start(:oban_pro). If it's running, you'll get an :already_started error, e.g.:
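# In a remote console on the affected node; this return means oban_pro is up:
Application.start(:oban_pro)
#=> {:error, {:already_started, :oban_pro}}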

Then, if it's there, double-check whether the ack tables exist:

for name <- :ets.all(), inspect(name) =~ "pro_ack", do: name
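If :oban_pro is up but that comprehension returns an empty list, it would suggest the process that owns the ack tables exited (ETS tables are deleted when their owner terminates) or that they were never created, which is what we're trying to narrow down.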