Oban.Registry crashes inside devcontainer when recompiling project

I’ve been running my Phoenix app with Oban (not Pro) inside my devcontainer recently, but every second or third time I make a code change that recompiles 200+ modules, Oban.Registry crashes, which then terminates my application.

This is a typical output of such a crash:

Compiling 273 files (.ex)
[error] GenServer {Oban.Registry, {Oban, Oban.Stager}} terminating
** (UndefinedFunctionError) function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)
    (MyApp 1.3.61) MyApp.Repo.transaction(#Function<4.61619042/0 in Oban.Stager.stage_and_notify/2>, [prefix: "public", log: false, oban: true, telemetry_options: [oban_conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "16a78d4bc670", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}]])
    (oban 2.21.1) lib/oban/repo.ex:156: Oban.Repo.transaction/4
    (oban 2.21.1) lib/oban/stager.ex:85: Oban.Stager.stage_and_notify/2
    (oban 2.21.1) lib/oban/stager.ex:59: anonymous fn/2 in Oban.Stager.handle_info/2
    (telemetry 1.4.1) /workspace/deps/devcontainer/telemetry/src/telemetry.erl:359: :telemetry.span/3
    (oban 2.21.1) lib/oban/stager.ex:58: Oban.Stager.handle_info/2
    (stdlib 7.2.1) gen_server.erl:2434: :gen_server.try_handle_info/3
    (stdlib 7.2.1) gen_server.erl:2420: :gen_server.handle_msg/3
    (stdlib 7.2.1) proc_lib.erl:333: :proc_lib.init_p_do_apply/3
Last message: :stage
State: %Oban.Stager{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "16a78d4bc670", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, timer: #Reference<0.2559755388.3220963335.234290>, interval: 1000, limit: 5000, mode: :global}

*** -> The error above repeats a few times

[notice] Application my_app exited: shutdown

I already tried wrapping the Oban Supervisor in my own Supervisor that has a higher restart retry count and also an exponential backoff for restarting the Oban.Registry, but that didn’t help either.

I don’t experience this outside of my devcontainer, so I’m not sure what causes this.

This is the Stager module crashing, not Oban.Registry. The registered name of the process is an Oban.Registry via tuple, which is why it looks like the registry is terminating.
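To make the naming point concrete, here is a minimal sketch of a process registered through a via tuple. `ViaDemo`, `ViaDemo.Registry`, and the `{:demo, :stager}` key are made-up names for illustration, not anything from Oban. The process has no atom name of its own; it is only reachable through the registry and its key, which is why the registry shows up in the process name.

```elixir
defmodule ViaDemo do
  use GenServer

  # Register the process under `key` in the given Registry via a :via tuple.
  def start_link(registry, key) do
    GenServer.start_link(__MODULE__, :ok, name: {:via, Registry, {registry, key}})
  end

  # Look the process up again through the registry.
  def whereis(registry, key) do
    case Registry.lookup(registry, key) do
      [{pid, _value}] -> pid
      [] -> nil
    end
  end

  @impl true
  def init(:ok), do: {:ok, %{}}
end

# Usage (hypothetical registry name and key):
# {:ok, _} = Registry.start_link(keys: :unique, name: ViaDemo.Registry)
# {:ok, pid} = ViaDemo.start_link(ViaDemo.Registry, {:demo, :stager})
# ViaDemo.whereis(ViaDemo.Registry, {:demo, :stager}) == pid
```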

The issue is most likely due to recompilation causing the originally referenced MyApp.Repo module to be purged. The next query after that point can’t resolve the module, which raises the UndefinedFunctionError you’re seeing. Since the Stager runs queries once a second, more frequently than anything else, it’s the process that hits the crash. Changing the number of restarts or wrapping the supervisor won’t help because the restarted process references the same unavailable module.

Here’s an explanation of what I believe is happening with the Repo module, from “The BEAM Book”, p129:

After a module is loaded then any fully qualified calls (i.e. Module:function), also called remote calls, will go to the new version. Note that if you have a server loop without a remote call then it will continue running the old code.
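The failure mode can be reproduced in isolation. This is a minimal sketch, assuming it runs as a script so that the throwaway module exists only in memory and can’t be reloaded from a .beam file. `PurgeDemo` and `PurgeDemo.FakeRepo` are made-up names, and this isn’t Oban’s code; it only shows that a remote call to an unloaded module raises UndefinedFunctionError.

```elixir
defmodule PurgeDemo.FakeRepo do
  # Stand-in for a repo module; it just runs the function.
  def transaction(fun), do: {:ok, fun.()}
end

defmodule PurgeDemo do
  # A remote call through a module held in a variable, similar in spirit
  # to calling the configured repo. Rescues the error instead of crashing.
  def call_repo(repo) do
    repo.transaction(fn -> :staged end)
  rescue
    error in UndefinedFunctionError -> {:error, Exception.message(error)}
  end

  # Remove the module from the running VM: delete marks the current code
  # as old, purge then discards the old code.
  def unload(module) do
    :code.delete(module)
    :code.purge(module)
  end
end

# Usage:
# PurgeDemo.call_repo(PurgeDemo.FakeRepo)  # succeeds while the module is loaded
# PurgeDemo.unload(PurgeDemo.FakeRepo)
# PurgeDemo.call_repo(PurgeDemo.FakeRepo)  # now returns an {:error, message} tuple
```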

I’m not sure why this would happen in a devcontainer specifically, but it’s something we’ve heard mentioned a few times in the past.

Looking into a potential fix.

@PJUllrich My hypothesis was close, but it seems like this could be specifically due to slow recompile speeds and that’s why it happens in a dev container exclusively.

Please give Oban on main a try; this fix just landed: Tolerate unavailable repo modules in Stager (oban-bg/oban@63862c3).
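For anyone following along, pointing the dependency at the main branch would look roughly like this in mix.exs. This is a sketch of the deps entry only; `override: true` is an assumption that applies when another dependency (Oban Web, for example) also declares a requirement on :oban.

```elixir
# mix.exs
defp deps do
  [
    # Fetch Oban from the main branch on GitHub instead of Hex.
    # `override: true` lets this entry win over requirements that
    # other dependencies place on :oban.
    {:oban, github: "oban-bg/oban", branch: "main", override: true}
  ]
end
```

After changing the entry, run `mix deps.get` so the Git dependency is fetched and locked.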

It worked! Now my application doesn’t shut down anymore :slight_smile: Thank you!

Compiling 273 files (.ex)
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
Generated my_app app

@sorentwo sorry, I now get this issue with Oban.Peer as well. It’s the same error message, except that the crashing process is Oban.Peer instead of Oban.Stager:

[error] GenServer {Oban.Registry, {Oban, Oban.Peer}} terminating
** (UndefinedFunctionError) function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)
    (my_app 1.3.62) MyApp.Repo.transaction(#Function<2.44248456/0 in Oban.Peers.Database.terminate/2>, [prefix: "public", log: false, oban: true, telemetry_options: [oban_conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "e308cc20a215", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}], retry: 1])
    (oban 2.21.1) lib/oban/repo.ex:156: Oban.Repo.transaction/4
    (oban 2.21.1) lib/oban/peers/database.ex:80: Oban.Peers.Database.terminate/2
    (stdlib 7.2.1) gen_server.erl:2482: :gen_server.try_terminate/3
    (stdlib 7.2.1) gen_server.erl:2733: :gen_server.terminate/9
    (stdlib 7.2.1) proc_lib.erl:333: :proc_lib.init_p_do_apply/3
Last message: :election
State: %Oban.Peers.Database{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "e308cc20a215", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, timer: #Reference<0.2574348041.3782213636.107865>, interval: 30000, leader?: true, leader_boost: 2}

Also, and this might be unrelated to the issue above, I can’t open the Oban Web dashboard inside the devcontainer anymore :frowning: Maybe this is related to me fetching directly from Oban’s master branch and having to add override: true to fetch the dependency?

[error] ** (RuntimeError) no config registered for [Oban, Oban.Met] instance
    (oban_web 2.12.2) lib/oban/web/dashboard_live.ex:212: Oban.Web.DashboardLive.await_init/2
    (oban_web 2.12.2) lib/oban/web/dashboard_live.ex:19: Oban.Web.DashboardLive.mount/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/utils.ex:356: anonymous fn/6 in Phoenix.LiveView.Utils.maybe_call_live_view_mount!/5
    (telemetry 1.4.1) /workspace/deps/devcontainer/telemetry/src/telemetry.erl:359: :telemetry.span/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/static.ex:324: Phoenix.LiveView.Static.call_mount_and_handle_params!/5
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/static.ex:155: Phoenix.LiveView.Static.do_render/4
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/controller.ex:39: Phoenix.LiveView.Controller.live_render/3
    (phoenix 1.8.5) lib/phoenix/router.ex:416: Phoenix.Router.__call__/5
    (my_app 1.3.62) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
    (my_app 1.3.62) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint."call (overridable 3)"/2
    (my_app 1.3.62) deps/devcontainer/plug/lib/plug/debugger.ex:155: MyAppWeb.Endpoint."call (overridable 4)"/2
    (my_app 1.3.62) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
    (phoenix 1.8.5) lib/phoenix/endpoint/sync_code_reload_plug.ex:22: Phoenix.Endpoint.SyncCodeReloadPlug.do_call/4
    (bandit 1.10.4) lib/bandit/pipeline.ex:131: Bandit.Pipeline.call_plug!/2
    (bandit 1.10.4) lib/bandit/pipeline.ex:42: Bandit.Pipeline.run/5
    (bandit 1.10.4) lib/bandit/http1/handler.ex:13: Bandit.HTTP1.Handler.handle_data/3
    (bandit 1.10.4) lib/bandit/delegating_handler.ex:18: Bandit.DelegatingHandler.handle_data/3
    (bandit 1.10.4) lib/bandit/delegating_handler.ex:8: Bandit.DelegatingHandler.handle_continue/2
    (stdlib 7.2.1) gen_server.erl:2424: :gen_server.try_handle_continue/3
    (stdlib 7.2.1) gen_server.erl:2291: :gen_server.loop/4

The peer only runs every 15s, so that crash should be much rarer. Since you’re running this as a single node in development, it’s more convenient to use peer: Oban.Peers.Global anyhow.

As for the dashboard: can you never open it, or only during a recompilation cycle?

It seems unlikely that the override is related. That message indicates that Oban.Met failed to start or crashed. It runs a count query every second, and is probably crashing in the background less loudly than the Stager was.

The UndefinedFunctionError check may need to move into Oban.Repo as a catch-all.

Thanks! I’ve added this to dev.exs and will check whether that fixes the Oban.Peer crash.

config :my_app, Oban, peer: Oban.Peers.Global

Hmm, it times out in await_init/2 right away. I can never start it, not even outside the devcontainer. Hang on, I gotta see what’s wrong here.

My fault. I added this to my dev.exs in a previous attempt to fix the crashes. I’ve removed it and now Oban.Met starts as expected and the Oban Web dashboard works again. Sorry for the noise.

# config/dev.exs

config :oban_met, auto_start: false

That would certainly prevent Oban.Met from starting :grin:

But now it crashes too when I recompile :smiley:

Compiling 273 files (.ex)
[warning] [message: "Stager skipped tick, repo module unavailable: function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)", source: :oban, module: Oban.Stager]
[error] GenServer {Oban.Registry, {Oban, Oban.Met.Reporter}} terminating
** (UndefinedFunctionError) function MyApp.Repo.transaction/2 is undefined (module MyApp.Repo is not available)
    (my_app 1.3.62) MyApp.Repo.transaction(#Function<2.121074915/0 in Oban.Met.Reporter.checks/1>, [prefix: "public", log: false, oban: true, telemetry_options: [oban_conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "826c835d7fe1", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Global, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}]])
    (oban 2.21.1) lib/oban/repo.ex:156: Oban.Repo.transaction/4
    (oban_met 1.1.0) lib/oban/met/reporter.ex:99: Oban.Met.Reporter.handle_info/2
    (stdlib 7.2.1) gen_server.erl:2434: :gen_server.try_handle_info/3
    (stdlib 7.2.1) gen_server.erl:2420: :gen_server.handle_msg/3
    (stdlib 7.2.1) proc_lib.erl:333: :proc_lib.init_p_do_apply/3
Last message: :checkpoint
State: %Oban.Met.Reporter{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "826c835d7fe1", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Global, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, name: {:via, Registry, {Oban.Registry, {Oban, Oban.Met.Reporter}}}, queue_timer: nil, check_timer: #Reference<0.1783403821.41943043.185227>, auto_migrate: true, checks: %{"available" => [], "cancelled" => [], "completed" => [%{value: 2, state: "completed", queue: "default", series: :full_count}], "discarded" => [], "executing" => [], "retryable" => [], "scheduled" => [], "suspended" => []}, check_counter: 443, check_interval: 1000, estimate_limit: 50000, function_created?: true, queues: ["default"]}
[notice] Application oban_met exited: shutdown
[error] GenServer #PID<0.1301.0> terminating
** (MatchError) no match of right hand side value:

    []

    (oban_met 1.1.0) lib/oban/met/examiner.ex:148: Oban.Met.Examiner.table/1
    (oban_met 1.1.0) lib/oban/met/examiner.ex:44: Oban.Met.Examiner.all_checks/1
    (oban_web 2.12.2) lib/oban/web/live/connectivity_component.ex:10: Oban.Web.ConnectivityComponent.update/2
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/utils.ex:500: Phoenix.LiveView.Utils.maybe_call_update!/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/diff.ex:296: anonymous fn/4 in Phoenix.LiveView.Diff.update_component/3
    (telemetry 1.4.1) /workspace/deps/devcontainer/telemetry/src/telemetry.erl:359: :telemetry.span/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/diff.ex:295: anonymous fn/4 in Phoenix.LiveView.Diff.update_component/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/diff.ex:229: Phoenix.LiveView.Diff.write_component/4
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/diff.ex:287: Phoenix.LiveView.Diff.update_component/3
    (phoenix_live_view 1.1.28) lib/phoenix_live_view/channel.ex:321: Phoenix.LiveView.Channel.handle_info/2
    (stdlib 7.2.1) gen_server.erl:2434: :gen_server.try_handle_info/3
    (stdlib 7.2.1) gen_server.erl:2420: :gen_server.handle_msg/3
    (stdlib 7.2.1) proc_lib.erl:333: :proc_lib.init_p_do_apply/3
Process Label: {Phoenix.LiveView, Oban.Web.DashboardLive, "lv:phx-GKZB989C72m0VwIk"}
Last message: {:phoenix, :send_update, {{Oban.Web.ConnectivityComponent, "connectivity"}, %{id: "connectivity", status: :reset, title: "Node is solitary: Not connected to any cluster", conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "826c835d7fe1", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Global, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Plugins.Pruner, []}], prefix: "public", queues: [default: [limit: 10], backups: [limit: 5], social_feeds: [limit: 5]], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, __changed__: %{}, flash: %{}, myself: %Phoenix.LiveComponent.CID{cid: 2}}}}

You just need to stop recompiling :laughing:. Kidding, that’s what I thought might happen when I mentioned “The UndefinedFunctionError check may need to move into Oban.Repo as a catch-all” above.

Sorry Boss, Parker said no more work today! Cya tomorrow!

I’ve overhauled the stager-specific approach to be more versatile and resilient. Please give this commit a try: Shift UndefinedFunctionError handling into Repo (oban-bg/oban@0c23e68).

Note that it’s now possible to control the Repo’s retry behavior as well by configuring the retry opts. With the defaults, you’ll have ~2.5s of resiliency during slow compilation.
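The general idea of retrying through a short window of module unavailability can be sketched like this. It is not the code from the commit: `RetrySketch`, the `:retries` and `:delay` option names, and their values are hypothetical, picked only so that five retries at 500ms add up to roughly 2.5s of waiting.

```elixir
defmodule RetrySketch do
  @moduledoc "Hypothetical retry helper; not Oban's actual implementation."

  # Runs `fun`, retrying only when the called module is unavailable.
  # Illustrative defaults: 5 retries, 500ms apart (about 2.5s in total).
  def with_retry(fun, opts \\ []) when is_function(fun, 0) do
    retries = Keyword.get(opts, :retries, 5)
    delay = Keyword.get(opts, :delay, 500)

    attempt(fun, retries, delay)
  end

  defp attempt(fun, retries, delay) do
    fun.()
  rescue
    error in UndefinedFunctionError ->
      if retries > 0 do
        # Give the compiler time to load the new module version.
        Process.sleep(delay)
        attempt(fun, retries - 1, delay)
      else
        # Out of retries: surface the original error and stacktrace.
        reraise error, __STACKTRACE__
      end
  end
end

# Usage (MyApp.Repo as in the logs above):
# RetrySketch.with_retry(fn -> MyApp.Repo.transaction(fn -> :ok end) end)
```

Any other exception passes straight through, so real query failures still crash the caller as before.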