GenServer servor error at deploy since Phoenix upgrade, may be related to EventStore?

Hi everyone,

I’m currently trying to upgrade :phoenix ~> 1.6.6 to ~> 1.6.15, :phoenix_live_dashboard ~> 0.6.4 to ~> 0.7, and :phoenix_pubsub ~> 2.0 to ~> 2.1.1.

I made a commit with only mix.exs & mix.lock upgrades, alongside some upgrading-related modifications (like changing function calls from Phoenix.LiveView.Helpers to Phoenix.Component or changing <LivePatch to <.link patch= kind of stuff), but no logic update or DB migration.

Everything works fine locally, but when I tried to deploy it with Fly.io on our staging website, I got this error :

Error: 2-12-12T09:29:25Z   [info]09:29:25.347 [error] GenServer {PhoenixApp.Auctions.EventStore.EventStore.Subscriptions.Registry, {"$all", "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]** (CaseClauseError) no case clause matching: {:error, %Postgrex.Error{connection_id: 3514705, message: nil, postgres: %{code: :undefined_column, file: "parse_relation.c", line: "3294", message: "column \"deleted_at\" does not exist", pg_code: "42703", position: "70", routine: "errorMissingColumn", severity: "ERROR", unknown: "ERROR"}, query: ["SELECT\n  stream_id,\n  stream_uuid,\n  stream_version,\n  created_at,\n  deleted_at\nFROM ", "public", ".streams\nWHERE stream_uuid = $1;\n"]}}

This error leads to an “Alloc Unhealthy because of failed task” error and breaks the deployment.

But we don’t have any deleted_at and created_at references in our code, nor in our DB. And I’ve tried redeploying the previous commit with success, leading me to believe this is Phoenix upgrade-related.

Here are the complete logs
Error: 2-12-12T09:29:25Z   [info]09:29:25.271 [error] GenServer {PhoenixApp.Auctions.EventStore.EventStore.Subscriptions.Registry, {"$all", "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]** (CaseClauseError) no case clause matching: {:error, %Postgrex.Error{connection_id: 3514704, message: nil, postgres: %{code: :undefined_column, file: "parse_relation.c", line: "3294", message: "column \"deleted_at\" does not exist", pg_code: "42703", position: "70", routine: "errorMissingColumn", severity: "ERROR", unknown: "ERROR"}, query: ["SELECT\n  stream_id,\n  stream_uuid,\n  stream_version,\n  created_at,\n  deleted_at\nFROM ", "public", ".streams\nWHERE stream_uuid = $1;\n"]}}
2022-12-12T09:29:25Z   [info]09:29:25.296 [debug] Subscription "Auctions.Projectors.Auction"@"$all" catching-up
2022-12-12T09:29:25Z   [info]09:29:25.296 [debug] PhoenixApp.Auctions.Projectors.Auction has successfully subscribed to event store
Error: 2-12-12T09:29:25Z   [info]09:29:25.307 [error] GenServer {PhoenixApp.Auctions.EventStore.EventStore.Subscriptions.Registry, {"$all", "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]** (CaseClauseError) no case clause matching: {:error, %Postgrex.Error{connection_id: 3514706, message: nil, postgres: %{code: :undefined_column, file: "parse_relation.c", line: "3294", message: "column \"deleted_at\" does not exist", pg_code: "42703", position: "70", routine: "errorMissingColumn", severity: "ERROR", unknown: "ERROR"}, query: ["SELECT\n  stream_id,\n  stream_uuid,\n  stream_version,\n  created_at,\n  deleted_at\nFROM ", "public", ".streams\nWHERE stream_uuid = $1;\n"]}}
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription_fsm.ex:422: EventStore.Subscriptions.SubscriptionFsm.catch_up_from_stream/1
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription_fsm.ex:98: EventStore.Subscriptions.SubscriptionFsm.transition/3
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription.ex:186: EventStore.Subscriptions.Subscription.handle_cast/2
2022-12-12T09:29:25Z   [info]    (stdlib 3.17.2) gen_server.erl:695: :gen_server.try_dispatch/4
2022-12-12T09:29:25Z   [info]    (stdlib 3.17.2) gen_server.erl:771: :gen_server.handle_msg/6
Error: 2-12-12T09:29:25Z   [info]09:29:25.311 [error] GenServer {PhoenixApp.Auctions.Application, {PhoenixApp.Auctions.Application, Commanded.Event.Handler, "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]09:29:25.337 [debug] Subscription "Auctions.Projectors.Auction"@"$all" catching-up
2022-12-12T09:29:25Z   [info]09:29:25.337 [debug] PhoenixApp.Auctions.Projectors.Auction has successfully subscribed to event store
Error: 2-12-12T09:29:25Z   [info]09:29:25.347 [error] GenServer {PhoenixApp.Auctions.EventStore.EventStore.Subscriptions.Registry, {"$all", "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]** (CaseClauseError) no case clause matching: {:error, %Postgrex.Error{connection_id: 3514705, message: nil, postgres: %{code: :undefined_column, file: "parse_relation.c", line: "3294", message: "column \"deleted_at\" does not exist", pg_code: "42703", position: "70", routine: "errorMissingColumn", severity: "ERROR", unknown: "ERROR"}, query: ["SELECT\n  stream_id,\n  stream_uuid,\n  stream_version,\n  created_at,\n  deleted_at\nFROM ", "public", ".streams\nWHERE stream_uuid = $1;\n"]}}
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription_fsm.ex:422: EventStore.Subscriptions.SubscriptionFsm.catch_up_from_stream/1
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription_fsm.ex:98: EventStore.Subscriptions.SubscriptionFsm.transition/3
2022-12-12T09:29:25Z   [info]    (eventstore 1.4.1) lib/event_store/subscriptions/subscription.ex:186: EventStore.Subscriptions.Subscription.handle_cast/2
2022-12-12T09:29:25Z   [info]    (stdlib 3.17.2) gen_server.erl:695: :gen_server.try_dispatch/4
2022-12-12T09:29:25Z   [info]    (stdlib 3.17.2) gen_server.erl:771: :gen_server.handle_msg/6
Error: 2-12-12T09:29:25Z   [info]09:29:25.352 [error] GenServer {PhoenixApp.Auctions.Application, {PhoenixApp.Auctions.Application, Commanded.Event.Handler, "Auctions.Projectors.Auction"}} terminating
2022-12-12T09:29:25Z   [info]09:29:25.358 [debug] Monitored server Postgrex terminate due to: :shutdown
Notice: -12-12T09:29:25Z   [info]09:29:25.362 [notice] Application phoenix_app exited: shutdown
2022-12-12T09:29:25Z   [info][os_mon] memory supervisor port (memsup): Erlang has closed
2022-12-12T09:29:25Z   [info][os_mon] cpu supervisor port (cpu_sup): Erlang has closed
2022-12-12T09:29:26Z   [info]{"Kernel pid terminated",application_controller,"{application_terminated,phoenix_app,shutdown}"}
2022-12-12T09:29:26Z   [info]Kernel pid terminated (application_controller) ({application_terminated,phoenix_app,shutdown})
2022-12-12T09:29:27Z   [info]Crash dump is being written to: erl_crash.dump...done
2022-12-12T09:29:27Z   [info]Starting clean up.

Thank you for all your help, leads, or ideas that could help me :slight_smile:

Update:

It looks like it’s related to event_store. When I add {:eventstore, "== 1.1.0", override: true} to mix.exs, it fixes deploy. But it’s not an acceptable long-term fix.

I’m guessing it’s related to a recent breaking change, but the only one I see is in 1.2 ( Changelog — EventStore v1.4.1 ) with EventStore.Tasks.Init.exec/3 deprecated for EventStore.Tasks.Init.exec/1 and EventStore.Tasks.Init.exec/2. But fixing that still give me :

 v428 is being deployed
cd3fd680: cdg running healthy
--> v428 failed - Failed due to unhealthy allocations - rolling back to job version 427 and deploying as v429 

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

In any case, everything still works locally; the error still occurs at deployment.

Any other ideas?

Perhaps you have timestamps

Yes. But If I believe the documentation, it’s supposed to generates inserted_at and updated_at timestamp fields only. No?

See → Ecto.Schema — Ecto v3.9.2

Update:

It looks like it’s related to event_store. When I add {:eventstore, "== 1.1.0", override: true} to mix.exs, it fixes deploy. But it’s not an acceptable long-term fix.

I’m guessing it’s related to a recent breaking change, but the only one I see is in 1.2 ( Changelog — EventStore v1.4.1 ) with EventStore.Tasks.Init.exec/3 deprecated for EventStore.Tasks.Init.exec/1 and EventStore.Tasks.Init.exec/2. But fixing that (without overriding to 1.1) still give me :

 v428 is being deployed
cd3fd680: cdg running healthy
--> v428 failed - Failed due to unhealthy allocations - rolling back to job version 427 and deploying as v429 

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

In any case, everything still works locally; the error still occurs at deployment.

Any other ideas?

There is a required database migration when upgrading from 1.1.0 to 1.2.0 - among other things, it adds the deleted_at column that your error message says is missing:

More detail about running `event_store` migrations`

2 Likes

Hi @al2o3cr , thanks for identifying the problem and sending me the related documentation. However, the solution was not exactly what is documented… I’ll go over my failed attempts and finally give my solution.

:no_entry: Migrating with mix
We use mix release so, as documented, we don’t have mix available. Leading to the second solution documented :

:no_entry: Migrating using a helper function
I don’t understand this documented solution; we should use the command eval "MyApp.ReleaseTasks.migrate_event_store()" inside the release folder… Which does not exist because the release fails without the migration of the event store!

cd3fd680: cdg running healthy
--> v428 failed - Failed due to unhealthy allocations - rolling back to job version 427 and deploying as v429 

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

To try to do the migration during the deployment, I thought of calling it in my DockerFile:

:no_entry: Migrating with mix in docker
mix event_store.migrate failed with the error :nofile

 => ERROR [build  8/18] RUN mix event_store.migrate                                                                                                                                                 3.0s
------
 > [build  8/18] RUN mix event_store.migrate:
#19 1.498 Generated my_app app
#19 2.903 ** (Mix) Could not load MyApp.Auctions.EventStore, error: :nofile. Please configure your app accordingly or pass a event store with the -e option.
------
Error failed to fetch an image or build from source: error building: executor failed running [/bin/sh -c mix event_store.migrate]: exit code: 1

Same error with the -e option

 => ERROR [build  8/18] RUN mix event_store.migrate -e MyApp.Auctions.EventStore                                                                                                             2.6s
------                                                                                                                                                                                                   
 > [build  8/18] RUN mix event_store.migrate -e MyApp.Auctions.EventStore:
#19 1.469 Generated my_app app
#19 2.556 ** (Mix) Could not load MyApp.Auctions.EventStore, error: :nofile. Please configure your app accordingly or pass a event store with the -e option.
------
Error failed to fetch an image or build from source: error building: executor failed running [/bin/sh -c mix event_store.migrate -e MyApp.Auctions.EventStore]: exit code: 1

:white_check_mark: Migrating using a helper function inside Release.migrate()
So I finally thought of using the release function that we pass to Fly.

Fly.io call this command [deploy] release_command = "/app/bin/my_app eval MyApp.Release.migrate" at every deployment, which ensure everything started, load the app, and call Ecto.Migrator.with_repo().

I added a call to my helper function MyApp.Release.migrate_evenstore and voilà :tada:

Thanks everyone for the help on this :slightly_smiling_face: and sorry for not having updated sooner!

1 Like