LiveView in production reloads page when a new release is pushed

I have a Phoenix LiveView app running in production at https://indiepaper.me. I use fly.io for deployment and Cloudflare as a proxy.

I deploy using rolling releases, and I have noticed that whenever a new release is pushed, the WebSocket disconnects and the page fully reloads. That would not be much of an issue on its own, but I have a Markdown editor LiveView page that sends JSON through pushEvent, and when the LiveView disconnects and the page reloads, I lose all of that state.

Is this an issue with LiveView, or is Cloudflare causing it?

2 Likes

When you start a new version of your application, the process that was handling the WebSocket connection for your LiveView terminates. All the state that was in the LiveView's server-side process memory is lost, and the server's end of the WebSocket connection disappears. In a normal shutdown, the server will send a disconnect to the client's WebSocket, and the client will try to reconnect.

Even if the server is killed abruptly and the socket stays “half-open”, the Phoenix WebSocket client has a keepalive mechanism that regularly sends heartbeat messages to the server, so it can detect when the LiveView process is down or unreachable. By default this happens every 30 seconds (phoenix/socket.js at 940664cd5af6528c5e4beef204e040dd27e4febf · phoenixframework/phoenix · GitHub). So if your server-side LiveView process is gone, the client will eventually notice and reconnect.
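If you want the client to notice a dead connection or retry sooner, those intervals can be tuned where the LiveSocket is built in app.js. A minimal sketch, assuming the standard phoenix.js Socket options heartbeatIntervalMs and reconnectAfterMs (check them against your installed phoenix.js version), which the LiveSocket constructor forwards to the underlying Socket:

```js
// assets/js/app.js (sketch): tune the client-side keepalive and reconnect backoff
import {Socket} from "phoenix"
import {LiveSocket} from "phoenix_live_view"

let csrfToken = document.querySelector("meta[name='csrf-token']").getAttribute("content")

let liveSocket = new LiveSocket("/live", Socket, {
  params: {_csrf_token: csrfToken},
  heartbeatIntervalMs: 10000,  // detect a dead connection faster than the 30s default
  reconnectAfterMs: tries => [100, 500, 1000][tries - 1] || 2000  // retry quickly after a deploy
})

liveSocket.connect()
window.liveSocket = liveSocket
```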

If I understand you correctly, your LiveView application has some JS that keeps local state on the client's side. If you want this state to persist on the client across LiveView reconnects, maybe one option would be to store it in the client's local storage (Window.localStorage - Web APIs | MDN).
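For example, a client hook could mirror the editor content into localStorage and push it back to the server when the LiveView comes back. This is only a rough sketch: the hook name, storage key, and the "restore_draft" event (which would need a matching handle_event in the LiveView) are all made up for illustration.

```js
// assets/js/hooks/editor_state.js (hypothetical hook; names are illustrative)
const EditorState = {
  mounted() {
    // Restore any draft saved before the last disconnect / full page reload
    const saved = window.localStorage.getItem("markdown-editor-draft")
    if (saved) {
      this.el.value = saved
      // Re-send the restored content so the new LiveView process can rebuild its state
      this.pushEvent("restore_draft", {content: saved})
    }

    // Keep localStorage in sync while the user types
    this.el.addEventListener("input", () => {
      window.localStorage.setItem("markdown-editor-draft", this.el.value)
    })
  },

  reconnected() {
    // Called when the socket reconnects without a page reload; the server-side
    // process was replaced, so push the current content back up as well
    this.pushEvent("restore_draft", {content: this.el.value})
  }
}

export default EditorState
```

You would register the hook in the LiveSocket options (hooks: {EditorState}) and add phx-hook="EditorState" plus an id to the editor's textarea.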

I hope this helps

8 Likes

Thanks for the in-depth explanation, local storage seems like the way to go.

1 Like

The page shouldn’t reload unless the client detects an unrecoverable state, like multiple failed mounts after a reconnect (or you added your own reload logic based on the static_changed? helper for refreshing the page when new assets are deployed). If you have liveSocket.enableDebug() enabled in the JS console, it may tell you more about what happens when the refresh occurs.
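For reference, the app.js generated by phx.new assigns the socket to window.liveSocket, so debugging can be toggled straight from the browser console (a quick sketch):

```js
// in the browser devtools console
liveSocket.enableDebug()    // log LiveView lifecycle events (joins, updates, reloads)
// ...reproduce the deploy / refresh and read the log...
liveSocket.disableDebug()   // turn the logging back off
```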

3 Likes

I have enabled liveSocket.enableDebug() on a staging deployment, and a “timeout, doing a hard refresh” error flashed in the console right before the page did a hard refresh.

I'm using the default canary release strategy on Fly; I think the gap between VM restarts is what's causing the issue. What deployment strategy should I be using?

1 Like

I think that, full page reload or not, after your socket reconnects the state that was stored in the LV process will be lost, and that seems to be the crux of the problem here. In order not to lose the state, you have to persist it somewhere (either at the client, or on the server).
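For the server-side option, one sketch (the event name and interval are made up, and it assumes a matching handle_event/3 in the LiveView that writes the draft to the database) is to autosave from a hook:

```js
// hypothetical autosave hook: periodically push the draft so the server can persist it
const Autosave = {
  mounted() {
    this.timer = setInterval(() => {
      this.pushEvent("autosave_draft", {content: this.el.value})
    }, 5000)
  },
  destroyed() {
    clearInterval(this.timer)
  }
}

export default Autosave
```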

1 Like

I’m facing the same issue on a fly.io deployment.

I tried disabling code_reloader in development, then killing and restarting the server, and the form was restored as expected.

But in production (rolling strategy on fly.io) it causes the LiveView form to reset. I can also reproduce it with the command fly apps restart <app-name>.

I don't have any custom auto-refresh logic using static_changed?, and I'm still using the root.html.heex generated by phx.new 1.7.2.

Here is the console log with liveSocket debug enabled:

phx-F2iR1h-Y2FukiQHB update:  -  {5: {…}} <-- normal phx-change event
phx-F2iR1h-Y2FukiQHB update:  -  {5: {…}}
phx-F2iR1h-Y2FukiQHB update:  -  {5: {…}} <-- normal phx-change event
phx-F2iR1h-Y2FukiQHB destroyed: the child has been removed from the parent -  undefined
phx-F2iR1h-Y2FukiQHB join: encountered 0 consecutive reloads -  undefined
Navigated to https://<app-name>.fly.dev/path <-- auto full page reload ??
phx-F2iR7k6oUSvKhAHx mount:  -  {0: 'xxxxxxx', 1: {…}, 2: {…}, 3: {…}, 4: {…}, 5: {…}, s:
1 Like

I just verified that our proxy sends a 1012 (Service restart) close code on deploy, so I don't think it's an issue on Fly's side. I also tested deploys on my own apps and they do not trigger the failsafe reload. It also looks like we'd only execute that failsafe reload path if the server sends a 1000 close code when we don't expect it, so I need to know more. Are you using only websockets, or the longpoll transport? Running Cowboy, or Bandit?
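If it helps narrow things down, you can log the close code the browser actually receives during a deploy. A small sketch, assuming LiveSocket.getSocket() and Socket.onClose from the phoenix.js client API (verify against your installed version):

```js
// run in the browser devtools console before triggering a deploy
// (window.liveSocket is exposed by the app.js that phx.new generates)
liveSocket.getSocket().onClose(event => {
  // 1012 = service restart, 1000 = normal closure, 1006 = abnormal close (no close frame)
  console.log("websocket closed with code", event.code, "reason:", event.reason)
})
```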

1 Like

It seems that Bandit replies with a 1002 whenever a code that isn't in RFC 6455 is used to close the connection, which isn't exactly wrong, since it uses the RFC as its base and these codes aren't in any of the updates to the RFC…

Are these codes a de facto standard, or is there a spec somewhere?

Edit: Never mind, they are listed here: WebSocket Protocol Registries

1 Like

These are the defaults from a fresh project; I can reproduce it with a new project (1.7.3):

  • Create project
  • mix phx.gen.auth (liveview)
  • fly launch
  • fly deploy

Git repo: https://codeberg.org/sucipto/phxfly
Deployed app: https://phxfly.fly.dev/

Browser: Chrome, Firefox
OS: macOS 13.2.1

2 Likes

Apparently this is caused by kill_signal = "SIGTERM" in fly.toml (generated by the fly launch command), which makes the websocket disconnect with a 1000 close code.

I changed it to SIGINT; now the socket disconnects with 1006 and the form is restored on reconnect after a deploy.

8 Likes

Nice catch! I'll see what changed about our generated fly.toml for Phoenix apps and fix flyctl. That explains why my apps, which were launched months ago, work fine.

6 Likes

So it turns out this was Phoenix's fault. Our websocket drainer, which we added recently, was sending a 1000 close code instead of 1012. I will have a new Phoenix release out shortly which fixes this, and you can keep SIGTERM in your fly.toml, which is what you want for the VM to shut down gracefully. Thanks for the heads-up on this!

11 Likes

Phoenix 1.7.4 is out

16 Likes

Thank you. I upgraded to the latest Phoenix and reverted to SIGTERM, and it works as expected.

Just curious: the Firefox dev tools say the socket is closed with code 1011 instead of 1012. Is this expected behavior?

Log:

2023-06-16T03:20:03.169 runner[918555dc2e1783] sin [info] Pulling container image registry.fly.io/flyapps:deployment-01H313E704H5N25YXP804RADX0

2023-06-16T03:20:04.448 runner[918555dc2e1783] sin [info] Successfully prepared image registry.fly.io/flyapps:deployment-01H313E704H5N25YXP804RADX0 (1.279319178s)

2023-06-16T03:20:05.051 runner[918555dc2e1783] sin [info] Configuring firecracker

2023-06-16T03:20:05.559 app[918555dc2e1783] sin [info] 03:20:05.558 [notice] SIGTERM received - shutting down

2023-06-16T03:20:05.559 app[918555dc2e1783] sin [info] 03:20:05.559 [info] Shutting down 2 sockets in 1 rounds of 2000ms

2023-06-16T03:20:05.560 app[918555dc2e1783] sin [info] 03:20:05.559 [error] Ranch listener AppWebEndpoint.HTTP had connection process started with :cowboy_clear:start_link/4 at #PID<0.2148.0> exit with reason: {:function_clause, [{WebSockAdapter.CowboyAdapter, :handle_reply, [{:stop, {:shutdown, :draining}, {1012, 'restart'}, {%{channels: %{"lv:phx-F2kBYNv-puqEugKR" => {#PID<0.2150.0>, #Reference<0.968159881.784596993.155580>, :joined}}, channels_inverse: %{#PID<0.2150.0> => {"lv:phx-F2kBYNv-puqEugKR", "136"}}}, %Phoenix.Socket{assigns: %{}, channel: nil, channel_pid: nil, endpoint: AppWebEndpoint, handler: Phoenix.LiveView.Socket, id: "users_sessions:L4-f_1BRpsJ6pmhInmZhtj8UL8JByVptJcHlMz_i6Ic=", joined: false, join_ref: nil, private: %{connect_info: %{session: %{"_csrf_token" => "dgZloPlhryZR4zq95U55mqWP", "live_socket_id" => "users_sessions:L4-f_1BRpsJ6pmhInmZhtj8UL8JByVptJcHlMz_i6Ic=", "user_token" => <<47, 143, 159, 255, 80, 81, 166, 194, 122, 166, 104, 72, 158, 102, 97, 182, 63, 20, 47, 194, 65, 201, 90, ...>>}}}, pubsub_server: App.PubSub, ref: nil, serializer: Phoenix.Socket.V2.JSONSerializer, topic: nil, transport: :websocket, transport_pid: #PID<0.2148.0>}}}, Phoenix.LiveView.Socket], []}, {:cowboy_websocket, :handler_call, 6, [file: '/app/deps/cowboy/src/cowboy_websocket.erl', line: 528]}, {:cowboy_http, :loop, 1, [file: '/app/deps/cowboy/src/cowboy_http.erl', line: 257]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}

2023-06-16T03:20:05.561 app[918555dc2e1783] sin [info] 03:20:05.560 [error] Ranch listener AppWebEndpoint.HTTP had connection process started with :cowboy_clear:start_link/4 at #PID<0.2151.0> exit with reason: {:function_clause, [{WebSockAdapter.CowboyAdapter, :handle_reply, [{:stop, {:shutdown, :draining}, {1012, 'restart'}, {%{channels: %{"lv:phx-F2kCC57jJh5EgALx" => {#PID<0.2153.0>, #Reference<0.968159881.784596993.155689>, :joined}}, channels_inverse: %{#PID<0.2153.0> => {"lv:phx-F2kCC57jJh5EgALx", "94"}}}, %Phoenix.Socket{assigns: %{}, channel: nil, channel_pid: nil, endpoint: AppWebEndpoint, handler: Phoenix.LiveView.Socket, id: "users_sessions:F-7iGvyvm7XbSni-gebk-wB6cYlx-EkocEjKrGtWXH4=", joined: false, join_ref: nil, private: %{connect_info: %{session: %{"_csrf_token" => "3qRo1sjM57rzMteuB8U96MSW", "live_socket_id" => "users_sessions:F-7iGvyvm7XbSni-gebk-wB6cYlx-EkocEjKrGtWXH4=", "user_token" => <<23, 238, 226, 26, 252, 175, 155, 181, 219, 74, 120, 190, 129, 230, 228, 251, 0, 122, 113, 137, 113, 248, 73, ...>>}}}, pubsub_server: App.PubSub, ref: nil, serializer: Phoenix.Socket.V2.JSONSerializer, topic: nil, transport: :websocket, transport_pid: #PID<0.2151.0>}}}, Phoenix.LiveView.Socket], []}, {:cowboy_websocket, :handler_call, 6, [file: '/app/deps/cowboy/src/cowboy_websocket.erl', line: 528]}, {:cowboy_http, :loop, 1, [file: '/app/deps/cowboy/src/cowboy_http.erl', line: 257]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}
1 Like

This was actually a tad tricky to thread the needle on. Note that this block is meant to determine how the server responds to a client close frame (i.e. frame.code here is sent by the client). Here's the rationale (enumerating the codes registered at WebSocket Protocol Registries):

  • 1000 through 1003 are codes that the client could validly send
  • 1004 through 1006 are explicitly not to be sent
  • 1007 through 1011 are codes that the client could validly send
  • 1012 through 1014 only make sense when sent by the server
  • 1015 is explicitly not to be sent
  • all other codes up to 2999 are reserved (and not yet defined in the above registry)
4 Likes

Update: fixed in Phoenix 1.7.6.

2 Likes

When was the websocket drainer added? Was that in Phoenix 1.7? Or was it earlier? I’m trying to figure out if this is something that affected Phoenix 1.6 as well. But I’m having trouble finding the websocket drainer module.