Deploying a clustered LiveView app behind a load balancer

BigTom · November 13, 2020, 12:48pm

Hi,

Potentially dumb question; I am not an expert on networks and load balancers.

When a LiveView starts, there are two network calls: the initial HTTP request to the Phoenix server, and then the WebSocket connection. If I deploy to a 2-node cluster running behind a load balancer, is it possible that the initial HTTP request is routed to Node-1 and the WebSocket connection to Node-2?

Have I basically misunderstood how this works?
Is it a problem?
Is there a standard part of the network protocols that fixes this?
Is there a standard load balancer setting that fixes it?
Is there a non-standard load balancer hack required?
Is there some BEAM clustering magic that routes traffic internally?

Thanks!

Tom

hubertlepicki · November 13, 2020, 12:51pm

It’s not a problem. I have this set up in a couple of places and, provided your load balancer supports WebSockets, things just work as intended.

This is because a LiveView does not preserve state between the initial server-side render and the second initialization after the WebSocket connection is established. In essence, every LiveView is initialized twice. So unless you do something unusual, like starting a process with a locally registered name on one node during the first render and expecting to find it from the other node during the second, things should just work as expected, and that’s precisely my production experience.
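A toy Python model of that point (not Phoenix code; `Node`, `SHARED_DB` and the `cart:42` key are made up for illustration): as long as each mount derives its state only from the request params and shared data, it does not matter which node handles which render. A locally registered name is the exception, because it only exists on the node that registered it.

```python
# Toy model, not Phoenix code: each "node" has its own local registry,
# while the (hypothetical) database is shared by every node in the cluster.
from dataclasses import dataclass, field

SHARED_DB = {"greeting": "hello"}


@dataclass
class Node:
    name: str
    local_registry: dict = field(default_factory=dict)  # visible only on this node

    def mount(self, params):
        """Build state purely from params and shared data, like a stateless mount."""
        return {"user_id": params["user_id"], "greeting": SHARED_DB["greeting"]}


node1, node2 = Node("node-1"), Node("node-2")
params = {"user_id": 42}

# Static render on node-1, WebSocket mount on node-2: identical state.
static_assigns = node1.mount(params)
live_assigns = node2.mount(params)
print(static_assigns == live_assigns)  # True

# The pitfall: a name registered locally on node-1 during the first render...
node1.local_registry["cart:42"] = "<pid on node-1>"
# ...cannot be found from node-2 during the second render.
print(node2.local_registry.get("cart:42"))  # None
```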

benwilson512 · November 13, 2020, 3:32pm

@hubertlepicki’s answer is perfect, so I’ll just weigh in to say: we do precisely this (LiveView behind a load balancer) and it works just great. There is no need at all for the static render and the live render to happen on the same server, and no hacks are required to make that OK.

wtd423 · November 14, 2020, 7:29am

Not load-balancer specific, but what if your LiveView page lists the 10 most recent news items, and the list changes between the server render and the WebSocket connection initializing? Does the page content change for the user?

Is there any way to avoid that? For example, by somehow passing the timestamp of the most recent news item from the rendered HTML to the LiveView, so the server knows to respond with the “old” list?

LostKobrakai · November 14, 2020, 7:37am

You could store the datetime of the initial request in the LiveView session and, if the second connection happens within a certain timeframe, use it to limit the results. This only helps with new items, though, not e.g. with updates to existing items. Also be aware that a reconnect half an hour down the line will start the exact same way, so you’ll likely want to fall back to fresh data after some threshold.
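A minimal sketch of that cutoff logic, written in Python for brevity rather than as a LiveView callback. It assumes the initial render time has already been carried into the second mount (for example via the session, as suggested above); `news_cutoff`, `visible_news`, the `inserted_at` field and the 5-minute threshold are all hypothetical. Filtering on insertion time only hides newly added items, so edits to existing items still show up, matching the caveat above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: connects later than this after the render get fresh data.
MAX_SNAPSHOT_AGE = timedelta(minutes=5)


def news_cutoff(rendered_at: datetime, connected_at: datetime,
                max_age: timedelta = MAX_SNAPSHOT_AGE) -> Optional[datetime]:
    """Return the render time to filter by, or None to fall back to fresh data."""
    if connected_at - rendered_at <= max_age:
        return rendered_at
    return None


def visible_news(items, cutoff: Optional[datetime], limit: int = 10):
    """Newest-first list of at most `limit` items, capped at `cutoff` if given."""
    if cutoff is not None:
        items = [item for item in items if item["inserted_at"] <= cutoff]
    return sorted(items, key=lambda item: item["inserted_at"], reverse=True)[:limit]


t0 = datetime(2020, 11, 14, 7, 0, tzinfo=timezone.utc)  # time of the static render
news = [
    {"title": "old", "inserted_at": t0 - timedelta(minutes=1)},
    {"title": "new", "inserted_at": t0 + timedelta(seconds=2)},  # added after render
]

# WebSocket connects 3 seconds after the render: keep the rendered snapshot.
quick = visible_news(news, news_cutoff(t0, t0 + timedelta(seconds=3)))
print([n["title"] for n in quick])  # ['old']

# Reconnect 30 minutes later: fall back to fresh data.
late = visible_news(news, news_cutoff(t0, t0 + timedelta(minutes=30)))
print([n["title"] for n in late])  # ['new', 'old']
```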

ityonemo · November 14, 2020, 9:40am

I only wanted to add that this is one of the considerations behind mount/3 being called twice: the first call might not happen on the same node as the second.

tfwright · February 26, 2022, 8:52pm

I’m currently having issues with my LiveView app (v0.17.7) that I’m pretty confident are related to a clustered deployment, because they only started after adding the second node, although I haven’t isolated the exact cause yet. I’m using live_session, and what I’m seeing is that live_redirects sporadically trigger this error in the browser console:

error: unauthorized live_redirect. Falling back to page request - Object { reason: "unauthorized" }

It’s as if the link pointed to a different live_session, although it definitely does not. This triggers a full page reload, which sporadically fixes the issue. The issue also resolves if I simply reload the page several times until the behavior corrects itself, which certainly feels like a reflection of which node it’s connecting to. It does not happen when I deploy the app, with the same environment config, to a different server running only one instance.

I think I’ve traced the error to this check, specifically the “session_vsn”, which appears to be set to the system time here. That value would certainly differ if each node were compiled separately, which would necessarily be the case unless the same release is used to deploy every node, so maybe I am still missing something.
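A simplified model of the suspected failure, assuming the version really is a compile-time timestamp stamped into each build (this is not LiveView’s actual code; `build_release`, `live_redirect_allowed` and the `"default"` session name are hypothetical). Two separately compiled nodes get different stamps, so a session issued by one node fails the check on the other, while a single build copied to every node passes on both.

```python
def build_release(compiled_at_ns: int) -> dict:
    """Hypothetical build step: stamps each live_session with the compile time."""
    return {"default": {"vsn": compiled_at_ns}}


def live_redirect_allowed(node_build: dict, session: dict) -> bool:
    """Allow in-place navigation only if the session's name and vsn match this build."""
    entry = node_build.get(session["name"])
    return entry is not None and entry["vsn"] == session["vsn"]


# Scenario A: each node compiled separately, at different times.
node1, node2 = build_release(1_000), build_release(2_000)
session = {"name": "default", "vsn": node1["default"]["vsn"]}  # issued by node-1
print(live_redirect_allowed(node1, session))  # True
print(live_redirect_allowed(node2, session))  # False -> "unauthorized", full reload

# Scenario B: one build deployed to both nodes.
release = build_release(3_000)
shared_session = {"name": "default", "vsn": release["default"]["vsn"]}
print(live_redirect_allowed(release, shared_session))  # True on either node
```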

@benwilson512 would you be willing to share any details about how you are deploying your cluster that might be relevant to this issue?