We’re building an app that has been growing quickly, and we generally enjoy the benefits of LiveView.
One area that has started to become a struggle as we grow is deploying new servers. We have a Kubernetes cluster that performs rolling restarts for us, but a restart can amount to every active user reloading their page at the same time, which produces heavy bursts of traffic and puts stress on our DB.
Wondering what others have done to mitigate this, specifically as it relates to LiveView? Any good approaches to staggering client reconnects a bit, for example?
LiveView builds on top of Phoenix Channels, so if you want to work out a solution from the client side, have a look at the Phoenix JS client docs (phoenix 1.7.14 | Documentation). You can configure several aspects of how the Socket client works; for example, the opts.reconnectAfterMs function can be used to add jitter on top of the default exponential backoff.
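A minimal sketch of what that could look like in assets/js/app.js (the backoff schedule and jitter window below are made-up numbers for illustration, not the library defaults):

```js
// assets/js/app.js
import {Socket} from "phoenix"
import {LiveSocket} from "phoenix_live_view"

const csrfToken = document.querySelector("meta[name='csrf-token']").getAttribute("content")

const liveSocket = new LiveSocket("/live", Socket, {
  params: {_csrf_token: csrfToken},
  // Backoff plus random jitter, so clients dropped by the same pod
  // restart don't all reconnect in the same instant.
  reconnectAfterMs: (tries) => {
    const base = [100, 500, 1000, 2000, 5000][tries - 1] || 10000
    return base + Math.floor(Math.random() * 3000)
  }
})

liveSocket.connect()
```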
I’d note, however, that as in the thread I linked above, we should probably be thinking at the overall system level: if you’re doing rolling deployments with Kubernetes, only a fraction of your currently connected users is being disconnected and reconnected at any one time, not all of them.
I solved a problem very similar to this at my previous company.
In that case, the DB stress was happening due to “live updates”.
We were live-updating a very complicated and dense thing, where a single action could result in many changes. At first, we put the whole updated structure in the PubSub payload so the LiveViews could update their own assigns. That had problems: large payloads sent over long distances take a long time to arrive. So we switched to having every LiveView hit the DB for fresh data on each notification, and that is what led to the “stress on the DB”.
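Roughly, that pattern looks like this (a hypothetical sketch; module, topic, and function names are illustrative, not our actual code):

```elixir
defmodule MyAppWeb.DashboardLive do
  use MyAppWeb, :live_view

  def mount(%{"id" => id}, _session, socket) do
    if connected?(socket) do
      Phoenix.PubSub.subscribe(MyApp.PubSub, "dashboard:#{id}")
    end

    {:ok, assign(socket, dashboard: MyApp.Dashboards.get_dashboard(id))}
  end

  # The broadcast carries no data, only "something changed"; every
  # subscribed LiveView then re-reads from the DB, which is where the
  # bursts of DB load come from.
  def handle_info({:dashboard_updated, id}, socket) do
    {:noreply, assign(socket, dashboard: MyApp.Dashboards.get_dashboard(id))}
  end
end
```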
To solve the problem, I cached the result of the function that the LiveView used to fetch data. I used the :nebulex library to do that.
The only catch is that it takes great care and attention to remember to evict the cache entry every time the data changes.
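Shape-wise it was something like this (a sketch using the Nebulex v2 decorator API; the cache, context, and schema names are hypothetical, and MyApp.Cache also has to go into the application’s supervision tree):

```elixir
defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Local
end

defmodule MyApp.Dashboards do
  use Nebulex.Caching

  alias MyApp.{Cache, Dashboard, Repo}

  # Every LiveView calls this on each PubSub notification; the cache
  # absorbs the thundering herd so only the first caller hits the DB.
  @decorate cacheable(cache: Cache, key: {:dashboard, id}, opts: [ttl: :timer.seconds(30)])
  def get_dashboard(id) do
    Repo.get!(Dashboard, id)
  end

  # Every write path has to evict the entry (or you live with the TTL),
  # otherwise the LiveViews keep rendering stale data.
  @decorate cache_evict(cache: Cache, key: {:dashboard, dashboard.id})
  def update_dashboard(%Dashboard{} = dashboard, attrs) do
    dashboard
    |> Dashboard.changeset(attrs)
    |> Repo.update()
  end
end
```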
When we got it set up properly, it worked well and we did not see any more live-updating issues.
I believe something like this could help you, at least with the “DB stress” side of things.
We faced a somewhat similar problem trying to perform a rolling upgrade (we are not using LiveView). Our application is clustered via Erlang distribution, which makes such an upgrade hard, even though our k8s / AWS load-balancer configuration supports it. Currently we simply wait for the new cluster to start in its entirety before switching traffic (Kubernetes maxSurge set to 100%), so in effect we have an instantaneous switch of traffic rather than a rolling upgrade.
What we are planning to do is set the :cookie option in the mix.exs releases section to the current release version. That will allow us to set maxSurge to a lower value, say 25%, so traffic sent to nodes running the new release and existing traffic on the old cluster will not interfere with each other.
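Something along these lines in mix.exs (a sketch; the app name and version are placeholders, and other project options are elided):

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  @version "1.4.2"

  def project do
    [
      app: :my_app,
      version: @version,
      # ... elixir version, deps, etc. elided ...
      releases: [
        my_app: [
          # Nodes only form a cluster when their cookies match, so nodes
          # built from a new release version won't join the old cluster
          # mid-rollout.
          cookie: "my_app-#{@version}"
        ]
      ]
    ]
  end
end
```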