LiveView deployment options in production

Hi :wave:

Sorry in advance for the noob question, but I am struggling to figure out the (best) options for deploying LiveView in a production-grade setup (if such a thing is possible given it is still pre-1.0), and the tradeoffs involved.

From what I could gather, LV is based on channels and uses pubsub under the hood, which would imply one of the following:

  1. Run on a single node
  2. Run distributed elixir
  3. Use the Phoenix.PubSub.Redis adapter (see the config sketch below)
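
From its README, I gather the Redis option would be configured roughly like this (a sketch assuming Phoenix.PubSub 2.x with the phoenix_pubsub_redis adapter; MyApp and the connection details are placeholders):

```elixir
# lib/my_app/application.ex (MyApp and the host are placeholders)
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      # The default setup is {Phoenix.PubSub, name: MyApp.PubSub}, which
      # relays messages over distributed Erlang and therefore needs the
      # nodes to be clustered. Swapping in the Redis adapter lets
      # unclustered nodes exchange pubsub messages through Redis:
      {Phoenix.PubSub,
       adapter: Phoenix.PubSub.Redis,
       name: MyApp.PubSub,
       host: "redis.example.com",
       # node_name must be unique per node
       node_name: System.get_env("NODE")},
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```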

Is this statement correct? If so, I am a bit unsure about which of these options to consider:

  1. From what I read, scaling vertically and just running a single node on a bigger box can perform quite well and be a pragmatic approach for a small/early-stage business, but it feels a bit wrong from a devops perspective to commit to a setup that cannot scale horizontally for a more advanced production use case?
  2. This limits the deployment options (e.g. it rules out Heroku) and seems non-trivial to get right, but there are tutorials for AWS, k8s & Gigalixir, so I suppose it is manageable?
  3. This adds some extra moving parts, and I suppose it introduces some performance overhead as well?

I want to rule out 1. here, so I’m especially interested in the tradeoffs between 2. and 3., since I have no idea which would be the simplest/most reliable/easiest to maintain in practice in a real production setup.

Thanks in advance and forgive me if my understanding/question is off :bowing_man:

PS: I fully understand that LV itself might not be totally production-ready yet, but I read that it is getting closer to it and that it is being used in production already. So I am really interested to know what the setup would look like, especially given the huge productivity boost it offers for apps that don’t justify a full-blown SPA :slight_smile:

1 Like

You’re correct that LV uses channels and is therefore integrated with pubsub. But whether you actually need joined pubsub between your nodes depends on what data you want or need to transfer in a cluster-wide fashion. LiveViews, like channels, start out as just a process on a node representing a connected client. It’s the type of communication you handle with those clients that makes you depend on the nodes being joined. LiveViews without communication to other clients/systems should work without that.
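
To make that concrete, here’s a rough sketch (module names are made up, and it assumes a LiveView version with mount/3 and the ~L template sigil). The first LiveView is self-contained and runs fine on a single unclustered node; the second subscribes to a shared topic, so broadcasts need to reach every node that might host a subscriber:

```elixir
defmodule MyAppWeb.CounterLive do
  use Phoenix.LiveView

  # Self-contained: all state lives in this one process, so nothing
  # cluster-wide is required.
  def mount(_params, _session, socket) do
    {:ok, assign(socket, count: 0)}
  end

  def handle_event("inc", _params, socket) do
    {:noreply, update(socket, :count, &(&1 + 1))}
  end

  def render(assigns) do
    ~L"""
    <button phx-click="inc">Count: <%= @count %></button>
    """
  end
end

defmodule MyAppWeb.RoomLive do
  use Phoenix.LiveView

  # Cross-client communication: this is the kind of LiveView that needs
  # pubsub to span all nodes (distributed Erlang or the Redis adapter).
  def mount(%{"id" => id}, _session, socket) do
    if connected?(socket) do
      Phoenix.PubSub.subscribe(MyApp.PubSub, "room:#{id}")
    end

    {:ok, assign(socket, messages: [])}
  end

  def handle_info({:new_message, msg}, socket) do
    {:noreply, update(socket, :messages, &[msg | &1])}
  end

  def render(assigns) do
    ~L"""
    <ul><%= for msg <- @messages do %><li><%= msg %></li><% end %></ul>
    """
  end
end
```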

1 Like

Also, you can check out render.com; it looks like a mix of Heroku and Gigalixir. It’s very easy to deploy one node, and the clustering looks straightforward.

Then if it starts getting expensive you can always move to your own servers.

I’ve been using it for a pet project and so far so good.

1 Like

Thanks a lot for your answer! My clients should be handled in isolation; I have no plan to explicitly use any broadcasting or to access any shared state, so I suppose there is no need then.

But I was also under the impression that clustering solved the problem of load balancing websockets without resorting to sticky sessions (https://blog.gigalixir.com/5-reasons-we-love-elixir-clustering/). Would this be a good reason to go with distributed elixir, or is it wiser to handle this issue at the load balancer level? My experience so far being mostly with “typical” stateless apps, I totally lack perspective on the implications of both options.

Clustering does give you the possibility to move work to other nodes, but what is described in point 4 of the article is not load balancing in the sense of routing (http) requests to other nodes. It’s about distributing work, not client requests.
Point 5 assumes that you have some kind of global state to share. You don’t need sticky sessions for LiveViews though, as they don’t depend on any pre-existing state on the server. Everything they need comes either from the client or is loaded on demand in mount and other callbacks. A disconnect on Node A will stop the LiveView process anyway, so there’s no need to reconnect to Node A again.
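
As a sketch of what “loaded on demand” means (the module and the MyApp.Blog.get_post!/1 context function are made up): nothing below assumes the client comes back to the same node, because mount rebuilds the whole state from the params and the database on every (re)connect:

```elixir
defmodule MyAppWeb.PostLive do
  use Phoenix.LiveView

  # Runs from scratch whenever the client (re)connects, on whichever
  # node the load balancer happens to pick. The id comes from the
  # client; everything else is reloaded right here.
  def mount(%{"id" => id}, _session, socket) do
    {:ok, assign(socket, post: MyApp.Blog.get_post!(id))}
  end

  def render(assigns) do
    ~L"""
    <h1><%= @post.title %></h1>
    <article><%= @post.body %></article>
    """
  end
end
```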

2 Likes

Thanks a lot for your response, this really helped clarify the picture and clear up the confusion!

It seems that using LiveView has far fewer implications than I was initially imagining. Knowing that each LiveView or component runs in its own stateful process seemed “scary” at first from a deployment perspective.

I was also initially worried about how seamless daily deployments could happen without disruption to the user, but it seems that everything is handled on the client side (reconnecting to the right channels, stashing and recovering forms…). So I suppose LiveView stays compatible with a “cattle, not pets” vision where servers are replaceable at will (provided there is no need for persistent state, as in the game server example, of course).
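
For anyone else landing here, the form recovery I’m referring to works by having the JS client replay the form’s inputs after a reconnect. A minimal sketch, assuming the phx-auto-recover binding (the module and event names are made up):

```elixir
defmodule MyAppWeb.DraftLive do
  use Phoenix.LiveView

  def mount(_params, _session, socket) do
    {:ok, assign(socket, title: "")}
  end

  # After a reconnect, the client replays the form's current inputs to
  # the phx-auto-recover event (or to phx-change by default), so a
  # draft typed right before a deploy survives the brief outage.
  def handle_event(event, %{"post" => %{"title" => title}}, socket)
      when event in ["validate", "recover"] do
    {:noreply, assign(socket, title: title)}
  end

  def handle_event("save", _params, socket) do
    # Persisting the draft is out of scope for this sketch.
    {:noreply, socket}
  end

  def render(assigns) do
    ~L"""
    <form phx-change="validate" phx-submit="save" phx-auto-recover="recover">
      <input type="text" name="post[title]" value="<%= @title %>" />
    </form>
    """
  end
end
```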

If LiveView does not require any specific deployment strategy compared to a typical REST API app, this is really good news and a huge relief. :relieved: Maybe it would be good to emphasize this more (in the docs?) to help adoption for people like me struggling with the paradigm shift?

Thanks for the suggestion, it was already on my radar but I will definitely check it out!

To clarify, LV does not require PubSub, unless you explicitly use PubSub features. :slight_smile:

2 Likes

Thank you Jose, it is much clearer now :slight_smile: :+1:

I’m pretty sure that in a lot of cases users will still notice disruption on a live route where they typically wouldn’t with a non-LV site, but then the question becomes whether it’s impactful disruption.

For example, imagine you’re on a page like these forums and you have a long post sitting there on the page. Assume this is a single-server deploy. Without LV, if the server went down for 3 seconds to deploy a new version of the web app while you were reading it, you wouldn’t notice anything. From your POV there was no downtime and nothing looked out of the ordinary.

With LV, the websocket connection is going to drop, and I believe that’ll trigger a loading progress bar until it reconnects. So now the end user will see that. In this case it likely doesn’t do any harm, but it’s still exposing every second of downtime.

One could argue this is a benefit in theory, because it gives a more up-to-date representation of your server and lets you see what you can and can’t do in real time. But I think in practice there’s a lot of value in people assuming everything is good to go, so that in 99.9% of cases they won’t notice it.

It’s sort of like game development. There are so many tricks, shortcuts, and crazy things being done to hide imperfection, but end users never know (and this is great).

I wonder if it will be possible with LV to avoid triggering a visual notification when the connection drops, while still having live feedback on actions while the server is up (disabled states and progress loaders on various form events, etc.).

2 Likes