WebSockets & Load Balancing


I really like the deployment model of having a load balancer with two (or more) application servers behind it. The reasons I like it are:

  1. SSL termination on the load balancer

  2. Ability to update / restart one server at a time without affecting the user experience (in a stateless world), assuming the remaining server(s) have enough spare capacity.

  3. Some degree of high availability

However, this seems to get more complex when entering the WebSocket world, with its more persistent connections and state. This article ("The unsolved load balancing problem of websockets") makes it sound a bit like territory that is not all that well charted. At the same time, it sounds as if, provided one followed the practice of only restarting one server at a time, it would work out OK?

Any thoughts on how unsolved / tricky this really is?

Thank you!


You may find some pointers in the Discord engineering blog - they are using websockets at scale. There are links etc here: https://elixir-lang.org/blog/2020/10/08/real-time-communication-at-scale-with-elixir-at-discord/

1 Like

Thanks @mindok, will look into this :slight_smile:

I also just realized that my copy of “Real-Time Phoenix” from @sb8244 has a section outlining several strategies for dealing with loadbalancing websockets. Will study these in detail now also.


If I remember correctly, you can configure your load balancer to always send a client to the same instance of your backend. If that instance crashes, the LB will redirect the client to another one, which then follows a simple (re)connection flow.

However, keep in mind that you can connect your Elixir instances to each other. So you can share the state of a WebSocket and then create a special flow for reconnection.

Here is a video that could help you: https://www.youtube.com/watch?v=nLApFANtkHs

In the video, the speaker uses Horde; you can also implement this behaviour yourself, naming processes with `{:global, "name"}`.
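For example, a minimal sketch of registering a GenServer under a global name so any node in the cluster can reach it (module name and registered name are invented for illustration):

```elixir
defmodule Counter do
  use GenServer

  # Register under a cluster-wide name instead of a node-local one.
  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial, name: {:global, "counter"})
  end

  # Callers on any connected node can address the process by its global name.
  def value, do: GenServer.call({:global, "counter"}, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state, state}
end
```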

1 Like

In AWS, the load balancer is able to keep sockets evenly distributed with this algorithm.


Thanks @Matsa59 and thanks @fxn!

@fxn: The AWS load balancing algorithm looks great, and like a really simple solution! If I'm not mistaken, this algorithm is also called "least connections" in other load balancer offerings (haproxy being one of them).
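In haproxy that's the `balance leastconn` directive; a minimal sketch (backend name and server addresses invented, not a complete config):

```
backend websocket_servers
    balance leastconn
    server app1 10.0.0.1:4000 check
    server app2 10.0.0.2:4000 check
```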

Not sure about AWS, but with some of the load balancer solutions out there one might have to additionally turn on “sticky sessions”, I think. EDIT: Hmm, it looks like “sticky sessions” are an option available only for http (but not tcp), I need to investigate some more.

Very exciting thought that there might be a relatively simple path here after all :slight_smile:

Just wanted to update my last post (can’t edit it any longer), and delete the section about “sticky sessions”, which is not relevant / correct.

From my latest understanding the only thing one really needs to do is to use the “least connections” balancing algorithm (or equivalent) to balance websockets in a reasonable way across multiple servers.
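The idea behind "least connections" can be sketched in a few lines of Elixir (the connection counts here are invented):

```elixir
# Least connections in a nutshell: route the next client to the backend
# with the fewest active connections. A real LB tracks these counts itself.
backends = %{"app1" => 130, "app2" => 97, "app3" => 112}

{chosen, _count} = Enum.min_by(backends, fn {_name, count} -> count end)
# chosen is the least-loaded backend, "app2" in this example
```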

1 Like

A WebSocket connection still starts as an HTTP connection and is upgraded to a WebSocket connection, so I don't see why you couldn't use sticky sessions (implemented with cookies) with WebSockets.
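For what it's worth, cookie-based stickiness in haproxy looks roughly like this (a sketch; backend and server names invented). The LB inserts a cookie naming the chosen server, and subsequent requests carrying that cookie, including the WebSocket upgrade request, go back to the same server:

```
backend websocket_servers
    balance roundrobin
    cookie SRV insert indirect nocache
    server app1 10.0.0.1:4000 check cookie app1
    server app2 10.0.0.2:4000 check cookie app2
```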

Just keep in mind that `{:global, "name"}` takes out a transaction lock on the entire VM, so if a lot of things are in motion it can block, and performance degrades dramatically if you have a cluster (or so I hear; I haven't used it yet).

1 Like

I didn't read anything about that. On https://erlang.org/doc/man/global.html it explains that the global system works "locally" on each node.

So, if I'm not wrong, it works exactly the same way as non-global processes (except they start on every node).

If you have any source, I'm really interested to know more about that.

It's right there:

For any action resulting in a change to the global name table, all tables on other nodes are automatically updated.

Global locks have lock identities and are set on a specific resource. For example, the specified resource can be a pid. When a global lock is set, access to the locked resource is denied for all resources other than the lock requester.

other bits and pieces:

When new nodes are added to the network, they are informed of the globally registered names that already exist…
This function is completely synchronous, that is, when this function returns, the name is either registered on all nodes or none.

I guess they don't come out and directly say that there are transactional locks, but I'm pretty sure they aren't implementing a faster / more available consensus protocol like Raft (which wouldn't have those properties anyway).

please someone correct me if I’m wrong =D

1 Like

Oh, I get it, yeah you're right. That should be the reason why libs like Horde exist :wink:

IMO `:global` is nice for state that doesn't change too often (edit: like @ityonemo said previously).

Thanks a lot @ityonemo :wink:

I like the ALB strategy and would definitely look at that if it’s available. We don’t really have that capability, so had to adapt a bit.

We check the whole cluster every 15s or so and query the number of connections per node. If one node is outside of a statistical range, we terminate a small number of its connections. This causes the load to rebalance within a few minutes (using round robin). Only one node terminates connections at a time, which keeps the balancing steady and as slow as we want.
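That periodic check can be sketched roughly like this (node names, counts, and thresholds are all invented; a real version would gather counts over the cluster, e.g. via RPC):

```elixir
# Connection counts per node, as gathered on the last 15s sweep (invented).
counts = %{node_a: 1200, node_b: 1150, node_c: 1900}

mean = Enum.sum(Map.values(counts)) / map_size(counts)

# Flag only the single busiest node, and only if it is well above the mean
# (20% here, an arbitrary threshold for the sketch).
{busiest, max} = Enum.max_by(counts, fn {_node, count} -> count end)

to_shed =
  if max > mean * 1.2 do
    # Shed only a small fraction of the excess so rebalancing stays gradual;
    # the disconnected clients reconnect and land on less-loaded nodes.
    trunc((max - mean) * 0.1)
  else
    0
  end
```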

1 Like