Production Phoenix LiveView on cloud-agnostic Kubernetes

For context, my quest was, and remains, to design, implement and self-host a new LiveView app in such a way that, whether it grows slowly or rapidly, I can scale it both horizontally and vertically using whatever cloud or bare-metal resources the app needs and its revenue can pay for.

I’ll happily elaborate on my journey on request, but it’s a long and complicated tale of many mistakes and aha moments. With some notable exceptions, there are ample tutorials, how-tos and developer documentation out there covering the details of both what I tried and failed with and what I eventually succeeded with.

In this discussion I’m hoping to maintain a mile-high, inch-deep perspective on the viability of the various approaches, platforms and packages required for a fully operational Phoenix app exposed to the world, and on the mesh of interrelated challenges, solutions, opportunities and risks that comes with the territory.

In short, let’s discuss what are the right things to do rather than how to get them done.

Though I started out chasing self-hosted OpenStack, then Charmed Kubernetes using Juju and MAAS with Ceph for storage management, I recently found that all of the elements of Charmed Kubernetes I had actually planned to use were available to me by running microk8s in clustered mode. My pfSense firewalls allowed me to set up the Calico CNI with MetalLB in BGP mode. The database is a “production grade” PostgreSQL HA cluster run by Percona’s version of CrunchyData’s Postgres Operator, and the entire cluster is monitored using Percona Monitoring and Management (PMM) running in an off-cluster Docker container.
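For anyone attempting the same combination, here is a rough sketch of what the MetalLB side of that can look like. It assumes the CRD-based configuration introduced in MetalLB 0.13 (older releases, including the version bundled with some microk8s builds, use a ConfigMap instead), and the address pool, ASNs and peer address are placeholders, not my actual values:

```yaml
# Placeholder values throughout; adjust for your own network and pfSense BGP setup.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.10-203.0.113.50   # range MetalLB may hand out to LoadBalancer Services
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: pfsense
  namespace: metallb-system
spec:
  myASN: 64512                    # private ASN used by the cluster
  peerASN: 64513                  # private ASN configured on the pfSense BGP daemon
  peerAddress: 192.0.2.1          # pfSense router address
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: advertise-public-pool
  namespace: metallb-system
spec:
  ipAddressPools:
    - public-pool
```

With that in place, any Service of type LoadBalancer gets an address from the pool and pfSense learns the route over BGP.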

It’s all working very nicely, including (eventually) getting cert-manager to work properly to issue and renew Let’s Encrypt certificates for the site. But all the how-tos and documentation on cert-manager had focussed on using it in conjunction with an Ingress controller such as nginx-ingress.
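To be concrete, this is roughly the pattern those guides describe: cert-manager’s ingress-shim watches an annotated Ingress, creates the Certificate for you and solves the ACME challenge over HTTP01 through the ingress controller. All names, hosts and the issuer below are placeholders:

```yaml
# The conventional setup the guides assume: nginx-ingress in front of the app,
# cert-manager driven by an annotation on the Ingress (all names are placeholders).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls        # ingress-shim creates a Certificate targeting this Secret
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 4000     # plain-HTTP Phoenix endpoint behind the ingress
```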

I really struggled to get cert-manager to work for me, and in the end I discovered that the version that ships with microk8s is quite old and buggy, so I had to upgrade it myself. But that leg of the journey, working through many different guides, examples and how-tos, made me realise that the generic Ingress-controller concept in Kubernetes has a massive functional overlap with what a standard Phoenix app on cowboy does anyway. Especially since I was already using a load balancer, MetalLB in my case (or whatever the cloud provider offers), I started to suspect that I could cut out nginx-ingress entirely. That’s what I did, with very pleasing results, I daresay. Perhaps someone, or some event, will make me regret that choice, but so far so good.

To get that done, though, I had to dig into parts of cert-manager the how-to guides don’t cover: deploying a DNS01 solver against a delegated zone hosted at AWS Route53, and mounting the resulting certificate as a volume in the manifest so that the certificate and key files become available to the pod.
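In case it helps someone on the same path, here is a minimal sketch of that arrangement. All names, the hosted zone ID, region and credentials are placeholders rather than my real values, and the Route53 credentials can be wired up in several ways; this version uses a static access key with the secret key stored in a Kubernetes Secret:

```yaml
# DNS01 issuer against a delegated zone in Route53 (placeholder IDs and names).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                  # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - dns01:
          route53:
            region: us-east-1                 # placeholder region
            hostedZoneID: Z0123456789EXAMPLE  # the delegated zone in Route53
            accessKeyID: AKIAEXAMPLEKEY       # placeholder IAM access key
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
---
# The certificate cert-manager keeps renewed in a kubernetes.io/tls Secret.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
  namespace: default
spec:
  secretName: myapp-tls
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer
  dnsNames:
    - myapp.example.com                       # placeholder hostname
---
# The relevant part of the app Deployment: mount the Secret so cowboy can
# terminate TLS itself, with no ingress controller in the request path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:latest   # placeholder image
          ports:
            - containerPort: 4001                    # HTTPS port served by cowboy
          volumeMounts:
            - name: tls
              mountPath: /etc/tls
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: myapp-tls                    # written and renewed by cert-manager
```

On the Phoenix side, the endpoint’s `https` options then just point `certfile` and `keyfile` at `/etc/tls/tls.crt` and `/etc/tls/tls.key`, which are the key names cert-manager writes into the TLS Secret.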

1 Like

My recommendation is to take a step back from the hype, and from how things are done in technologies with inferior concurrency like Python and JS, and get back to basics.

I’m assuming this discussion is about small to medium sized products with a maximum of 100k concurrent users, so everything below is based on that.

Elixir/Erlang scales insanely well on a single instance, the reasons being:

  1. All the data passed between concurrent constructs is immutable, so there is almost zero chance of memory corruption or of the hard-to-track errors that come with shared memory;
  2. The VM is fault tolerant: isolated parts can break without taking the entire VM down, so you don’t have to create physically isolated services for every small feature;
  3. The VM ships with great observability tools for inspecting its state at runtime, so you don’t have to depend on third-party tools like k8s;
  4. The way concurrency is designed in OTP, you are guaranteed that work is spread across all cores and that one subsystem won’t starve the others.

I have never used k8s on a production project, however from what I can see it tries to achieve the following:

  1. Fix the issue of technologies that scale poorly, like JS and Python, by creating multiple instances of the server/product;
  2. Dynamic scaling when demand rises - the value of this feature is very questionable considering the cost of an orchestrator versus bare metal, and it is very product dependent;
  3. Some limited fault tolerance offered by the orchestrator, which will log and restart containers should they die - needless to say, this is far inferior to OTP (a rough sketch of what this amounts to follows the list);
  4. Redundancy across multiple locations should something go wrong - say a datacenter goes down and your product is mission critical; this is in my opinion the feature you actually need an orchestrator for, and it is tricky to achieve on bare metal by default.
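To make point 3 concrete, from what I understand this is roughly all that mechanism amounts to - a probe plus a restart policy at the container level (names, ports and the health endpoint are made up for illustration):

```yaml
# Orchestrator-level "fault tolerance": if the probe fails, the kubelet kills and
# restarts the whole container - far coarser than OTP supervision of individual
# processes. All names, ports and the /healthz path are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:latest  # placeholder image
          livenessProbe:                 # failing this triggers a container restart
            httpGet:
              path: /healthz             # hypothetical health endpoint in the app
              port: 4000
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:                # failing this only removes the pod from Services
            httpGet:
              path: /healthz
              port: 4000
            periodSeconds: 5
```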

If we look at all the points above, it is clear that there is little to gain from using an orchestrator like k8s with Elixir, and on small to medium projects I would use it only for redundancy, which is rarely a hard requirement.

2 Likes

My principal consideration had been to avoid reliance on single-node performance, i.e. vertical scaling, by designing and implementing a distributed application I can deploy in local points of presence, rented or bought anywhere in the world where the traffic and local regulations require it. If all goes well, the audience will grow many orders of magnitude beyond a few hundred thousand concurrent users, and I’m hoping I won’t have to redesign the entire architecture if and when that ramps up. So I’m starting with a tiny cluster of old and slow machines rather than one fancy server that would eventually run out of steam anyway.

1 Like

My opinion is that this is premature optimisation, but once again it is unclear what kind of product we are talking about, and I opt for not using cloud services whenever possible.

1 Like

That’s valid (the premature-optimisation perspective, one I’ve often discussed with teams over the years), so I can assure you the decision wasn’t made lightly or because of irrelevant hype. I had to find a way to scale the product down to a ridiculous minimum while allowing for demand that could escalate at the most inopportune time (when I’m supposed to be focussed on value and content rather than redesigning the system or trying to keep it from being overrun).

I also share your take on avoiding cloud services because cloud service providers are bent on locking people into their particular cloud. But cloud tech has intrinsic value and opportunity if you can harness it without getting locked in.

I’ve been a long-time fan of first Erlang/OTP and now Elixir and Phoenix, for all the reasons you’ve quoted and more. Without those contributions to the technology landscape, my life’s work would still have demanded many lifetimes’ worth of hard slogging to implement. But we’re not talking about a small to medium application here. It’s small now, but its scope is to grow into a mission-critical system used daily, if not constantly, by billions of people. I’m well aware of the adage that one shouldn’t tackle Google-sized problems unless you’re Google, but I had no choice in the matter really - it had to be done, and now it is. Well, almost.

As someone deeply invested in crafting a robust, scalable LiveView app, this discussion resonates with my own journey. Striving for cloud-agnostic solutions led me through a maze of trials and triumphs; from exploring OpenStack to embracing Kubernetes on microk8s, every step taught valuable lessons. Cutting out nginx-ingress in favour of a more streamlined approach speaks volumes about the adaptability and versatility of modern architectures, and it’s a testament to the evolving landscape of tech solutions. For those intrigued by the idea of cloud-agnostic applications, this discussion offers a wealth of perspective. Link to cloud agnostic concept

It means I’m not alone after all which is wonderful news. It seems we’ve independently (and at some cost) made very similar pragmatic choices motivated by similar objectives.

I suggest we semi-formally pool our objectives, work product, rationale and findings so that we can jointly decide what to use and what to steer clear of as we each meet our evolving requirements.

There’s no shortage of vested interests and divergent agendas in this space, most of them muddying the waters. I’m sure that if we can define and maintain a specific arrangement of technologies to focus on, with explicit motivations for each choice and clear objectives to be met, and add to that all those little secrets, scripts and YAML file entries that unlock the features we require, then the number of people sharing the load of keeping up with the constant change in all the moving pieces would grow.

If you’re keen, I could start things off with a full description of my objectives, what I’ve considered, what I chose, and the snippets of YAML etc. required to make it work. You could then indicate where your perspective differs and why, and we could work towards one common toolset we help each other stay current on.

I’d be in favour of making this a little more binding than is typical in the open source community, in the sense that once we’ve agreed on a shared arrangement of products and technologies, we’d consider ourselves obliged to weigh each other’s evolving requirements as heavily as our own when proposing changes to the agreed set of components and techniques. Being open source, others would benefit from it as well, as long as their objectives align with ours; but if someone with slightly different objectives wishes to join and assert some influence, they’d have to negotiate with us to have their objectives incorporated into ours, at the price of committing to also consider our objectives and share the load of keeping the emerging standard up to date.

How keen would you be on something like that?