How to do/manage inter-cluster communication?

MarthinL · July 3, 2024, 4:15pm

Folks,

My Phoenix LiveView app is meant to scale horisontally and keep access latency down by running as distributed across as many Kubernetes clusters as needs be. So far load balancing at the Kubernetes level has been enough but curtesy of libcluster I’m ready to jump in with tweaks to make sure LiveView sessions gets back to the same node if possible.

But when it concerns how my regional clusters should contact each other there’s a whole array of ostensibly competing concepts and tools to consider and it’s very confusing. Some of the terms, packages and complications in this space are:

Service mesh tools. Istio, linkerd, meshx. What is a service mesh in an Elixir/Phoenix world anyway? I get services and microservices at the HTTP(S) API level, sure, even what José José refers to in his article about Kubernetes and the Erlang VM but in practice in an Phoenix setting how do they play out?
Network connectivity. The inside of a Kubernetes cluster, depending on the container network interface (CNI) you use makes extensive use of pretty standard networking principles and protocols to give each component their own private IP and mDNS name. The cluster only links up with “real networks” through a select few ingress and egress points which could (if that’s how you configure it) have public IPs or are reachable via public IPs through some load balancer or ingress controller. The whole (multicast DNS) namespace (and potentially the IP addressing scheme) inside the cluster could potentially be the same in all your clusters because their scope is that isolated. There are packages out there such as submariner/lighthouse that seems to take it upon themselves to provide secure (mutual TLS) networking between clusters. E.g. submariner is in early development, therefor not yet production ready even if that is indeed the way to go.
It’s also possible to simply use what comes in with Phoenix and expose a TLS secured API routed from each cluster’s public, probably load balanced IPs through which clusters ask and answer each other questions. One could probably implement some JWT token exchange to manually achieve something akin to mutual TLS.
There are also those who state that after fooliing around with the available options they ended up going with RabbitMQ for their purposes. Now I know as Erlang clan we’re got a built-in soft spot for RabbitMQ but it’s such a generalised solution that I’m genuinely worried that I’d have to adjust my intended solution to what queue types and topologies RMQ already offers as it might be too hard to work my special distributed database and processing logic into RMQ.
Apart/on top of the gaggle of people who’d suggest that using a common database as a communication method between parts of your (distributed) application there are of course also those who suggest using Redis for that purpose. I’m old enough to have practically invented that approach myself when a common database was what I had to work with. For this application I don’t have a common database and don’t intend to deploy one for starters. Each region runs its own database and I’m building the layer that turns those into some sort of federated database at the application layer. That’s what the regions are “talking to each other” about.
It’s also conceivable to rig up some MPLS or VPN-style solution completely independently of both Kubernetes and Elixir/Erlang which sets up secure connections between the networks each of the clusters consider their “external networks” so that they can connect using layer 2 without having to worry about security over public networks.

So, folks, please, talk to me. Help me make sense of my many half-options so I can settle on something that is production-ready now, won’t absorb all my cognitive capacity to set up and run, slots easily into the Phoenix and LiveView idiom.

In my ideal world I end up with a simple yet safe way to
a ) look up stored contact details for my application in a different region,
b) compile a requrest for the data I want,
c) sign and send the request off,
d) get a signed result which I’d
e.1) use and discard it (if I did a “fetch” call) or
e.2) cache and reuse it (if I made a “subscribe” call) in which case I could
f) expect a callback when the data changes, until
g) unsubscribe from changes when the data ages out of the cache.

Similarly, the same application code on the other end of the call would:
a) Get a signed request for data from another node
b) Fetch the data and
c.1) Sign and send the results to the requestor if it’s a fetch request, or
c.2) Register the requestor’s subscription to that data and then sign and send the data
d) Monitor the database layer for changes to data that has active subscriptions and
e) Compile, sign and send data updates to all subscription holders.

The plan would be to build a little magic into the way requests aggregate and updates propagate through what could end up to be a rather big network of clusters to reduce the point load on busy regions and build an effective self-healing/configuring multi-path solution, but that’s not code I’d dare implement until I have sufficient live regions to warrant the test rig it would require.

What (in my view at least) far more trivial than most of the packaged solutions try to cater for is that it’s one single application (with as many components as I want) running everywhere. All identical except for which region id they consider their own. I don’t have “foreign” applications, services, endpoints, processes or APIs to contend with. The only actual complications are network latency, capacity and needing to use routable IPs as sparingly as possible (IPv4).

P.S. For my networks I master my own DNS zones.

MarthinL · July 4, 2024, 11:01am

Let me explain the essence of what I’m unclear about.

Services had been in play ever since Scott McNealy voiced the realisation that “the network is the computer” but they’ve taken many different forms over the years, each with their own way of identifying / addressing them. From compiler/linker symbols of stub functions rigged to do remote procedure calls to versioned DLL symbols defined by COM interfaces to (mostly HTTP-based) URLs by which to reach RESTful APIs and interact with SOAP objects made popular by what I should call “the Aja years”.

Outside the eco-systems created by frameworks like Phoenix I could guess (but shouldn’t) to most people today a service roughly equates to an API at a known URL and a set of library functions to make consuming such a service less laborious than compiling HTML headers and interpreting responses by hand. It’s a gross simplification and a wild guess but for the purpose of explaining my confusion it will do. So our applications are broken up in very many single-concern components providing their specialised services to each other and the application itself is just the conductor of the orchestra. Makes sense, at least at some level.

So now we read about the meteoric rise of the service mesh concept. Most annoyingly, even the front-runners in the service mesh business answer the most pertinent question “what is a service mesh” like politicians are known to do. It’s so self-referencing that unless you already know what a service mesh is their definition of a service mesh is as vague as marketing material.

If that was the only complication it would have been OK, because the service mesh providers provide sample applications to bring home the concept. But the examples make litle sense to me because I don’t, never did or no longer make the connection between what the examples are doing and how that translates into Erlang/Elixir and Phoenix domains so I am struggling to “see” what in its very essence is happening there.

Some documentation around service mesh approaches state that they’re looking to incorporate service mesh into the network layer and proceeds to depict an eBFP layer below the socket layer implying that to one service any other service would boil down to an open socket. It doesn’t sit well with me but if that’s what it’s all about then I suppose I can adapt my understanding to that.

Other write-ups onthe matter wax lyrical about making the service mesh transparent to the developer. Now I’ve worn a great many hats in my life besides developer from hired hand to global design authority for a sizable multi-national, systems architect and even product manager, and I sat at all sides of the table including not having a seat and having been bullied out of the one I had, so I can understand why some might consider it a good idea to make something transparent to developers. But it’s the same kind of dangerous thing to actually do that as it is to think you can turn all the company’s geographically dispersed sites into a single logical network because VPN tunnels can cross the boundaries between networks safely and transparently. You can do it but don’t expect there to be any useful network capacity left for actual traffic once each LAN’s local chatter gets carried over those transparent VPN links to every other LAN in the organisation… Bad network design you say, and I agree, but it illustrates what happens when you aim for making anything transparent and then sell its virtues at the wrong level within an organisation. given how often it is mentioned that the Erlang VM’s native node mesh networking is way to “chatty” to be viable over high-latency or bandwidth sensitive iinks such as would be the case between clusters, there certainly is opportunity here to step in a similar trap.

Working in the Phoenix framework takes all of us even further away from the native problem domain addressed by service mesh technologies and inter-cluster communication tools. In practical terms Erlang, Elixir and Phoenix make its own variant of microservices its first class citizens.

At the language level a service is some process, progably a gen-server at heart our code can interact with regardless of what node it is running on simply by having its pid (which encapsulates the node id) and we can obtain pids for services we didn’t start ourselves by looking them up in a registry.

beyond what happens at language level, Phoenix also allows us to never directly touch a URL anymore. For the flavour of service that’s essentially an HTTP(S) API Phoenix’s route abstraction lets us logiocally compose the URLs we need by specifying the function we ail to call and its parameters. I’ve not needed to do much of that yet, but i imagine that in that context a service running in another cluster would again be engaged “transparently” (or unwittingly) through the public IP the DNS name resolves to, if it is reachable.

All in, our Phoenix applications already live under “management” something that looks and behaves a lot like what I can make out about service mesh toolsets except that in practical terms in is limited to a single Erlang VM or a cluster of Erlang VMs living on the same high-speed low latency network such as a LAN. It all comes very natural. We don’t expliitly write micro-services and leave it as exercise to the interestes project manager or integrator to make them work in concert. No. Everything we write in Phoenix, Elixir or Erlang are in essense micro-services our applications are orchestrating.

Except when it comes to crossing cluster boundaries, it seems.

Once we step outside the safe and happy confines of a cluster, across the public network badlands to go know on the door of another cluster’s safe and happy world all that safety and happiness goes away. And when we go looking for ways to cross the badlands safely we’re confronted with tools and techniques that evolved for much more generic purposes than extending the paradigms and idioms our applications are written in. It seems likely that much of what those tools are having to contend with haven’t been issues in our lives as espefially Phoenix programmers since we’ve stepped into this world. That’s OK, they have to sort out their issues too, I have no problem with that.

But how does any of it end up solving that (inter-cluster) part of the problem that native Elixir/Erlang/Phoenix aren’t naturally great at doing? Is it really something we need third party tools to solve? Or can we overcome the network layer generation gap resulting from the world Erlang was conceived in which assumed LAN type networking compared to the Internet we’re living on today?

Or on a more pragmatic level, what is the appropriate abstraction for a microservice written in Elixir or Erlang, specifically in the Phoenix Framework. Is it a URL, a set of routes, a named pid (probably to some proxy process), an IP, a DNS name, an open socket, or something else?

Question two is simply how would a Phoenix programmer identify and consume the services of a specific service running in a specific remote cluster?

I’m sure the right answers would be simple, especially in hind-sight, and I’d go “now what didn’t anybody just say so” and we’ll all have a good giggle about it. BUt from where I’m sitting my uncertainty about this is a major stumbling block.

D4no0 · July 4, 2024, 11:27am

Honestly, unless the regional distribution is critical, I would avoid getting into that and just go with a monolith served from a single region, there are a lot of tradeoffs when designing such systems and if you don’t understand clearly, you might end with a system that performs the same or worse but having 10x more complex infra.

There are multiple ways to achieve load-balancing/forwarding over multiple instances, but it’s not that easy and it will involve a lot of plumbing and to make sure you truly understand the benefits and downsides for the specific approaches.

The easiest one is to place a load balancer at your public endpoint(the base URL that all requests will point to) that knows how to forward traffic to the closest instance. This is pretty easy to setup, at least as long as a cloud provider provides you a VPN where you can connect all these points together, otherwise you will have to setup that yourself.

Now all of this is nice, but this doesn’t solve the problem of data, you want all your services to point to same source of truth, this means either you once again use your cloud provider to share a database between services(which BTW might not make sense at all since you might have a huge network penalty) or you do that manually via things like replication + localized read only databases, not trivial stuff even for people that do these things for years.

This is just to get running a few clean instances and a database, but once you start to plug multiple things like monitoring this becomes more complex really fast.

PS: If you are starting with elixir and/or OTP I would recommend to stay away from using inter-node communication, as that opens a whole new can of worms that can blow in your face anytime.

MarthinL · July 4, 2024, 2:01pm

Honestly, it is critical and justified.

There is no getting around latency, i.e. if your users are spread across the globe there isn’t a single place on it to locate your monolith that is suitably close to all of them.

Load balancing was easy yes, within a cluster. You’re contradicting yourself a little by referring to the closest instance while advising against a geographically distributed design, but if we ignore that detail then getting users to the closest point of presence is also easily addressed using a GeoDNS service. None of that touches on the challenge I face.

Note, I don’t have a cloud provider to defer the difficult part of setting up a geographically distributed database to. I’m sworn to remain cloud agnostic so I’d be free to choose the most opportune and cost effective ways to roll out more regional points of presence as growth dictates. For that I am developing on bare metal Kubernetes with PostgreSQL running in the cluster as well and I am handling remote data access within the application itself. I’ve designed the database from the ground up to facilitate my particular version of distributed database. None of this was undertaken lightly. I’ve conceived of and successfully implemented several ground breaking systems in the course of my career spanning many decades so I’m neither a rookie nor am I under any illusions as to the nature of what I’ve taken upon myself to do accomplish here. I’m sure you mean well but if you question the extent of my background it’s likely because you can’t recognise it from how I’ve phrased my questions.

I see your fear and you’re most welcome to it. Just don’t try to make it mine. I appreciate your intent to keep me from getting into trouble you’d be unable to help dig me out of. Very noble of you. But rather than focus on the scary stuff you don’t understand either, why don’t you try answering my questions instead? I’m going to do it anyway and I’m already as deep in trouble as I am going to get and overwhelmed by options. Neither of us can change that, but I can use some help sifting through the noise surrounding service mesh packages in a Phoenix-friendly way.

P.S. I believe I have expressed an understanding that raw Erlang/Elixir inter-node communication is unsuitable for inter-cluster use case. But if you mean that it’s also unsuitable for inter-node communication or that multi-node OTP is too advanced for someone who’s worked with Erlang since the late ‘90s you might be barking up the wrong tree if not at your own shadow here.

Now, with that said. What are the services that feature in service mesh packages in Elixir/Phoenix terms and given that what is a service mesh, also in terms of Erlang/Elixir/Phoenix? Do we need it? Why? Why not? How do we use it or what do we use in its place? The objective remains to call upon a remote instance of my app to do something I can’t do myself which in this case means to retrieve data it is master for that I need a copy of. That is the understanding and help I need now.

MarthinL · July 4, 2024, 3:34pm

I snuck a peak into the sample application one of the service mesh packages (linkerd) provides as a starting point for new users. Take a look for yourself if you want. All I can say is that even as (or especially as) someone that used to speak C better than English in my youth and still use it when I really need tight code, I couldn’t imagine writing entire systems the way the sample code suggests. Not any more. Not with languages like Erlang and Elixir around and not when frameworks like Phoenix with LiveView is as accessible to everyone as it is. Makes no sense.

What it does mean is that the likelihood of me concluding that service mesh technologies solves a problem I’m experiencing is shrinking fast. I’m sure it solves or at least addresses a problem else it would not get any attention at all, but it does not address a problem Phoenix leaves us with.

Thank you once again @josevalim, @chrismccord as well as Joe Armstrong and the old team at Ericsson. Your brain-children has enriched our lives and the world. Now if we can just figure out the correct way to call on an instance of my application running on a different cluster in ideomatic Phoenix or Erlang (or even Erlang) it will be plain sailing from there. I suspect it would boil down to having some local proxy for each remote cluster which establishes (pools of) connections over TLS (or mTLS?) which is registered locally in the name agreed name of the remote cluster. That would contain the chattiness of IPC locally but safely reach out across the badlands to the remore node.

Unless I’m missing something obvious there doesn’t seem to be anything simple like that yet. Meshx is an Elixir project but for reasons I’m not quite seeing it appears to derive its functionality from an externakly defined general purpose service mesh rather than directly addressing the needs of Phoenix and Elixir programmers.

I confess that especially from my single application perspective I don’t currently see the value proposition of service discovery either. My app (as a chatty collection of micro-services of the Elixir variety) only talks to its own services and knows all about what it needs and what it offers. The only question is how to reach those services if they’re running on remote clusters. But then, I’m old school, preferring things explicit, deterministic and structured. We used to call it programming on purpose.

MarthinL · July 5, 2024, 12:13am

Partisan popped (back) onto my radar. first time I saw it reference to it was an ancient post that ended in a “this domain is for sale” page so I though it was stil-born or got swallowed up
And morphed into something else. It so. Partisan is very much alive and it’s
Not an issue that it’s purely an Erlang library.

I’ve barely scratched the surface but by the initial indicators it makes the right noises and acknowledge the right shortcoming in the distributed Erlang approach when multiple clusters are involved. I’ll report back what I find once I understand more about it.

Exadra37 · July 5, 2024, 7:30am

Take a look to GitHub - alfetahe/process-hub: Distributed processes manager and global process registry to see if it can help with your use case.

MarthinL · July 5, 2024, 9:44am

Thank you, it looks good but remains by its own objectives and feature descriptions squarely focussed on a single cluster of nodes with no obvious allowances being made for when those nodes are spread far apart and eventually too plentiful for a full mesh to be realistic. It might have to get merged with something like Partian with an opportunistic custom topology based on application-specific conditions.

If Submariner had been mature it might have been a useful companion to help fill in the inter-cluster blanks for taking ProcessHub into the multi-cluster world.

The part of my use case that still befuddles me is how to address the large overlap between Erlang/Elixir and Kubernetes. It was child’s play to get my application to run and share load orchestrated by K8s but inside those clusters the BEAM nodes are even more isolated from the world out there and neither K8s nor the Erlang domains are particularly adept at running one app on a global set of weakly linked clusters. There’s a void and inevitably efforts to fill the void in both worlds (though more so outside the BEAM domain) and what’s most hard to do right now is to develop and apply a useful mapping between the concepts inside the Erlang/Elixir/Phoenix domain, concepts in Kubernetes and Cloud Native Computing and concepts in raw micro-service architectures where techniques such as service mesh finds its void to fill.

I don’t and never did expect this to be easy but I have a job to get done and can neither afford to waste energy reinventing existing wheels not saddle my application with the technical debt of having been built with wheels that are at odds with each other. For me that means having to really understand what I’m doing and working with so I can abstract it down to trivially simple facility (encapsulating a lot of complexity behind the scenes) at the overall application design level.

Given that I already have my own global registry keeping track of the regions (clusters) my app is running in, the most promising abstraction appears to have each region resolve to a global process name (which in this case means global to the node but local to the cluster) of a managed process running within the cluster that relays standard messages between regions. The regions are logically meshed as in any can contact any, but the relay internally manages which are actually directly connected and which are require multiple hops dynamically like one would manage a cache or a pool, i.e, based on demand and available resources.

That’s my best guess at the moment but it lives purely in the Erlang/Elixir idiom. I’m yet to factor into that if and how K8s and service mesh tools and networking features in making it so in the most appt way. For that I’m still desperately seeking insights into how best to correlate the various concepts of these solution domains.