Is an app with 3000 Microservices a fit for Elixir/OTP over Kubernetes/GRPC

Unless I’m misunderstanding your terminology, user applications can include “infrastructure” applications. It’s pretty standard to have multiple user applications running together on the same node.

Staying within the same node is definitely simpler, and should be the first approach to consider.

Multiple nodes on the same machine is basically mainstream microservices (multiple OS processes), and it might make sense in some special situations. Two cases that come to mind:

  1. Isolated deploy without the hassle of code upgrades. Split the system in two (or more) nodes, and then you can deploy/restart them separately, while other nodes are running.
  2. Isolated failures in the case of a NIF or port driver. If you’re running in-process custom native code, then moving that code (and the minimal Elixir/Erlang code surrounding it) to a separate node ensures that e.g. a segfault in the NIF won’t take down the entire system.

In both cases I wouldn’t use distributed Erlang, but rather have nodes communicate via e.g. HTTP.
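For context, here is a minimal sketch of the single-node approach, assuming one release that bundles several user applications (the application names are made up for illustration):

```elixir
# mix.exs - releases/0 in the parent project. Each listed application
# has its own supervision tree, but they are all started together
# inside one node (one OS process).
defp releases do
  [
    my_system: [
      applications: [
        ingest: :permanent,     # hypothetical user application
        web: :permanent,        # another user application, same node
        reporting: :permanent   # yet another, same node
      ]
    ]
  ]
end
```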

Newcomers to Elixir/OTP have a certain preconceived notion of what an application is - and it isn’t anything close to the way OTP uses the term, which makes effective communication extremely difficult and sets the stage for all sorts of misunderstandings and false expectations.

When I use the term “user application” I’m trying to zero in on an entity which implements the Application callbacks and therefore initiates the creation of its own supervisor tree - while at the same time not being part of the Erlang/Elixir runtime system. As has been mentioned, :kernel and :elixir are technically primary applications, as they have their own independent supervisor trees, but …

… if you talk to someone from JVM/CLR land they’ll agree that the runtime is essential for running their application, but they’ll give you a rather stupefied look when you start referring to the runtime as an “application”.

One of the worst offenders in this situation is the notion of a library application - how would people take it if we referred to React as an application? Yet it is the notion of the library application that gives people all sorts of initial expectations when it comes to umbrella projects - because it gives the impression of a workspace for developing multiple applications (each with its own supervision tree), when in fact it is usually just used to segregate functionality that will ultimately run in one single supervision tree (and is therefore confined to execute on one single node, and therefore one CPU). If I truly have multiple applications then I would expect to be able to put each application on a separate node - which of course can’t be done with a library application.
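To make the distinction concrete, here is a minimal sketch of what I’m calling a “user application” - a module implementing the Application callbacks that starts its own supervision tree (the module names are hypothetical):

```elixir
defmodule MyService.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # The children of this application's own supervision tree.
    children = [
      {MyService.Worker, []}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyService.Supervisor)
  end
end
```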

It’s pretty standard to have multiple user applications running together on the same node.

Where? I’m not saying it can’t be done - clearly it can - but the typical scenario that seems to be presented is multiple applications (each with their own supervisor tree) that comprise the runtime system plus some optional environmental services (altogether comprising the “infrastructure”), simply to support the one, single application (with its own supervisor tree) that we are actually interested in running.

I am confused. What does CPU mean in this context? A single node will create a scheduler for every core of every CPU, so a single node will happily take over all your cores and CPUs, and work will be done on all of them.

Joe Armstrong has said that Application was really a misnomer, and Component should have been used instead.

With that in mind, a library application just becomes a bunch of functions. You need to make it into an application in order to bundle them in a release, that’s all.
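In mix.exs terms the difference is a single entry (a hedged sketch - MyService.Application is a made-up name):

```elixir
# A library application: just a bundle of modules. There is no
# callback module, so nothing is started and no supervision tree exists.
def application do
  [extra_applications: [:logger]]
end

# A "runnable" application: the :mod entry names the callback module
# whose start/2 builds the supervision tree.
def application do
  [
    extra_applications: [:logger],
    mod: {MyService.Application, []}
  ]
end
```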

Interesting, because at this point I’m more inclined to correlate a primary application to a microservice (especially given the capability to run multiples on a single node) but I suspect your choice is motivated by the enhanced isolation that running each primary application on a separate node gives.

  1. Isolated deploy without the hassle of code upgrades. Split the system in two (or more) nodes, and then you can deploy/restart them separately, while other nodes are running.

I think this scenario is highly relevant to the original post if in fact hot code reloading (upgrades) for source handlers is seriously considered.

In both cases I wouldn’t use distributed Erlang, but rather have nodes communicate via e.g. HTTP.

This genuinely surprises me - but I guess it’s the isolation argument again, which suggests to me that in your experience messages from a rogue node can wreak havoc on an otherwise healthy cluster.

But while HTTP is ubiquitous, I’ve also noticed a growing number of cautions against using HTTP between microservices because of the inherent (un)marshalling cost, which doesn’t really benefit the microservices.

The issue is that a primary application is limited by the computing power of the CPU it executes on - it has no capability to scale horizontally - scaling vertically means you need to put it on a CPU with more cores or more powerful cores. To scale horizontally you have to rearchitect the primary application into collaborating parts, capable of running on separate nodes (on separate CPUs).

This is roughly equivalent to determining that your microservice isn’t so “micro” anymore and needs to be broken up so that the various parts can run on separate machines.

Given the number of services mentioned (3000), I’d likely take my chances with a single node and code reloading :slight_smile:

Nope, it’s something else. I feel that distributed Erlang should be used only to power the same code, i.e. multiple instances of the same “thing” which are connected into a cluster. Adding different types of systems (using different OTP apps and having a different process structure) might cause various problems with distributed parts of the code (e.g. pg2, Phoenix PubSub or Phoenix Tracker).
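A small sketch of the homogeneous setup I mean - several instances of the same release connected into a cluster (node and module names are made up):

```elixir
# From one instance of the release, connect to another instance
# running the exact same code:
Node.connect(:"my_system@host2")

# Cluster-wide tools such as Phoenix.PubSub then assume every node
# runs the same OTP apps and the same process structure:
Phoenix.PubSub.broadcast(MySystem.PubSub, "events", {:new_event, 42})
```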

What I was thinking is that the node for the services should be separate from the node running the Phoenix Web App (or anything else for that matter) - if only to isolate the code that needs to be capable of hot code reloading - and keeping that corner as simple as possible.

Nope, it’s something else. I feel that distributed Erlang should be used only to power the same code

Hmmm, that’s a bit … disillusioning.

Oh, yeah, I agree with that :slight_smile:

I just want to ask one thing. Can one Erlang/Elixir application with its own supervision tree be executed on 2 or more physical CPUs (not cores), or is it limited to one CPU and its cores? If, for example, I have a server with 2 CPUs of 16 cores each, and an application that takes some input through… say, HTTP (it’s actually irrelevant here I think), and spawns workers under one supervision tree to do some heavy CPU work, will it work on all 32 cores, or just one CPU with its 16 cores?

My understanding is that it will be bound to one single CPU and all its cores as the node is running within a single OS process - depending on the OS you may be able to set the processor affinity for the node’s OS process - in which case you could target the second processor with a second node.

By default Erlang will start a scheduler for each core on each CPU, assuming SMP is available and not disabled. In your example Erlang would start 32 schedulers.

The BEAM Book is a pretty good source of info and is a good read (not that I have read it all). The chapter on Scheduling explains this in pretty good detail.
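You can verify this from IEx (these calls exist in ERTS/Elixir; the values shown assume the 2 × 16-core example above):

```elixir
:erlang.system_info(:logical_processors_available)
#=> 32
System.schedulers_online()
#=> 32 (by default, one scheduler per logical processor)
```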

The wealth of information contained in this book is amazing. A big THANK YOU to all the contributors! :smile:

In your example Erlang would start 32 schedulers.

Depends on the cores. On my quad-core i7, 8 schedulers are started, as hyper-threading presents them as 8 “logical processors” (so really I should override it down to 4).
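The override itself is just an emulator flag (+S is a real flag; the 4 here is my quad-core example):

```
$ elixir --erl "+S 4:4" -e "IO.inspect System.schedulers_online()"
4
```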

Also according to Inside the Erlang VM with focus on SMP (2008):

The schedulers in the Erlang VM are run on one OS-thread each and it is the OS that decides if the threads are executed on different Cores. Normally the OS will do this just fine and will also keep the thread on the same Core throughout the execution.

So the BEAM actually has no control over whether two or more schedulers are run on the same physical CPU core (and conversely whether it is utilizing all available CPU cores) - that’s entirely up to the OS and how it’s configured (for thread or processor affinity).

I am fairly certain this is wrong. Do you have a source? My understanding is that Erlang spawns one OS thread per scheduler, and by default the OS can assign them to any “logical” CPU, that is, any core of any physical CPU.

So a single BEAM process will happily take over an entire physical machine with 1, 2, 4, or 8 CPUs.

The source cited is fairly old. The current BEAM can bind schedulers on specific cores (on supported OSes), so that performance can be fine tuned.

A more recent presentation is here, from 2009.

Thank you all for explanation :slight_smile:

It scales out to all fairly equally. When my big Elixir project gets loaded, it scales out to all 16 cores across the 4 CPUs of my server quite equally. ^.^

The post that immediately followed pointed out that my “understanding” at that point in time was wrong.

So a single BEAM process will happily take over an entire physical machine with 1,2,4,8 CPUs.

Apparently you use “BEAM process” to refer to the OS process that is running the Erlang VM - in the past I’ve always used “BEAM process” to refer to a process created “inside the BEAM”, in order to differentiate it from an OS process. I’ve also noted that in the literature there is a certain lack of consistency when it comes to the term CPU - which leads to a situation where terms like molecule/atom/quantum are treated as if they are interchangeable (which they are not), and that doesn’t help to clarify things.

  • CPU: I use this to refer to what these days is typically found on a single LGA/PGA/BGA - Erlang simply refers to this as a “processor” (a term which unfortunately reappears in “logical processor”)
  • Core: A CPU has one or more of these but they are physically distinct.
  • Logical Processor: A conceptual entity representing the state of a core executing a single thread. Some cores have additional hardware that lets them support another, separate core state associated with a second thread - which lets the core switch to the second thread when the first thread is experiencing cache misses, latency, or branch mis-predictions, in order to keep the execution units fully utilized. Such a core supports two threads, or two logical processors.

But, yes - a single Erlang VM (and therefore node) has access to all the cores available on the machine regardless of whether the cores reside on a single CPU or those cores are spread across multiple CPUs - provided the OS gives the VM access in the first place.

My misconception was that cores residing on separate CPUs could turn out to be an obstacle of some kind - but as it turns out, OS SMP doesn’t seem to differentiate between the penalty of moving a thread between cores on the same CPU vs. cores on different CPUs (I imagine the latter has a higher penalty - but in the end it’s more important to keep things running).

However I keep running across the “one scheduler per core” statement and that needs to be updated:

If the Erlang runtime system is able to determine the number of logical processors configured and logical processors available, Schedulers defaults to logical processors configured, and SchedulersOnline defaults to logical processors available; otherwise the default values are 1.

This seems to suggest that the Erlang VM benefits from hyper-threading.

The current BEAM can bind schedulers on specific cores (on supported OSes), so that performance can be fine tuned.

And the specifics can be found here (only supported on newer Linux, Solaris, FreeBSD, and Windows); the available options are impressive. However, it’s also important to note:

As from ERTS 5.9 (Erlang/OTP R15B) the runtime system does by default not bind schedulers to logical processors. For more information, see system flag +sbt.

… but in many cases the default could be good enough.
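For example, opting back in to the default binding scheme would look like this (+sbt db is a documented value; whether binding actually helps is workload-dependent):

```
$ elixir --erl "+sbt db" -e "IO.inspect :erlang.system_info(:scheduler_bindings)"
```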


With that out of the way - moving a single primary OTP application to a machine with additional CPUs still counts as vertical scaling (which is a monolithic way of operating) - for horizontal scaling it becomes necessary to break the primary application down into smaller primary applications that can run on separate machines.

(that’s also called a “socket”, which is less confusing as CPU means a logical processor in Linux docs)

We went down this route, and though I left before it went to production, it seemed like a great design for this kind of problem.