Is an app with 3000 microservices a fit for Elixir/OTP over Kubernetes/gRPC?

Elixir and Phoenix are awesome and I’m hoping I can use them on my next project; however, I’m an Elixir/OTP newbie and I’m not finding real examples of how Elixir apps with lots of moving parts are written, maintained and deployed.

For context, elixir-in-times-of-microservices by José is what first piqued my interest. I’ve bought/read every Elixir book I can get my hands on, and this is what we’ve come up with thus far:

Umbrella OTP App containing:
API: Phoenix and Ecto serving REST and GraphQL. Originally Ecto was in a separate app called DB, but then we settled on the API (and not the DB) being what the other apps/services interface with
FrontEnd: Main webapp our customers use (Phoenix)
Admin: Backoffice webapp for staff (Phoenix)
Integrations: This is where things go off the rails. We ingest and transmit data, mostly XML over HTTPS, with over 3000 sources. An app per source seems out of place; in most cases we have a REST or SOAP client that listens to a queue and takes action, much like a background task. A lot of them will also spin up a GenServer per source (which should be a singleton across the entire cluster) and poll the source for data. We need to be able to control start/stop/pause per source, as the sources like to add/remove data elements from time to time; the best scenario would be being able to update and deploy new code for a source without affecting the rest of the system. So we are thinking most of the 3000 integrations will be a Supervisor with a GenServer (receiver) and a Module (sender). A rough sketch of what we are imagining is below.
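To make that concrete, here is a rough sketch of the shape we have in mind for one source. The module names (Integrations.Source.Supervisor, Receiver, Sender) and the polling interval are placeholders, not anything we have settled on:

    defmodule Integrations.Source.Supervisor do
      # One of these per source, so each source can be started/stopped on its own.
      use Supervisor

      def start_link(source_id) do
        Supervisor.start_link(__MODULE__, source_id, name: :"source_sup_#{source_id}")
      end

      @impl true
      def init(source_id) do
        children = [
          {Integrations.Source.Receiver, source_id}
        ]

        Supervisor.init(children, strategy: :one_for_one)
      end
    end

    defmodule Integrations.Source.Receiver do
      # Polls one source on an interval and hands results to a plain "sender" module.
      use GenServer

      @poll_interval :timer.seconds(30)

      def start_link(source_id) do
        GenServer.start_link(__MODULE__, source_id, name: :"source_#{source_id}")
      end

      def pause(source_id), do: GenServer.cast(:"source_#{source_id}", :pause)
      def resume(source_id), do: GenServer.cast(:"source_#{source_id}", :resume)

      @impl true
      def init(source_id) do
        schedule_poll()
        {:ok, %{source_id: source_id, paused?: false}}
      end

      @impl true
      def handle_cast(:pause, state), do: {:noreply, %{state | paused?: true}}
      def handle_cast(:resume, state), do: {:noreply, %{state | paused?: false}}

      @impl true
      def handle_info(:poll, %{paused?: true} = state) do
        schedule_poll()
        {:noreply, state}
      end

      def handle_info(:poll, state) do
        # Integrations.Source.Sender is imagined as a plain module (no process).
        state.source_id
        |> fetch_from_source()
        |> Integrations.Source.Sender.transmit()

        schedule_poll()
        {:noreply, state}
      end

      defp schedule_poll, do: Process.send_after(self(), :poll, @poll_interval)

      # Placeholder for the real HTTPS/XML fetch per source.
      defp fetch_from_source(_source_id), do: []
    end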

Historically, each source ran as a Windows service on a beefy single box, manually deployed, paused and updated separately from the web apps (new services will run on Linux). The benefit/joy of being able to pull up :observer and peer into the source processes and their state, or to remote into iex and tinker, would be epic.

Is anyone doing something like this in production who is willing to share how they structured the many integrations and how they deploy and upgrade a system like this? It’s possible we could be a trailblazer, but an existing production story would be invaluable.

Thank you

5 Likes

Sounds perfectly reasonable to me?

If you are wanting to hot-swap ‘services’ then I’d probably make each its own module with an interface that hides the GenServer or pool behind it, and just delegate everything through that interface. Via ETS you could then turn them on/off, hot-code swap atomically, etc…
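A minimal sketch of what I mean, assuming an ETS table created at boot and a locally registered GenServer per source (all names here are made up for illustration):

    defmodule Sources.Interface do
      # Public facade: callers only ever use this module, so whether the work
      # is done by a local GenServer, a pool, or a remote node stays hidden.
      # Assumes :ets.new(:source_flags, [:named_table, :public, :set]) ran at boot.

      @table :source_flags

      def enable(source_id), do: :ets.insert(@table, {source_id, :on})
      def disable(source_id), do: :ets.insert(@table, {source_id, :off})

      def enabled?(source_id) do
        case :ets.lookup(@table, source_id) do
          [{^source_id, :off}] -> false
          _ -> true
        end
      end

      def request(source_id, payload) do
        if enabled?(source_id) do
          GenServer.call(via(source_id), {:request, payload})
        else
          {:error, :disabled}
        end
      end

      # Swap this out later for a Registry / pool / remote lookup without
      # touching any caller.
      defp via(source_id), do: :"source_#{source_id}"
    end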

1 Like

I think that article was triggered by an earlier StackOverflow topic, in which one particular passage stood out to me:

So far I haven’t talked about microservices. That’s because, up to this point, they don’t really matter.

i.e. building (potentially distributed) applications with Elixir/OTP lets you leverage some of the benefits associated with a microservices architecture without having to accept the overhead of supporting one. That doesn’t mean Elixir/OTP solutions would necessarily classify as realizations of a microservices architecture, and that’s OK, because microservices architecture isn’t supposed to be an end in itself. Meanwhile, many of the principles that microservices architecture is based on can also benefit Elixir/OTP solutions.

In your post, this is what immediately stood out to me, because it kind of implies that, at least on a conceptual level, you currently have 3000 “beefy single boxes” processing your sources, and yet:

An app per source seems out of place,

… which kind of sounds like what you are doing right now. From my current research, still incomplete and ongoing - so I’m sure somebody will correct me if I’m wrong:

  • Node: An executing Erlang run-time system (ERTS) which can communicate with other Erlang run-time systems.
  • While a single hardware server (or virtualized OS) can run multiple Nodes, a Node itself cannot be distributed.
  • Given that a Node executes an ERTS and that a Release includes an ERTS it seems that a Node can at most run one Release (???)
  • While a Release can include multiple “applications” it seems to only support one primary application while the remainder are simply included applications. Therefore a Release seems to only support one supervision tree (???). So far my search for a release supporting multiple primary applications and supervision trees has come up empty.
  • The point being that a Node executes one single primary application and its supervision tree.
  • It would be tempting to equate (one Node == one microservice) - but that overlooks the level of isolation and decoupling that is possible within the same supervision tree.
  • While the Node itself can’t be distributed, the application executing within it can spawn processes on other Nodes - though design-wise I would prefer an explicit (primary) application spawning and managing processes on that second Node on behalf of the first Node.

Essentially it seems the current starting point of your “Windows service” (per source) is roughly equivalent to a single primary application on a dedicated Node.

Now I suspect it’s unlikely that cramming all 3000 sources into one single Node/Application is that great an idea either, given that a single Node doesn’t distribute. So it’s more likely you are looking for an application that can be deployed onto multiple distinct Nodes, each Node listening to any number of sources via configuration, which all forward their pre-processed/normalized results to yet another Node running the application that concentrates/aggregates the results.

3 Likes

I doubt they really use/need 3000 physical servers though, probably VMs and so forth; otherwise it would be crazy expensive and they would probably not be asking how to fix it on a forum like this. ^.^;

But yeah, distributing Erlang nodes is significantly easier, so just start by doing it all in one node until you actually do need to scale out. If you stick to the module interface I described above then it is much easier to scale out later; that is the purpose of the single interface module. :slight_smile:

3000 Windows services on a single box, that can be stopped, started, paused and upgraded individually - so yes, we are looking to do the same with Elixir/OTP. The work does not have to be distributed, so all 3000 on a single node is no problem; the main requirement is the ability to stop, start, pause and upgrade individually. We thought an umbrella would be great because then all 3000 services can talk to the API application via Elixir message passing, so a more succinct question would be how to have:

An API application on the same node and BEAM as 3000 GenServers, each of which can be stopped, started, paused and upgraded individually. With Kubernetes/gRPC I think we would put each service in a Docker container; this would be much heavier than Elixir, but it does fit the need to manage the services individually.

Thanks again

1 Like

You might find this thread (and the blog post and course mentioned in it) of interest:

Dave doesn’t even use umbrella apps for them, preferring to just make each microservice a separate ‘normal’ Elixir app. I haven’t finished all the Elixir learning material I want to yet (got a few more books to go), but as of now I think I am going to be following Dave’s way of doing things, as it just makes a lot of sense to me. If you haven’t got his course I highly recommend it.

1 Like

With 3k services this would result in 3k nodes, which would mean roughly 4.5 million node interconnections (3000 × 2999 / 2), since the BEAM wants to interconnect all nodes with each other in a full mesh. This won’t work well. You would need to replace the messaging with another mechanism that does not require the full mesh.

Also you’d load the BEAM 3k times, while by putting some if not all of those modules altogether on a single bare machine without Docker, you can still update the modules one by one…

4 Likes

@NobbZ

^^ note the “With Kubernetes/gRPC”

With Elixir an advantage would be to have all services on the same node using message passing for intercommunication, but the question would be how to stop, start, pause and upgrade each service individually. Each service seems like it would be a GenServer, but maybe an application?

Thanks

Thanks for the link to Dave’s article. I had read it, but on re-reading it after your mention a light bulb might have gone off:

which (and I apologize for my newbness) I guess means he is using file paths in his mix.exs dependencies section. And the worst-case scenario is we could use a git-deploy type setup where we pull code down to the production node and then use a remote iex to reload the application, so we can update the deps/applications without affecting the other applications on that node.
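For reference, path dependencies in a mix.exs look roughly like this (app names invented purely for illustration):

    # mix.exs of the app that consumes the co-located applications
    defp deps do
      [
        # local applications referenced by path instead of Hex
        {:integrations_core, path: "../integrations_core"},
        {:source_acme, path: "../sources/source_acme"},
        # ordinary Hex dependencies can sit alongside them
        {:httpoison, "~> 0.13"}
      ]
    end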

1 Like

Yep, using file paths: Applications are Components (AKA Microservices in Elixir) - #10 by AstonJ

I guess many organisations might opt for the new private packages feature of Hex.pm: Hex.pm is adding private packages and organizations :003:

1 Like

/me just uses git repos as dependencies at work…

1 Like

I’d prefer a completely stand-alone private Hex server that can run internally, where we can upload our internal packages. I have a half-implemented one for rebar3 which I am hoping to complete one day :slight_smile:

I don’t like version control being central to how I distribute my dependencies. Then I must use a VCS which is supported, and I must use one repo per dependency. The language’s dependency management shouldn’t care at all what VCS I am using, and it should not make that decision for me :slight_smile:

But until my private package server is actually done, we also use git repo dependencies for our private stuff.

This is the potential misunderstanding I wanted to address in my first reply. As far as I can tell a single node runs exactly one “primary application” - the other included applications act as libraries to the primary application by becoming part of the primary application’s supervision tree.

It’s a peculiarity in the OTP naming convention - a library application does not implement the Application callbacks and therefore cannot be started or stopped (as an application). So for example Poison is an “OTP application” but it’s a library application for use by a primary application.

$ mix new app_name

creates a library application - not a primary application. For a primary application

$ mix new app_name --sup

is required - that will include the application callback to create the supervision tree.
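The difference shows up in the generated project: with --sup you get (roughly) an application callback module along these lines, plus a mod: entry in mix.exs pointing at it, and that callback is what makes the application startable with its own supervision tree (AppName is just a placeholder):

    defmodule AppName.Application do
      # Generated only with --sup. Starting the :app_name application runs
      # start/2, which starts the top-level supervisor; a plain "mix new"
      # project has no such module and therefore nothing to start or stop.
      use Application

      def start(_type, _args) do
        children = [
          # {AppName.Worker, arg}
        ]

        Supervisor.start_link(children, strategy: :one_for_one, name: AppName.Supervisor)
      end
    end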

An umbrella project still contains just a single primary application - one of the “applications” implements the application callbacks and starts the supervision tree while the other (included) “applications” simply become part of the first application’s supervision tree.

but the question would be how to stop, start, pause and upgrade each service individually. Each service seems it would be a GenServer, but maybe an application?

In this situation the term service could cause some confusion/ambiguity. Handling a single source may require a number of processes, possibly even a small library application that could be designed to have its processes handling the source started, paused, and stopped. Upgrades could be trickier. In general hot code reloading targets code at the module level, and in many use cases the recommendation is to avoid using/supporting it because it tends to make the overall design much more challenging.

The work does not have to be distributed, so all 3000 on a single node is no problem.

The issue is

  • it’s not exactly clear how heavy the workload of handling a single source is
  • how capable the physical/virtual CPU is that the node resides on.

While a node will spread all its processes over all the cores of the CPU, it can’t scale by utilizing additional CPUs (short of spawning processes on other nodes that reside on a different CPU). So while the total number of processes shouldn’t be a problem for a single node, the actual workload of handling 3000 concurrent sources could potentially be too much work for the CPU the node is executing on. If that is the case, the solution design will have to account for the eventuality of distributing the workload over separate nodes, each executing on their own CPU.

Taking service-oriented design principles into account, it may make sense to avoid sharing a node configuration database across multiple nodes and instead have a separate “configuration node” which supplies the other nodes with their configuration information when they start up (and which could also route start, pause and stop requests to the appropriate node).
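As a hypothetical sketch of that routing (the registered names, node names and lookup are all invented):

    defmodule ConfigNode.Router do
      # Runs on the "configuration node"; forwards control requests to
      # whichever node currently owns a given source.
      def pause(source_id) do
        # A {registered_name, node} tuple addresses a locally registered
        # process on a remote node over distributed Erlang.
        GenServer.call({:"source_#{source_id}", node_for(source_id)}, :pause)
      end

      # Placeholder: in reality this would be an ETS/database lookup maintained
      # when nodes register their sources at start-up.
      defp node_for(_source_id), do: :"integrations1@host"
    end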

Ultimately the design details are affected by the expected workload and expected capability of your deployment platform.

2 Likes

We are building an API which talks to around 50 services, which is not huge, but they are all different providers which understand SOAP, XML, JSON, REST, etc. We have had good success with just putting them in different modules. I think just having 3000 GenServers with supervisors may work out without a lot of complications. Like @NobbZ mentioned, creating 3K nodes is definitely not a good idea. You will also have to tune your :hackney settings (if you are using HTTPoison) as it sets the max connection limit to a default value of 50.
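For example, something along these lines (pool name and numbers are only illustrative, not recommendations):

    # At application start: a dedicated hackney pool with a higher connection limit.
    :hackney_pool.start_pool(:integrations_pool, timeout: 15_000, max_connections: 500)

    # Per request, tell HTTPoison which pool to use.
    HTTPoison.get("https://example.com/feed.xml", [], hackney: [pool: :integrations_pool])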

I would personally do the simplest thing possible, by building out some common utilities which the 3000 services can use. And run them on a single node.

3 Likes

That is not entirely true: a single node can run as many applications as you want, and each application can have its own supervision tree. The only limitation is that only one instance of each application (identified by name) can be started.

You can call :application.info()[:running] to inspect what applications are running on the node. For example, bare IEx shell outputs:

[logger: #PID<0.71.0>, 
 iex: #PID<0.65.0>,
 elixir: #PID<0.59.0>,
 compiler: :undefined,
 stdlib: :undefined,
 kernel: #PID<0.33.0>]

which means that logger, iex, elixir and kernel are the applications with a supervision tree (the PID in the list is the PID of the application’s application master process, which is not the PID of the top supervisor). compiler and stdlib are library applications.
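Which also means the applications on a node can be stopped and restarted individually at runtime, e.g. (with :source_acme standing in for one of the integration apps):

    # Stops just this application's supervision tree; everything else on the
    # node keeps running.
    :ok = Application.stop(:source_acme)

    # Later, bring it (and anything it depends on) back up.
    {:ok, _started} = Application.ensure_all_started(:source_acme)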

1 Like

There are a few questions here:

  1. Are those 3000 “services” independent of each other?
  2. Are they actually stateful?
  3. Do they actually need to do background processing?

Integrations: This is where things go off the rails. We ingest and transmit data, mostly XML over HTTPS, with over 3000 sources. An app per source seems out of place; in most cases we have a REST or SOAP client that listens to a queue and takes action, much like a background task. A lot of them will also spin up a GenServer per source (which should be a singleton across the entire cluster) and poll the source for data.

It seems to me that the REST/SOAP/XML is just “glue” to let the main service coordinate with those remote services?

If that is the case, you don’t even need to spin up 3000 GenServers ahead of time: Just make each service its own module, with plain functions, and “just call it” from wherever you need.

You will most likely need to add some abstractions on top if you want the call to be made non-blocking and whatnot. At the simplest level, spawn a process to do the calculation.

In those abstractions, you can easily spawn this process on a completely different machine, and the rest of the code won’t even know the difference.
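One way to sketch such an abstraction is a Task.Supervisor that may live on the local node or a remote one. MyApp.TaskSup and the source module are made-up names, and the same code has to be present on whichever node runs the task:

    defmodule Integrations.Runner do
      # Callers just get a Task back; where the work actually runs is decided here.
      def async_fetch(source_module, args, node \\ node()) do
        # Assumes a Task.Supervisor named MyApp.TaskSup is running on that node.
        Task.Supervisor.async({MyApp.TaskSup, node}, source_module, :fetch, [args])
      end
    end

    # Usage: the caller does not care which machine did the work.
    task = Integrations.Runner.async_fetch(Sources.Acme, %{since: "2017-01-01"})
    result = Task.await(task, 30_000)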

So you could start with a beefy machine running just a single BEAM process (it will take over all the CPUs if you have a really beefy machine with multiple physical CPUs), and only if that appears to be not enough you can add a second beefy machine.

Regarding upgrading code, with OTP releases you can do hot code upgrades while the system is running with no downtime. I believe your use case is exactly what it was designed to do.

There’s quite a few details to work out, of course, mostly about the nature of your “integrations” and where they get their data, state and whatnot.

Don’t worry too much about how you’d package it all up; in the end you’ll have an OTP release, one “primary” app that could just be a facade; the rest of the work will be done by the rest of the applications (of which you can definitely have 3000; an application just needs to start a supervision tree).

AFAIK, the only difference with the “primary” app is that if the BEAM can’t start it successfully, it will terminate the actual BEAM OS process entirely, as there’s no point in keeping the BEAM running if the primary app can’t start.

Running 3000 services in a single BEAM node should not be a problem. You basically start one (or more) processes per each service, and that’s it.

The second part of your requirement is indeed tricky. If you can afford to restart everything, your life will be much simpler. If not, then you must enter the realm of code reloading. Some basic instructions are available here.

In some simpler cases, this might work out of the box. If you actually have 3000 different modules (which I somehow doubt, but who knows ¯\_(ツ)_/¯), and you cache previous release builds on the build server, then I think (not 100% sure though) that distillery will be able to detect the change and generate a correct appup automatically.

In more complicated cases, you might need to hand-code an appup file. You can find some basic examples here. As far as I understand, appup is quite flexible. Among the low-level instructions there is apply, which allows you to invoke a series of functions in an arbitrary order, so it should be possible to perform any kind of upgrade logic, no matter how complex it is.
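For a flavour of what a hand-written appup looks like (versions and module names invented; the format is {NewVsn, [{UpFromVsn, Instructions}], [{DownToVsn, Instructions}]} in plain Erlang terms):

    %% source_acme.appup (hypothetical)
    {"0.2.0",
     %% upgrading from 0.1.0
     [{"0.1.0",
       [{load_module, 'Elixir.Sources.Acme'},
        %% apply can run arbitrary code mid-upgrade, e.g. a config/state migration
        {apply, {'Elixir.Sources.Acme', migrate_config, []}}]}],
     %% downgrading back to 0.1.0
     [{"0.1.0",
       [{load_module, 'Elixir.Sources.Acme'}]}]}.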

4 Likes

I think that in a discussion like this it is important to stick with the official terminology in order to minimize confusion - so I think you meant to state:

So you could start with a beefy machine running just a single node (it will take over all the cores if you have a really beefy machine with multiple physical cores), and only if that appears to be not enough you can add a second beefy machine.

Because:

  • A BEAM process is scheduled to run on a single core by one of the node’s schedulers. A BEAM process can move to any scheduler within the node and therefore can run on any core of the CPU but at any one time is either executing or waiting on a single core of the CPU the node is executing on. A BEAM process cannot leave the node it’s executing on (sending a process function and state is more a matter of cloning).
  • It’s the “Bogdan/Björn’s Erlang Abstract Machine” (BEAM), the Erlang VM that runs on behalf of the node that has access to all the cores (not CPUs). By extension the node has access to all the cores of the CPU the node is executing on (WhatsApp was reportedly using CPUs with 10 cores). However the node is confined to the CPU that it is executing on - so on a true multi-processor (rather than multi-core) architecture the node cannot take over additional CPUs - the best it can do is spawn a process within another node that is running on another CPU (which could be on the same PCB or somewhere across a network connection).

As I stated in my first post I wasn’t entirely sure there was only “one application” - now the logger having its own supervision tree suggests the intent to support “multiple ‘user’ supervision trees” (for lack of a better term; :kernel, :elixir, etc. I would consider “infrastructure” supervision trees/applications), though I’m still foggy on what is considered “reasonable” practice.

  • The release file format supports multiple applications by necessity as the “infrastructure” applications have to be explicitly listed in addition to the “user” application. But there seems to be no direct constraint preventing having multiple “user” applications in the same release. But just because it’s possible doesn’t necessarily mean it’s a good idea to have multiple “user” applications in the same release - primarily because that could suggest a certain level of coupling - coupling that might be better served within the same supervision tree.
  • Two unrelated “user” applications could be in the same release for efficiency reasons - i.e. to share the infrastructure of the node. However it would seem more logical to have unrelated or loosely coupled applications in separate releases - unless a single node can only service one single release (which could make sense as two releases could specify different ERTS versions).

To me there seems to be a certain lack of clarity when it comes to the higher-granularity concepts of the “Elixir/OTP alternative” to microservices. On an abstract level a microservice is simply a piece of software designed according to the principles of service orientation that operates within a deployment environment tailored to running and managing microservices. While a running instance of a microservice is typically constrained to a physical machine or specific instance of a virtualized environment, that instance could appear on any one of the available physical machines or virtualized environments. The way microservices scale seems straightforward.

Meanwhile the discussion about the “Elixir/OTP alternative” seems to always revolve around processes, supervisors and usually a single supervision tree. However a single supervision tree seems to be practically confined to a single node and therefore CPU. In order to scale further it seems to become necessary to shift gears and start thinking about “OTP applications designed according to service oriented principles” and how to appropriately distribute responsibilities across any number of collaborating OTP applications. This raises questions that simply don’t come up when primarily thinking about single node (primary application) solutions:

  • Does it make sense (in production) to run multiple nodes on a single CPU or is it better to run all primary applications destined for the same CPU on a single node (provided the primary applications can use the same version ERTS)? What are the limitations and constraints?
  • Do all primary applications running on a single node have to be part of the same release or is it possible to have multiple releases (with distinct primary applications) for a single node? What are the limitations and drawbacks?

If they even might communicate, then keeping them all within the same Node is better: less overhead, better scheduling and work distribution.

Same release, but that is what Umbrellas are popular for.

Personally I package up nearly all my applications as dependencies, then just have a main ‘MyServerRelease’ project that depends on them all and does nothing else, just for making releases. I’ve found it the easiest approach, going back to my Erlang days. :slight_smile:
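i.e. roughly this shape, with the release project’s mix.exs doing nothing but pulling everything in (names and versions invented):

    # my_server_release/mix.exs - exists only to assemble the release
    defp deps do
      [
        {:api, git: "git@example.com:ourorg/api.git", tag: "v1.4.2"},
        {:front_end, git: "git@example.com:ourorg/front_end.git", tag: "v2.0.1"},
        {:integrations, git: "git@example.com:ourorg/integrations.git", tag: "v0.9.0"}
      ]
    end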

So far I’ve only come across umbrella projects where only one of the applications is the (top level) primary application, while the rest are merely library applications - i.e. the entire umbrella project is dedicated to assembling one single supervision tree.

Personally I package up near all my application as dependencies then just have a main ‘MyServerRelease’ project

But that sounds like it’s necessary to deploy the whole ecosystem of applications in a “big bang”, rather than having the convenience of just deploying the one application that was actually changed - which is a typical “microservices expectation”.