Microservices architecture using distributed processes in a cluster

Hey! Hope everyone is doing fine during these times!

I know lots have been discussed here regarding doing microservices architecture in Elixir/Erlang but I still have some questions regarding a particular way of doing it using processes shared between nodes in a cluster of Erlang nodes. I was kind of inspired to try this out by this article from @josevalim http://blog.plataformatec.com.br/2015/06/elixir-in-times-of-microservices/.

I’ve created a repository (https://github.com/crqra/ms_ex_arch) with some demo apps to demonstrate an architecture based on the Erlang remote processes’ features. It includes two services: order_service and user_service; and a web api for frontends to interface with the services. Find instructions to run the demo in the repo’s README file.

Each service has a GenServer with which all other services and (Elixir) apps can interact with and they’re started with a name that should be known to other services who want to interact with it and registered globally.

For me, as my experience with the “distributed” features of Erlang isn’t that much, this architecture seems like a very good approach to “microservices”. However I’m pretty sure I’m missing the hidden/not so clear issues that such architecture would create and that’s why I’m writing you this, to get your help with those.

For example, globally registered processes is a nice way to easily know the correct process we’re interested between nodes, but I’m aware that each name must be unique so it’d create an issue if we would like to scale for instance the order_service. From my readings I believe this could be fixed using :pg or swarm for grouping processes but how would for instance the web app know which process to call based on the node that is having the last pressure? Would this be an issue at all if back-pressure would be correctly implemented or pools used in the first place on the single order_service?

Other question would be: given that when an Erlang node (A) connects to another (B) that’s already connected to (Z), (A) would automatically be connected to (Z) as well even though it just wanted to talk to (B) in the first place. This seems fine if we only have 10s of services; but how does this behaves if we have 100s or 1000s of services in our architecture? Can someone point out resources for further research, please?

And last question would be how this nodes would be able to communicate if they’d be running in different networks? This isn’t so much related to the overall topic, but my lack of experience requires me to ask. How could I’ve the nodes communicating for example if web would be on a public network but order_service and user_service would be on a private (internal) network? Would there be issues?

Sorry for the somewhat confusing topic, feel free to suggest changes to this initial message or point me out to resources I’ve might not have found based on my questions.

Thanks in advance for any help!

Kind regards,
João

One big issue: each “service” in this setup is single-threaded. Calls back into the same service will deadlock and crash. For instance:

  • application code calls OrderService.list_orders

  • OrderService.list_orders calls UserService.get_user

  • if UserService.get_user tries to interact with OrderService, it will crash because OrderService is already waiting for a reply from UserService

Connections between nodes are usually static and made at boot-time. They aren’t quite the same thing as opening an HTTP connection to another server for one request.

A good place to start reading about building things with processes in the BEAM is the oft-cited “To spawn, or not to spawn?”

1 Like

Designing for Scalability with Erlang/OTP has a chapter on distributed architectures that you might find useful.