Hey! Hope everyone is doing fine during these times!
I know a lot has been discussed here about microservices architecture in Elixir/Erlang, but I still have some questions about one particular way of doing it: using processes shared between nodes in a cluster of Erlang nodes. I was inspired to try this out by this article from @josevalim: http://blog.plataformatec.com.br/2015/06/elixir-in-times-of-microservices/.
I’ve created a repository (https://github.com/crqra/ms_ex_arch) with some demo apps to demonstrate an architecture based on the Erlang remote processes’ features. It includes two services: order_service and user_service; and a web api for frontends to interface with the services. Find instructions to run the demo in the repo’s README file.
Each service has a GenServer that all other services and (Elixir) apps can interact with. Each GenServer is registered globally under a name that any service wanting to talk to it must know.
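To make the setup concrete, here is a minimal sketch of what such a globally registered GenServer could look like. It is not taken from the repository: the module name `UserService`, the global name `:user_service`, the `get_user/1` function and the in-memory user map are all assumptions made for illustration.

```elixir
defmodule UserService do
  use GenServer

  # Assumed global name; every caller has to know this term.
  @name {:global, :user_service}

  # Registers the process cluster-wide through :global.
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: @name)
  end

  # Can be called from any connected node; :global resolves the pid.
  def get_user(id), do: GenServer.call(@name, {:get_user, id})

  @impl true
  def init(_opts) do
    # Hypothetical in-memory data standing in for a real store.
    {:ok, %{1 => %{id: 1, name: "Alice"}}}
  end

  @impl true
  def handle_call({:get_user, id}, _from, users) do
    # Replies {:ok, user} or :error.
    {:reply, Map.fetch(users, id), users}
  end
end
```

With this shape, a caller on another node only needs the `UserService` module (or just the global name) to reach the process. It does not need to know which node the process runs on.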
Since I don’t have much experience with the “distributed” features of Erlang, this architecture seems like a very good approach to “microservices” to me. However, I’m pretty sure I’m missing the hidden or less obvious issues that such an architecture would create, and that’s why I’m writing this: to get your help with those.
For example, globally registered processes are a nice way to easily find the process we’re interested in across nodes, but I’m aware that each name must be unique, so it would create an issue if we wanted to scale, for instance, the order_service. From my reading, I believe this could be fixed by using swarm to group processes, but how would, for instance, the web app know which process to call based on which node is under the least pressure? Would this be an issue at all if back-pressure were correctly implemented, or pools used in the first place, on the single
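To illustrate the grouping idea, here is a rough sketch of several order_service instances joining one process group, with the caller picking a member. It uses OTP’s built-in `:pg` module (OTP 23+) instead of swarm so that the example stays dependency-free. The module name `OrderService.Worker`, the group name `:order_service` and `place_order/1` are assumptions, and the random pick is only a stand-in for a real “least pressure” strategy.

```elixir
defmodule OrderService.Worker do
  use GenServer

  # Assumed group name shared by all order_service instances.
  @group :order_service

  # No unique name here: instances are found through the group instead.
  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  # Picks a random member; this spreads calls but ignores actual load.
  def place_order(item) do
    case :pg.get_members(@group) do
      [] -> {:error, :no_workers}
      members -> GenServer.call(Enum.random(members), {:place_order, item})
    end
  end

  @impl true
  def init(id) do
    # Requires the default :pg scope to be running (see usage below).
    :ok = :pg.join(@group, self())
    {:ok, id}
  end

  @impl true
  def handle_call({:place_order, item}, _from, id) do
    {:reply, {:ok, %{item: item, handled_by: id}}, id}
  end
end
```

A possible usage: start the default scope once per node with `:pg.start_link()` (in a real app it would sit in the supervision tree), start a few workers with `OrderService.Worker.start_link(1)`, and call `OrderService.Worker.place_order(:book)` from any connected node.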
Another question: when an Erlang node (A) connects to another (B) that’s already connected to (Z), (A) is automatically connected to (Z) as well, even though it only wanted to talk to (B) in the first place. This seems fine if we only have tens of services, but how does this behave if we have hundreds or thousands of services in our architecture? Can someone point me to resources for further research, please?
My last question is how these nodes would be able to communicate if they were running in different networks. This isn’t so closely related to the overall topic, but my lack of experience requires me to ask. How could I have the nodes communicate if, for example, web were on a public network but user_service were on a private (internal) network? Would there be issues?
Sorry for the somewhat confusing topic. Feel free to suggest changes to this initial message or point me to resources I might not have found based on my questions.
Thanks in advance for any help!