Distributed Erlang for non-mesh topologies

joaoevangelista · May 2, 2023, 11:51pm

Hi everyone!

I’m thinking of building a simple “leader + workers” architecture where workers send updates back to the leader as results become ready, for example primes found within a range sent by the leader.

So the question is: is the mesh topology of distributed Erlang “ok” for this? Since all nodes know about each other, and the workers will be a different application than the leader, I would have to query the nodes and their type to filter them before sending a job.

I would use Phoenix PubSub and libcluster for this, to avoid external dependencies such as a broker. Consul is an option for forming the cluster.

Any feedback is appreciated!

derek-zhou · May 3, 2023, 1:36pm

There is partisan (lasp-lang/partisan on GitHub: “High-performance, high-scalability distributed computing for the BEAM”), which can give you some flexibility in topology and failure tolerance.

asmodehn · May 3, 2023, 1:44pm

Hi,

Since the question is quite broad, I’ll allow myself to reply a bit broadly, with some possibly debatable statements.

The actual usage you are aiming for is important to know, because what ends up mattering is often unexpected.
Is “primes being found within a range” the actual goal of the software? How big is the search space (what is the upper bound)?

IMHO there are roughly 3 kinds of systems:

localized (timing and order are “obvious” concepts)
distributed (timing and order are constraints to be aware of)
trustless (trust of the hardware, and intentions of the operator are constraints to be aware of)

Distributed Erlang can help you manage any kind of distribution problem I know of, and is always “ok” to use for solving distribution problems. But it will not help (at least not more than other programming languages) with trust issues around hardware (including the network): who is operating it, what type of hardware/network it is, what the operator’s intentions are, etc.

You need to think about where to split between leader and workers: along the node boundary (one node for one role), along the app boundary, along the process boundary, etc., independently of where the code will run…
If you are aiming for that “primes being found within a range” application, the process boundary seems the simplest way to arrive at the logical architecture you probably have in mind.
Emphasis on logical, as in your description, I understand that you want a logical distribution, not really a physical distribution.
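
To make the process-boundary idea concrete, here is a minimal single-node sketch. It is only an illustration under assumptions: the module name `PrimeSketch`, the message shapes `{:prime, n}` and `{:done, pid}`, and the naive trial-division check are all made up for this example and are not from the original question. Each worker is a plain process that sends primes back to the leader as soon as it finds them.

```elixir
defmodule PrimeSketch do
  # Naive trial-division primality check; good enough for a small demo range.
  def prime?(n) when n < 2, do: false
  def prime?(n) when n < 4, do: true
  def prime?(n) when rem(n, 2) == 0, do: false
  def prime?(n), do: check(n, 3)

  defp check(n, d) when d * d > n, do: true
  defp check(n, d) when rem(n, d) == 0, do: false
  defp check(n, d), do: check(n, d + 2)

  # Worker (hypothetical protocol): scan a chunk, send each prime to the
  # leader as soon as it is found, then report that the chunk is done.
  def worker(leader, chunk) do
    for n <- chunk, prime?(n), do: send(leader, {:prime, n})
    send(leader, {:done, self()})
  end

  # Leader: split the range into chunks, spawn one worker process per chunk,
  # and collect the streamed results until every worker has reported :done.
  def find(range, chunk_size) do
    leader = self()

    workers =
      range
      |> Enum.chunk_every(chunk_size)
      |> Enum.map(fn chunk -> spawn_link(fn -> worker(leader, chunk) end) end)

    collect(MapSet.new(workers), [])
  end

  defp collect(pending, primes) do
    if MapSet.size(pending) == 0 do
      Enum.sort(primes)
    else
      receive do
        {:prime, n} -> collect(pending, [n | primes])
        {:done, pid} -> collect(MapSet.delete(pending, pid), primes)
      end
    end
  end
end
```

Calling `PrimeSketch.find(1..30, 10)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. Since the leader only ever talks to pids, the same message protocol should keep working if the workers are later spawned on other nodes, which is the sense in which logical distribution comes before physical distribution.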

There are many ways of distributing software (in space and in time, think version updates), and if you are not experienced with distribution, I would advise you to first get familiar with Erlang processes and logical distribution by developing a single application and running it on a single node, before moving on to more complex concerns (space distribution between remote machines, time distribution between various versions, etc.).

Of course, if you want to play with a cluster, then by all means play with a cluster, but I don’t think the prime search application requires it. Also, in any case, you need logically distributed software to be able to physically distribute it.

Good luck in your BEAM adventures and, most importantly, enjoy it!

joaoevangelista · May 3, 2023, 5:57pm

Yes, it’s kind of a broad question, and I appreciate the words! I want a distributed system for the sake of studying them with a simple use case, hence calculating primes and streaming events back to the user.

Indeed, I could split it logically, then use more nodes to make it more resilient independently of their role, so they could cover for a failed node, e.g. everyone can be a leader or a worker. That’s something I didn’t consider, so thank you!

schneebyte · May 5, 2023, 8:20am

Check out :pg
For simple leader + workers you would only have to connect the workers to leaders, no need for fully connected mesh. Then have the workers join a :pg group. Leaders can monitor the :pg scope and get updates when workers join/leave.
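
A rough single-node sketch of that flow, to show the calls involved. The scope name `:prime_cluster`, the group name `:workers`, the module name `PgSketch` and the `{:job, ...}` / `{:result, ...}` messages are assumptions made up for this example. `:pg.monitor_scope/1` needs OTP 25.1 or later. In a real cluster every node would start the same scope and the workers would connect to the leader (e.g. with `Node.connect/1`). Note that distributed Erlang connects nodes transitively by default, so actually avoiding the full mesh would probably also need something like the `-connect_all false` flag or hidden nodes.

```elixir
defmodule PgSketch do
  @scope :prime_cluster   # hypothetical scope name
  @group :workers         # hypothetical group name

  # Worker: join the group, handle one job, then exit (which leaves the group).
  def worker_loop do
    :ok = :pg.join(@scope, @group, self())

    receive do
      {:job, leader, range} -> send(leader, {:result, self(), Enum.sum(range)})
    end
  end

  def demo do
    # Start the scope; tolerate it already running.
    case :pg.start_link(@scope) do
      {:ok, _pid} -> :ok
      {:error, {:already_started, _pid}} -> :ok
    end

    # Leader side: subscribe to join/leave notifications for the whole scope.
    {ref, _current_groups} = :pg.monitor_scope(@scope)

    worker = spawn(fn -> worker_loop() end)

    # Notified when the worker joins.
    receive do
      {^ref, :join, @group, [^worker]} -> :ok
    after
      1_000 -> raise "worker did not join"
    end

    # Members can also be looked up on demand.
    [^worker] = :pg.get_members(@scope, @group)
    send(worker, {:job, self(), 1..10})

    result =
      receive do
        {:result, ^worker, sum} -> sum
      after
        1_000 -> raise "no result from worker"
      end

    # The worker process exits after its job, so a :leave notification follows.
    receive do
      {^ref, :leave, @group, [^worker]} -> :ok
    after
      1_000 -> raise "no leave notification"
    end

    result
  end
end
```

`PgSketch.demo()` returns `55` here. The point is that the leader never has to list nodes and filter them by type: workers announce themselves by joining the group, and the leader reacts to the `:join` / `:leave` messages.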