Distributed Workers

I want to know how we can scale workers across nodes dynamically.
If a new node is connected to the cluster, how can the worker pool be notified of this?
What are the different strategies used?
And how do we start only the workers, without starting the entire application, on the new node?

PS: I have never created a distributed system before

Elixir (Erlang/OTP) gives you the tools to design a distributed system; it is up to you to use them to build the system you want. I think that is a great thing, because it gives you a lot of flexibility. I’m going to give you a simplified example:

As for dynamically attaching nodes to accept work: you could have a GenServer that watches for newly connected nodes. This GenServer can poll Node.list/0 and diff the result against the previous check, looking for nodes whose names start with “worker”. So if the previous list was [:"worker-1@"] and the new list is [:"worker-1@", :"worker-2@"], your GenServer should detect that worker-2 is new and tell another GenServer that handles work distribution that a new worker is online. The reverse should happen when a worker is detected to have left.
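To make the polling idea concrete, here is a minimal sketch of such a watcher. The module name NodeWatcher, the one-second poll interval, and the {:worker_up, node} / {:worker_down, node} message shapes are all my own assumptions for illustration; only Node.list/0 and the GenServer callbacks are standard.

```elixir
defmodule NodeWatcher do
  @moduledoc """
  Hypothetical sketch: polls Node.list/0 and reports worker nodes
  that joined or left since the previous poll.
  """
  use GenServer

  # Assumed poll interval; tune it for your cluster.
  @poll_ms 1_000

  # `notify` is the pid of whatever process handles work distribution.
  def start_link(notify) when is_pid(notify) do
    GenServer.start_link(__MODULE__, notify)
  end

  @doc "Pure helper: returns {joined, left} worker nodes between two snapshots."
  def diff(previous, current) do
    prev = Enum.filter(previous, &worker?/1)
    curr = Enum.filter(current, &worker?/1)
    {curr -- prev, prev -- curr}
  end

  # A node counts as a worker when its name starts with "worker".
  defp worker?(node), do: node |> Atom.to_string() |> String.starts_with?("worker")

  @impl true
  def init(notify) do
    schedule()
    {:ok, %{notify: notify, known: []}}
  end

  @impl true
  def handle_info(:poll, %{notify: notify, known: known} = state) do
    current = Node.list()
    {joined, left} = diff(known, current)
    Enum.each(joined, &send(notify, {:worker_up, &1}))
    Enum.each(left, &send(notify, {:worker_down, &1}))
    schedule()
    {:noreply, %{state | known: current}}
  end

  defp schedule, do: Process.send_after(self(), :poll, @poll_ms)
end
```

If you would rather not poll, :net_kernel.monitor_nodes(true) is a push-based alternative: the calling process then receives {:nodeup, node} and {:nodedown, node} messages, and you can apply the same "worker" prefix check to them.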

As for work distribution, you asked whether it is possible without running the application on the worker nodes. The answer is yes, but it might be better to run it. Let's say you do not start your application on the worker node and instead spawn a linked process there like this (pull the most recent pending profile, set encryption keys, and finalize the profile):

Node.spawn_link(:"worker-1@", fn ->
  # Runs on the worker node, linked to the calling process
  with {:ok, profile} <- Profile.fetch_pending_profiles() do
    {:ok, _} = EncryptionService.set_enc_keys(profile)
    {:ok, _} = Profile.finalize_profile(profile)
  end
end)

This would be a problem, because the call would raise an UndefinedFunctionError on the worker node: the worker node does not have the Profile or EncryptionService modules loaded. It is still possible to do this without running the application on the worker node, but you would have to ship over the code for everything the function needs (database access, etc.). You can avoid the issue by running the application on the worker nodes.
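For what "shipping the code over" could look like, here is a rough sketch using Erlang's :code.get_object_code/1 and :code.load_binary/3 via :rpc.call/4. RemoteLoader and push/2 are hypothetical names of mine, and this only copies a single compiled module; it does nothing about that module's dependencies, configuration, or database connections.

```elixir
defmodule RemoteLoader do
  @moduledoc "Hypothetical helper: copies one compiled module to another node."

  # Returns :ok when the module was loaded on `node`, {:error, reason} otherwise.
  def push(node, module) do
    case :code.get_object_code(module) do
      {^module, binary, filename} ->
        # Ask the remote node to load the same BEAM binary.
        case :rpc.call(node, :code, :load_binary, [module, filename, binary]) do
          {:module, ^module} -> :ok
          other -> {:error, other}
        end

      :error ->
        # No object code was found for this module on the local code path.
        {:error, :no_object_code}
    end
  end
end
```

You would have to call something like RemoteLoader.push(:"worker-1@", Profile) for every module the spawned function touches, which is a big part of why running the application on the worker nodes tends to be the simpler option.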

This was just my quick little comment…

Check these out too: