Is it possible to have nodes running independently, and then start running a distributed app with a fail-over/takeover configuration programmatically at run-time?
I’ve been working with the Magic 8-ball example in https://learnyousomeerlang.com/distributed-otp-applications with the A-B-C nodes. I have it working in Elixir, but it requires configuration files that are pre-configured with the distributed node cluster and fail-over rules. I’ve read about libcluster+swarm, but it seems like overkill for my 2-node clusters. And it seems like I would still have to have the nodes pre-configured.
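For context, here is a minimal sketch of the kind of static configuration that chapter relies on, translated to Elixir config syntax for node a. The file name, node names, and timeout values are illustrative assumptions, not my exact files. Depending on how the node is started, these kernel settings may need to be passed as an Erlang config file with -config instead, since kernel reads them when it boots.

```elixir
# config/a.exs (hypothetical file name), loaded for node a only
import Config

config :kernel,
  # {app, takeover_wait_ms, nodes_in_priority_order}:
  # a runs m8ball first; b takes over about 5 s after a goes down.
  distributed: [{:m8ball, 5000, [:"a@192.168.0.2", :"b@192.168.0.2"]}],
  # Nodes that must be up before this node finishes booting.
  sync_nodes_mandatory: [:"b@192.168.0.2"],
  # How long (ms) to wait for those nodes.
  sync_nodes_timeout: 30_000
```

Node b would need a mirror-image file listing a as its mandatory node, which is exactly the up-front knowledge I’m trying to avoid.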
Here’s the idea of what I’m trying to accomplish. There may be an easier or better way to do this, and I’m open to ideas:
I will have two nodes that are running on separate IoT devices. At first these nodes don’t know about each other, and they don’t know if they will ever be clustered - they may not be.
The devices will receive a signal (details TBD) informing them that they should connect to each other and start running a distributed application with one node acting as fail-over for the other.
Playing around in iex, it feels like I’m getting close, but I’m not there yet.
Note, for this example, both nodes are on my local computer.
iex --name "a@192.168.0.2" -pa _build/dev/lib/m8ball/ebin --cookie cookie_s
iex --name "b@192.168.0.2" -pa _build/dev/lib/m8ball/ebin --cookie cookie_s
Note: Initially the cookies won’t be the same, but a shared cookie will be part of the signal/command to connect.
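A rough sketch of how I picture handling that signal, assuming it carries the peer’s node name and the shared cookie as atoms. The ClusterSignal module and join/2 function are hypothetical names for illustration; only Node.alive?/0, Node.set_cookie/1 and Node.connect/1 are real calls.

```elixir
defmodule ClusterSignal do
  # Hypothetical helper: adopt the shared cookie from the signal,
  # then try to connect to the peer node.
  def join(peer, cookie) when is_atom(peer) and is_atom(cookie) do
    if Node.alive?() do
      # Replace this node's cookie with the shared one from the signal.
      Node.set_cookie(cookie)

      case Node.connect(peer) do
        true -> :ok
        _ -> {:error, :connect_failed}
      end
    else
      # The VM was started without --name/--sname, so it cannot cluster.
      {:error, :not_distributed}
    end
  end
end
```

On node a this would be called as ClusterSignal.join(:"b@192.168.0.2", :cookie_s). It only covers the connection step, not starting the distributed application.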
Devices receive the command to connect and run the distributed m8ball app.
- Connect the nodes:
Node.connect(:"b@192.168.0.2")
- Start the distributed application at a:
:dist_ac.start_link # It seems like I need to start `dist_ac` (since it's not running on the individual nodes).
:dist_ac.load_application(:m8ball, [{:m8ball, 5000, ['a@192.168.0.2', {'b@192.168.0.2'}]}])
It seems like :dist_ac.load_application is what I need, but it never returns. I’m not sure what I’m missing.
I can’t find much documentation or many articles about this. I had to look at the Erlang source code to get this far.
Any suggestions?