Dynamically connect nodes and start a distributed application

mmyers · February 21, 2020, 9:41pm

Is it possible to have nodes running independently, and then start running a distributed app with a fail-over/takeover configuration programmatically at run-time?

I’ve been working with the Magic 8-ball example in https://learnyousomeerlang.com/distributed-otp-applications with the A-B-C nodes. I have it working in Elixir, but it requires configuration files that are pre-configured with the distributed node cluster and fail-over rules. I’ve read about libcluster+swarm, but it seems like overkill for my 2-node clusters. And it seems like I would still have to have the nodes pre-configured.

Here’s the idea of what I’m trying to accomplish. Maybe there is an easier/better way to do this, I’m open to ideas:

I will have two nodes that are running on separate IoT devices. At first these nodes don’t know about each other, and they don’t know if they will ever be clustered - they may not be.

The devices will receive a signal (details TBD) informing them that they should connect to each other and start running a distributed application with one node acting as fail-over for the other.

Playing around in iex, it feels like I’m getting closer, but not really.
Note, for this example, both nodes are on my local computer.

iex --name "a@192.168.0.2" -pa _build/dev/lib/m8ball/ebin --cookie cookie_s
iex --name "b@192.168.0.2" -pa _build/dev/lib/m8ball/ebin --cookie cookie_s

Note: Initially the cookies won’t be the same, but a shared cookie will be part of the signal/command to connect.
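
A minimal sketch of that connect step, assuming the signal carries the peer's node name and the shared cookie as atoms. The module name `ClusterJoin` and the function `join/2` are hypothetical; only `Node.alive?/0`, `Node.set_cookie/2` and `Node.connect/1` come from Elixir itself.

```elixir
defmodule ClusterJoin do
  # Hypothetical helper: set the cookie to use for `peer`, then try to connect.
  def join(peer, cookie) when is_atom(peer) and is_atom(cookie) do
    if Node.alive?() do
      # Applies the cookie only to connections with this peer.
      Node.set_cookie(peer, cookie)

      case Node.connect(peer) do
        true -> {:ok, peer}
        false -> {:error, :unreachable}
        :ignored -> {:error, :not_distributed}
      end
    else
      # The local node was started without --name/--sname.
      {:error, :not_distributed}
    end
  end
end
```

On node a this might be called as `ClusterJoin.join(:"b@192.168.0.2", :cookie_s)`. Note that node names and cookies are atoms, not strings or charlists.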
Devices receive the command to connect and run the distributed m8ball app.

Connect the nodes:
Node.connect(:"b@192.168.0.2")
Start the distributed application on a:

:dist_ac.start_link  # It seems like I need to start `dist_ac` (since it's not running on the individual nodes).
:dist_ac.load_application(:m8ball, [{:m8ball, 5000, ['a@192.168.0.2', {'b@192.168.0.2'}]}])

It seems like :dist_ac.load_application is what I need, but it never returns. I’m not sure what I’m missing.
There is not a lot of documentation or articles that I can find about this. I had to look at the Erlang source code to get this far.

Any suggestions?

anthonator · February 22, 2020, 1:21am

libcluster was built to auto-connect nodes. If that’s the functionality you’re looking for I don’t think using it for a 2 node cluster is overkill. It’s designed to work in a dynamic cluster so nodes coming and going at runtime shouldn’t be a problem.

You will probably need to find another tool to handle failover. Maybe look into gen_leader. I’ve never used it but it looks like it handles primary/replica semantics. Someone else might have to chime in on this one.

mmyers · March 3, 2020, 11:08pm

Thanks for your suggestions anthonator.

Apparently, what I was trying to do is not really possible. If I had kept reading on learnyousomeerlang, I would have seen:

The thing is, for the mechanism to work, the application needs to be started as part of the boot procedure of the node.
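
For comparison, the boot-time setup that passage refers to might look roughly like this for node a. This is only a sketch: it assumes the app is built as a release so the `:kernel` settings are already in place when the node boots, and the `sync_nodes_*` values are illustrative rather than taken from this thread.

```elixir
# config/config.exs (sketch for node a; node b would list a as its peer)
import Config

config :kernel,
  # m8ball runs on a; b takes over if a has been down for 5000 ms.
  distributed: [{:m8ball, 5000, [:"a@192.168.0.2", {:"b@192.168.0.2"}]}],
  # Wait for the peer at boot, but start anyway if it does not appear.
  sync_nodes_optional: [:"b@192.168.0.2"],
  sync_nodes_timeout: 30_000
```

Because these values are read while the node starts, they cannot be supplied later from a run-time signal, which is the limitation described above.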

Currently I’m going with a simple node-monitor GenServer that detects :nodeup and :nodedown, and I can now connect easily with Node.connect. I’m also looking into Epmdless.
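
A minimal sketch of what such a monitor could look like, not the poster's actual code. The module name `NodeMonitor` and the `connected/1` function are hypothetical; `:net_kernel.monitor_nodes/1` is the standard way to subscribe to `{:nodeup, node}` and `{:nodedown, node}` messages.

```elixir
defmodule NodeMonitor do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  # Returns the sorted list of nodes currently considered up.
  def connected(server), do: GenServer.call(server, :connected)

  @impl true
  def init(:ok) do
    # Subscribe to node status messages. The result is not matched because
    # the call does not return :ok on every non-distributed node.
    :net_kernel.monitor_nodes(true)
    {:ok, MapSet.new(Node.list())}
  end

  @impl true
  def handle_call(:connected, _from, nodes) do
    {:reply, Enum.sort(MapSet.to_list(nodes)), nodes}
  end

  @impl true
  def handle_info({:nodeup, node}, nodes) do
    {:noreply, MapSet.put(nodes, node)}
  end

  def handle_info({:nodedown, node}, nodes) do
    # A fail-over action (for example, starting the app locally) would go here.
    {:noreply, MapSet.delete(nodes, node)}
  end
end
```

The state is just the set of known peers; any fail-over logic would hang off the `:nodedown` clause, which is an assumption about how the design might continue.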