Cluster Erlang nodes without DNS and FQDN

I have created an Elixir Phoenix project that is meant to run on bare-metal physical servers (one node per physical server) in a LAN setup.

I am using Elixir mainly for fault tolerance.

I am building my application with mix releases and running it on multiple nodes (one node being one physical server).

I am trying to connect these nodes to each other using a custom UDP-broadcast-based setup. I tried using “libcluster”, but it did not work out for me.

What I am currently doing is broadcasting each node's name to the UDP broadcast address 255.255.255.255. When other nodes receive the packet, they try to connect to the sender.
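
To make the setup concrete, here is a minimal sketch of such a beacon. The module name `Beacon`, the port number, the interval, and the `beacon:` packet prefix are all made up for illustration, and this is not my exact code. Note that `Node.connect/1` can only succeed if the host part of the announced name resolves on the receiving machine, which is exactly the part I am stuck on.

```elixir
defmodule Beacon do
  @moduledoc "Hypothetical UDP beacon: announces this node and tries to connect to announced peers."
  use GenServer

  # Assumed values, not taken from any library: pick any free port and interval.
  @port 45_892
  @interval 5_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # broadcast: true is required to send to 255.255.255.255
    {:ok, socket} =
      :gen_udp.open(@port, [:binary, active: true, broadcast: true, reuseaddr: true])

    send(self(), :announce)
    {:ok, socket}
  end

  @impl true
  def handle_info(:announce, socket) do
    :gen_udp.send(socket, {255, 255, 255, 255}, @port, encode(Node.self()))
    Process.send_after(self(), :announce, @interval)
    {:noreply, socket}
  end

  def handle_info({:udp, _socket, _ip, _port, packet}, socket) do
    # Ignore malformed packets and our own announcements.
    with {:ok, peer} <- decode(packet),
         true <- peer != Node.self() do
      Node.connect(peer)
    end

    {:noreply, socket}
  end

  def handle_info(_other, socket), do: {:noreply, socket}

  # Packet format: the prefix followed by the full node name.
  def encode(node), do: "beacon:" <> Atom.to_string(node)

  # String.to_atom/1 on network input can exhaust the atom table;
  # acceptable only on a trusted LAN.
  def decode("beacon:" <> name), do: {:ok, String.to_atom(name)}
  def decode(_other), do: :error
end
```

The process would be started from the application's supervision tree, for example as `{Beacon, []}` in the children list.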

However, I am currently facing many issues. I tried disabling EPMD and running the node on a static port, following many examples online, but that did not really solve my problem.

In my case, I don’t control the machines’ hostnames. The machines can all have the same hostname, so they cannot be resolved through something like machine1.local. Also, since this is just a LAN, there is no DNS running, and the system is supposed to work completely offline, with no connection to the public internet.

Since I don’t control the machines’ hostnames, and Erlang nodes in a cluster must have unique names, I am setting the names up using this env.sh.eex file.

#!/bin/sh
# Generate a fresh random host part on every boot so node names never collide.
NODE="$(uuidgen)"
export RELEASE_DISTRIBUTION=name
export RELEASE_COOKIE="static-cookie"
export RELEASE_NODE="app@$NODE"

I am trying to set them to a unique name by providing a random hostname myself.
(I realize that this might not be a good solution)

From what I understand, an Erlang node name is not made of separate parts; it is a single atom. So it is not really app@host, it is treated as one whole name.

I have read up on several articles to come to the above conclusion.

So here I am, asking the community what I should do to allow seamless automatic node connections in my setup.
NOTE: I want however many machines are running my Elixir Phoenix project on the local LAN to form a cluster, without DNS or FQDNs and without worrying about DHCP.

Since each node can be connected to the LAN through multiple NICs (for fault tolerance), suffixing the node name with the IP address is not really a solution in my case. DHCP can also change the address across restarts (for example, if something goes wrong with one node).

So what are my options for making the nodes connect automatically under the above restrictions?

I am open to any changes to my idea, but it would be better to depend less on the network and more on the code, because I am not in control of the network or the machines’ hostnames.

Thank you for reading the above.

I’m not an expert on this, but my understanding is that you need three things to make a connection:

  • Both sides need the same cookie
  • Both sides need to agree on the full name of the node being connected to, including the host part
  • The connecting side needs the part after the @ to resolve to the host the listening side is running on. That means it either needs to be an IP address, or it needs to be a working DNS name
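
As a rough illustration of the third point, the sketch below splits a node name at the @ and checks whether the host part is a literal IP address. `NodeCheck` and its functions are hypothetical helpers I made up for this post; only `:inet.parse_strict_address/1`, `Node.set_cookie/1` and `Node.connect/1` are real calls.

```elixir
defmodule NodeCheck do
  @moduledoc "Hypothetical helpers for inspecting the host part of a node name."

  # Returns everything after the first "@" as a string.
  def host_part(node) when is_atom(node) do
    [_name, host] = node |> Atom.to_string() |> String.split("@", parts: 2)
    host
  end

  # True when the host part is a literal IPv4/IPv6 address, so no name
  # resolution is needed to reach the node.
  def ip_host?(node) do
    result = node |> host_part() |> String.to_charlist() |> :inet.parse_strict_address()
    match?({:ok, _ip}, result)
  end
end

# On a running distributed node, the connection itself would look like:
#   Node.set_cookie(:"static-cookie")      # same cookie on both sides
#   Node.connect(:"app@192.168.1.20")      # full name, host part must resolve
```

With a UUID as the host part, `ip_host?/1` returns false, so connecting to that name needs some form of name resolution on the connecting side.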

If you can’t use IP addresses you need some kind of name resolution. One idea could be to generate a hostname (I believe it needs at least one dot) and have the receiving side of the broadcast put that into the local hosts file with the correct IP address before connecting.

You can also implement a custom EPMD module. The address_please/3 callback is the one doing DNS resolution.
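
A minimal sketch of what that could look like, assuming OTP 23 or later. The module name `MyApp.EPMD`, the `put_host/2` helper, and the use of `:persistent_term` as the lookup table are my own choices for illustration, not an established API. Everything except address_please/3 is delegated to the stock `:erl_epmd` module, and unknown hosts fall back to its normal resolution. I have not tested this in a release.

```elixir
defmodule MyApp.EPMD do
  @moduledoc """
  Hypothetical EPMD module, enabled with the VM flag
  `-epmd_module Elixir.MyApp.EPMD`.
  """

  # Delegate everything we do not customize to the default implementation.
  defdelegate start_link(), to: :erl_epmd
  defdelegate register_node(name, port), to: :erl_epmd
  defdelegate register_node(name, port, family), to: :erl_epmd
  defdelegate port_please(name, host), to: :erl_epmd
  defdelegate port_please(name, host, timeout), to: :erl_epmd
  defdelegate listen_port_please(name, host), to: :erl_epmd
  defdelegate names(host), to: :erl_epmd

  # Called by whatever receives the broadcast: remember which IP the
  # announced host part lives on. The host is a charlist, as passed by
  # the distribution layer.
  def put_host(host, ip) when is_list(host) and is_tuple(ip) do
    :persistent_term.put({__MODULE__, host}, ip)
  end

  # Resolve the host part of a node name to an IP address.
  def address_please(name, host, family) do
    case :persistent_term.get({__MODULE__, host}, nil) do
      nil -> :erl_epmd.address_please(name, host, family)
      ip -> {:ok, ip}
    end
  end
end
```

The receiving side of the broadcast would call something like `MyApp.EPMD.put_host(~c"node-a.cluster", {192, 168, 1, 20})` with the sender's IP before calling `Node.connect/1`. Updating `:persistent_term` is relatively expensive, so for a cluster whose membership changes often an ETS table may be a better fit.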

Register in Postgres and set up notifications?

Something like this: supabase/libcluster_postgres on GitHub, a Postgres strategy for libcluster?

Nice, I didn’t know that existed. I was suggesting how to build one in case there was nothing out there. This is great.