How to ensure nodes are connected?

I’ve got two nodes connected using libcluster. It had been working fine for some time, but I suddenly noticed that Node.list returned an empty list. After manually connecting the nodes again, everything seems fine.

But my question is: in a production system, how can I ensure this doesn’t happen again? Do I need to run a task myself that checks Node.list and attempts to reconnect, or at least sends me an email letting me know something iffy is going on? Or is there something built into either libcluster (which should be able to heal) or Erlang that solves this for me?

Thanks in advance!

I haven’t used libcluster, but Erlang has a monitor_node function. I would look into what libcluster offers before implementing anything with monitor_node, to avoid duplicating features it already provides.

I would also investigate why the nodes apparently lost connectivity with each other.
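
For reference, here is a minimal sketch of what such a watcher could look like. It uses :net_kernel.monitor_nodes/1 (a close relative of monitor_node that subscribes to up/down events for all nodes rather than a single one), logs every event so there is a trail to investigate, and periodically tries to reconnect to any expected node that is missing. The module name NodeWatcher and the options :expected_nodes and :interval are made up for illustration; they are not part of libcluster or Erlang.

```elixir
defmodule NodeWatcher do
  # Hypothetical module: logs node up/down events and periodically
  # retries connections to nodes that should be part of the cluster.
  use GenServer
  require Logger

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Pure helper: which expected nodes are not currently connected?
  def missing_nodes(expected, connected), do: expected -- connected

  @impl true
  def init(opts) do
    # Subscribe this process to {:nodeup, node} and {:nodedown, node} messages.
    _ = :net_kernel.monitor_nodes(true)

    state = %{
      expected: Keyword.get(opts, :expected_nodes, []),
      interval: Keyword.get(opts, :interval, 10_000)
    }

    schedule_check(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info({:nodeup, node}, state) do
    Logger.info("Node up: #{inspect(node)}")
    {:noreply, state}
  end

  def handle_info({:nodedown, node}, state) do
    # This is the place to send an email or raise an alert instead of only logging.
    Logger.warning("Node down: #{inspect(node)}")
    {:noreply, state}
  end

  def handle_info(:check, state) do
    connected = [Node.self() | Node.list()]

    for node <- missing_nodes(state.expected, connected) do
      # Node.connect/1 returns true, false, or :ignored (local node not distributed).
      case Node.connect(node) do
        true -> Logger.info("Reconnected to #{inspect(node)}")
        other -> Logger.warning("Could not connect to #{inspect(node)}: #{inspect(other)}")
      end
    end

    schedule_check(state.interval)
    {:noreply, state}
  end

  defp schedule_check(interval), do: Process.send_after(self(), :check, interval)
end
```

It could be started under the application supervisor with something like {NodeWatcher, expected_nodes: [:"a@host1", :"b@host2"]}, where the node names are placeholders. If the libcluster strategy in use already reconnects by itself, only the logging part adds anything.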

:wave:

What strategy are you using with libcluster? I think all the strategies I’ve used try to reconnect on each gossip round, which for me was usually every 5 seconds. I’ve looked through some other strategies, and that doesn’t seem to be the case for some of them, such as Cluster.Strategy.Epmd, which probably leaves reconnecting to the Erlang facilities.

I am using the Epmd strategy, which I guess explains why no reconnect happens. I will look into changing to the Gossip strategy, since I don’t use Kubernetes. The cluster is very small, so it should be fine!
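
For anyone finding this later, the change is roughly the following. This is only a sketch: the topology name :example is a placeholder, and I am relying on the strategy's default options. Note that Gossip discovers peers over UDP multicast, so the network between the nodes has to allow that.

```elixir
# config/config.exs (or runtime.exs)
import Config

config :libcluster,
  topologies: [
    # :example is a placeholder name for this topology
    example: [
      strategy: Cluster.Strategy.Gossip
    ]
  ]
```

The topologies are then handed to Cluster.Supervisor in the application's supervision tree, for example {Cluster.Supervisor, [Application.get_env(:libcluster, :topologies), [name: MyApp.ClusterSupervisor]]}, where MyApp.ClusterSupervisor is again just a placeholder name.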

Thank you for your help