Node down even when app is running

I have a mysterious problem running my app on multiple nodes that I hope someone can advise on.

A while back I set up my deployment script to build two separate releases of my app to serve behind nginx, for load balancing and zero-downtime deployments (with RELEASE_NAME set to something like “my_app” and “my_app2”). I manage each app process as a systemd service. This all works great: after a deployment I can tail the logs of each server and see both serving requests, and I can use systemd to stop one app instance and confirm that nginx is still responsive.

The problem is that when I try to connect to my_app using the remote command, I get the error "Could not contact remote node app@, reason: :nodedown. Aborting...". Connecting to my_app2 works fine.

I can fix this issue by running service my_app restart. After that, I can connect to both instances.

However, if I then run service my_app2 restart, the my_app node appears to be down again.

I am using libcluster to manage the nodes, with the following config:

      my_app: [
        strategy: Cluster.Strategy.LocalEpmd
      ]

Each release exports its own node name. For my_app:

export RELEASE_NODE=my_app@127.0.0.

and for my_app2:

export RELEASE_NODE=my_app2@127.0.0.
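For context, libcluster's documented setup pairs a topology entry like the one above with a Cluster.Supervisor child in the application's supervision tree. The sketch below assumes a standard Mix project; MyApp.Application, MyApp.Supervisor and MyApp.ClusterSupervisor are placeholder names, not the poster's actual code. As its name suggests, LocalEpmd looks up peers through the epmd running on the local host, so a node that has dropped out of epmd's registry is likely invisible to this strategy as well.

```elixir
# config/config.exs (sketch): one LocalEpmd topology, keyed :my_app as above.
import Config

config :libcluster,
  topologies: [
    my_app: [
      strategy: Cluster.Strategy.LocalEpmd
    ]
  ]

# lib/my_app/application.ex (sketch, placeholder module names)
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Read the topologies declared in config; default to none.
    topologies = Application.get_env(:libcluster, :topologies, [])

    children = [
      # Starts libcluster's supervisor, which runs one strategy per topology.
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
      # ...the rest of the application's children...
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```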

Thanks in advance for any clues or suggestions of things to try!



You should have one cookie shared by all the nodes, and provide it as well when using remote.


I set the RELEASE_COOKIE environment variable when I built the releases, and it appears to be set properly:

iex(my_app@> (System.get_env()
...(my_app@> |> Enum.map(fn {k, v} -> "#{k}=#{v}" end)
...(my_app@> |> Enum.filter(&String.starts_with?(&1, "RELEASE_")))
 "RELEASE_COMMAND=start", "RELEASE_MODE=embedded", "RELEASE_NAME=my_app",

Note: aside from the node names and paths, the output is exactly the same on both nodes (I can connect to both after the restart mentioned above).
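The RELEASE_* variables show what the release scripts were given, but the cookie a running node actually uses can be read directly. A minimal sketch, assuming it is pasted into an iex session on each node; the module name ClusterCheck is hypothetical. Comparing the cookie values, and checking whether each node appears in the other's connected list, should narrow down whether this is a cookie mismatch or a registration problem.

```elixir
defmodule ClusterCheck do
  # Hypothetical diagnostic helper: summarises this node's distribution state.
  # Node.get_cookie/0 returns :nocookie when the node is not distributed.
  def summary do
    %{
      node: Node.self(),
      alive: Node.alive?(),
      cookie: Node.get_cookie(),
      connected: Node.list()
    }
  end

  # Node.ping/1 returns :pong if the named node answers, :pang otherwise.
  def reachable?(name) when is_atom(name) do
    Node.ping(name) == :pong
  end
end
```

Run ClusterCheck.summary() on both nodes, then, from my_app2, call ClusterCheck.reachable?/1 with my_app's full node name as set by RELEASE_NODE.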

After restarting the first instance, I can see both nodes:

sudo service my_app restart
erts-11.1.8/bin/epmd -names
epmd: up and running on port X with data:
name my_app at port Y
name my_app2 at port Z

But after restarting the second instance, the first one is gone:

sudo service my_app2 restart
erts-11.1.8/bin/epmd -names
epmd: up and running on port X with data:
name my_app2 at port Z
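To track this across restarts, a small sketch that reads epmd -names output like the listings above and reports which expected node names are no longer registered. The module and function names are hypothetical. From inside a running node, Erlang's :net_adm.names/0 returns the same registrations as {name, port} tuples with charlist names, which registered/0 converts to strings.

```elixir
defmodule EpmdNames do
  # Hypothetical helper for comparing epmd registrations with the node
  # names we expect to see.

  # Extracts names from lines of the form "name <node> at port <port>".
  def parse(output) when is_binary(output) do
    ~r/^name (\S+) at port \S+$/m
    |> Regex.scan(output, capture: :all_but_first)
    |> List.flatten()
  end

  # Returns the expected names that the epmd output does not list.
  def missing(output, expected) when is_list(expected) do
    expected -- parse(output)
  end

  # Same list, queried from a running node via the local epmd.
  def registered do
    case :net_adm.names() do
      {:ok, names} -> Enum.map(names, fn {name, _port} -> List.to_string(name) end)
      {:error, _reason} = error -> error
    end
  end
end
```

Fed the second listing above, EpmdNames.missing(output, ["my_app", "my_app2"]) returns ["my_app"], which makes it easy to check after each systemctl or service restart whether a registration has been lost.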