tfwright
Node down even when app is running
I have a mysterious problem running my app on multiple nodes that I hope someone can advise on.
A while back I set up my deployment script to build two separate releases of my app to serve behind nginx for load-balancing/zero-downtime deployment purposes (with RELEASE_NAME set to something like “my_app” and “my_app2”). I am managing each app process as a systemd service. This all works great: after a deployment I can tail the logs of each server and see both of them serving requests, and I can use systemd to stop one app instance and confirm that nginx is still responsive.
The problem is that when I try to connect to my_app using the remote command, I get the error: Could not contact remote node app@127.0.0.1, reason: :nodedown. Aborting... Connecting to my_app2 works fine.
I can fix this issue by running service my_app restart. After that, I can connect to both instances.
However, if I run service my_app2 restart, the my_app node appears to be down again.
I am using libcluster to manage the nodes using the following config:
[
  my_app: [
    strategy: Cluster.Strategy.LocalEpmd
  ]
]
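For context, in a typical libcluster setup a topology list like the one above sits under the :libcluster application config and is handed to Cluster.Supervisor in the application's supervision tree. A minimal sketch of that wiring, assuming the module names MyApp.Application, MyApp.ClusterSupervisor and MyApp.Supervisor (hypothetical, not from the original post):

```elixir
# config/config.exs (excerpt)
import Config

config :libcluster,
  topologies: [
    my_app: [
      # Connect to every node registered with the epmd daemon on this host.
      strategy: Cluster.Strategy.LocalEpmd
    ]
  ]

# lib/my_app/application.ex (excerpt); module names are hypothetical
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    topologies = Application.get_env(:libcluster, :topologies, [])

    children = [
      # Starts libcluster, which connects this node to the other nodes it finds.
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
      # ...the rest of the application's children
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

With this strategy, my_app and my_app2 discover each other through epmd on the same host and join one cluster, so anything that distributes work across connected nodes will also see the other release.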
my_app env.sh:
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=my_app@127.0.0.
my_app2 env.sh:
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=my_app2@127.0.0.
Thanks in advance for any clues or suggestions of things to try!
Marked As Solved
tfwright
Forgot to update this, but I eventually discovered this was due to Quantum trying to run jobs on a connected node that itself is not running Quantum (see How to setup quantum against libcluster · Issue #485 · quantum-elixir/quantum-core · GitHub). Switching to the Local run strategy resolved the issue.
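A minimal sketch of that fix, assuming a scheduler module named MyApp.Scheduler (hypothetical) defined with use Quantum, otp_app: :my_app. Setting run_strategy to Quantum.RunStrategy.Local as the scheduler-wide default means each job executes on the node that owns the scheduler rather than being dispatched to another node in the libcluster-connected cluster. Option names should be checked against the Quantum version in use.

```elixir
# config/config.exs (excerpt)
import Config

# MyApp.Scheduler is hypothetical:
#   defmodule MyApp.Scheduler do
#     use Quantum, otp_app: :my_app
#   end
config :my_app, MyApp.Scheduler,
  # Default for every job: run on the local node only,
  # never on another connected node.
  run_strategy: Quantum.RunStrategy.Local,
  jobs: [
    # Hypothetical job: call MyApp.Tasks.cleanup/0 every 15 minutes.
    {"*/15 * * * *", {MyApp.Tasks, :cleanup, []}}
  ]
```

If both releases start the scheduler, each of them will then run every job on its own node.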
Also Liked
kokolegorille
No RELEASE_COOKIE?
You should have one shared by all the nodes…
And provide it as well when using remote.
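A minimal sketch of a shared cookie, assuming both releases are defined in one mix.exs under the hypothetical names :my_app and :my_app2 (the original post builds them via RELEASE_NAME, so the real layout may differ). The :cookie release option writes the same value into each release's releases/COOKIE file, which the release start script (including remote) reads at startup. Alternatively, exporting the same RELEASE_COOKIE value in both env.sh files overrides it at runtime.

```elixir
# mix.exs (excerpt); project and release names are hypothetical
defmodule MyApp.MixProject do
  use Mix.Project

  # Placeholder: use one long random value shared by every node,
  # and keep the real secret out of version control.
  @cookie "replace-with-a-long-random-secret"

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      elixir: "~> 1.10",
      deps: [],
      releases: [
        # Both releases get the same cookie, so the nodes (and remote) can authenticate.
        my_app: [cookie: @cookie],
        my_app2: [cookie: @cookie]
      ]
    ]
  end
end
```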