I have a release produced by the new mix release task in Elixir 1.9. The shell script generated for the release can start the service, but none of the other commands that depend on connecting to the running node, such as restart, stop, or pid, work. All fail with --rpc-eval : RPC failed with reason :nodedown. When I query epmd with -names, it shows no registered nodes.
I can see that the service was started with -sname my_app… so why would epmd not have this name listed?
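For anyone wanting to script the epmd check described above, here is a minimal sketch. It assumes epmd -names prints one line per node in the form "name <node> at port <port>"; node_registered is a hypothetical helper name, not part of the release script.

```shell
# Hypothetical helper: succeeds if the given short node name appears
# in `epmd -names` output read from stdin.
node_registered() {
  # $1: short node name, e.g. my_app (treated as a grep pattern)
  grep -q "^name $1 at port "
}

# Possible usage on the server (commented out so the sketch has no side effects):
# epmd -names | node_registered my_app && echo "registered" || echo "not registered"
```

If the name is missing from epmd but the VM is running, the node may have been started without distribution or under a different name than the one the script later tries to reach.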
I am seeing this exact same behavior. If I run the release locally, I am able to start and stop the app. epmd reports the same name both locally and on my production Debian server, but on the Debian server I am only able to start the app; I get the :nodedown error when I try to stop it. Any ideas on how to troubleshoot this?
I am also experiencing the same problem on a Debian server on GCP. The solution offered by @barndon above does not work for me, since trying to stop the server first fails and the CI exits with a failure.
I got a nice explanation as to why this might be happening on this blog
Summary: It is caused by a cookie mismatch
Solution (straight from the blog):
The first thing we need to resolve is to ensure that every time we start our release, the same cookie is used. Fortunately, this can be easily done by using the RELEASE_COOKIE environment variable or by putting the cookie in our release configuration in mix.exs.
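As a rough illustration of the environment variable route, the sketch below reads a cookie from a file and exports it as RELEASE_COOKIE before the release script is invoked. The load_cookie helper and the file path are assumptions made for the example; they are not from the blog.

```shell
# Hypothetical helper: export RELEASE_COOKIE from a file so that start,
# stop, pid, and rpc all use the same cookie.
load_cookie() {
  # $1: path to a file holding the cookie (hypothetical location)
  [ -r "$1" ] || return 1
  # Strip whitespace so a trailing newline does not become part of the cookie.
  RELEASE_COOKIE="$(tr -d '[:space:]' < "$1")"
  export RELEASE_COOKIE
}

# Possible usage (commented out; path and app name are placeholders):
# load_cookie /etc/my_app/cookie && bin/my_app start
# load_cookie /etc/my_app/cookie && bin/my_app stop
```

The point is only that the value must be identical for the command that starts the node and for every later command that connects to it.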
I don’t want to necro a thread, but this IS the number one Google result for this problem, at least for me, and I feel this solution could help others.
For us the solution to this problem was the following. We were deriving the IP for the long name using $(hostname -I) in our env.sh.eex file. For some reason, the output of this hostname command had a single space after the IP address, which confused most of the commands (rpc, stop, pid, remote, etc.).
Trimming whitespace when building the node name fixed this issue, so instead of "app@<ip> " (note the trailing space) we got "app@<ip>" and the scripts started working as expected.
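A minimal sketch of that trimming in plain POSIX shell follows. The helper names trim_ws and node_name_for, and the app name my_app, are made up for the example; only hostname -I and the RELEASE_NODE variable come from the setup described above.

```shell
# Hypothetical helper: strip leading and trailing whitespace from a string.
trim_ws() {
  s="$1"
  # Drop the leading run of whitespace.
  s="${s#"${s%%[![:space:]]*}"}"
  # Drop the trailing run of whitespace.
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Hypothetical helper: build "name@host" from an app name and a raw host string.
node_name_for() {
  # $1: app name, $2: raw host string, e.g. the output of `hostname -I`
  printf '%s@%s' "$1" "$(trim_ws "$2")"
}

# Possible use in env.sh.eex (commented out; names are placeholders):
# export RELEASE_DISTRIBUTION=name
# export RELEASE_NODE="$(node_name_for my_app "$(hostname -I)")"
```

Command substitution only strips trailing newlines, not spaces, which is why the space survives unless it is removed explicitly.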
This whitespace turned out to be stupidly difficult to see. You can test whether you have this problem by adding an echo "$RELEASE_NODE|" to the bootstrapping script under one of the commands (rpc, pid, or whatever). If you are suffering from trailing whitespace, you will see the space before the pipe, e.g. "app@<ip> |".
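The same check can be made a little more explicit. The sketch below is only an illustration: show_delimited and has_outer_ws are hypothetical helpers, and the node name is a simulated value rather than real output.

```shell
# Hypothetical helper: print a value between brackets so stray spaces are visible.
show_delimited() {
  printf '[%s]\n' "$1"
}

# Hypothetical helper: succeeds if the value starts or ends with whitespace.
has_outer_ws() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Simulated value with a trailing space, standing in for a bad RELEASE_NODE.
simulated_node="my_app@10.0.0.5 "
show_delimited "$simulated_node"
if has_outer_ws "$simulated_node"; then
  echo "node name has leading or trailing whitespace" >&2
fi
```

In the real script you would pass "$RELEASE_NODE" instead of the simulated value.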