I have a mysterious problem running my app on multiple nodes I hope someone can advise on.
A while back I set up my deployment script to build two separate releases of my app to serve behind nginx for load balancing/zero-downtime deployment purposes (with RELEASE_NAME set to something like “my_app” and “my_app2”). I am managing each app process as a systemd service. This all works great: after a deployment I can tail the logs of each server and observe them both serving requests, and I can use systemd to stop one app instance and observe that nginx is still responsive.
The problem is that when I try to connect to my_app using the remote command, I get the error:

Could not contact remote node app@127.0.0.1, reason: :nodedown. Aborting...

Connecting to my_app2 works fine. I can fix this issue by running service my_app restart; after that, I can connect to both instances. However, if I then run service my_app2 restart, the my_app node again appears to be down.
I am using libcluster to manage the nodes, with the following topology config:

[
  my_app: [
    strategy: Cluster.Strategy.LocalEpmd
  ]
]
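For context, the usual way this topology config gets wired up is by passing it to Cluster.Supervisor in the application's supervision tree. A sketch based on the libcluster README follows; MyApp.Application, MyApp.ClusterSupervisor, and MyApp.Supervisor are placeholder names, not necessarily what the original poster uses:

# In MyApp.Application (module and child names are placeholders):
def start(_type, _args) do
  topologies = Application.get_env(:libcluster, :topologies, [])

  children = [
    # Cluster.Supervisor polls local epmd (LocalEpmd strategy) and
    # connects to every node it finds registered there.
    {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
    # ...the rest of the app's children...
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end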
my_app env.sh:
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=my_app@127.0.0.
my_app2 env.sh:
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=my_app2@127.0.0.
Thanks in advance for any clues or suggestions of things to try!
No RELEASE_COOKIE?
You should have one shared by all the nodes…
And provide it as well when using remote.
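One quick way to rule the cookie out (a sketch; run these from a remote shell on each node) is to compare what each VM actually loaded at runtime, rather than trusting the env vars:

# From `bin/my_app remote` and `bin/my_app2 remote` respectively:
Node.self()        # this node's registered name
Node.get_cookie()  # should be identical on both nodes
Node.list()        # the nodes this VM is currently connected to

If Node.get_cookie() differs between the two shells, connections between them will be refused even though each node looks healthy on its own.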
I set the RELEASE_COOKIE environment variable when I built the releases, and it appears to be set properly:
iex(my_app@127.0.0.1)1> (System.get_env()
...(my_app@127.0.0.1)1> |> Enum.map(fn {k, v} -> "#{k}=#{v}" end)
...(my_app@127.0.0.1)1> |> Enum.filter(&String.starts_with?(&1, "RELEASE_")))
["RELEASE_BOOT_SCRIPT_CLEAN=start_clean",
"RELEASE_ROOT=/home/my_app/apps/my_app/releases/20220812151723/api_v2/_build/prod/rel/my_app",
"RELEASE_SYS_CONFIG=/home/deployer/apps/my_app/releases/20220812151723/api_v2/_build/prod/rel/my_app/releases/0.1.0/sys",
"RELEASE_VSN=0.1.0", "RELEASE_DISTRIBUTION=name",
"RELEASE_COOKIE=[redacted]",
"RELEASE_VM_ARGS=/home/my_app/apps/my_app/releases/20220812151723/api_v2/_build/prod/rel/my_app/releases/0.1.0/vm.args",
"RELEASE_BOOT_SCRIPT=start",
"RELEASE_TMP=/home/my_app/apps/my_app/releases/20220812151723/api_v2/_build/prod/rel/my_app/tmp",
"RELEASE_COMMAND=start", "RELEASE_MODE=embedded", "RELEASE_NAME=my_app",
"RELEASE_NODE=my_app@127.0.0.1"]
Note: aside from the node name and paths, the output is exactly the same on both nodes, both of which I can connect to after the restart I mentioned above.
After restarting the first instance, I can see both nodes:
sudo service my_app restart
erts-11.1.8/bin/epmd -names
epmd: up and running on port X with data:
name my_app at port Y
name my_app2 at port Z
But after restarting the second instance, the first is gone?
sudo service my_app2 restart
erts-11.1.8/bin/epmd -names
epmd: up and running on port X with data:
name my_app2 at port Z
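In case it helps anyone debugging this: you can also ask epmd for its registered names from inside a running node instead of shelling out, using Erlang's :net_adm.names/0. A sketch (assumes you are in a remote shell on a node with distribution up, so epmd is reachable):

# From a `remote` shell on either node:
case :net_adm.names() do
  {:ok, names} ->
    # names is a list of {name_charlist, port} tuples,
    # e.g. [{'my_app', port}, {'my_app2', port}]
    Enum.each(names, fn {name, port} ->
      IO.puts("name #{name} at port #{port}")
    end)

  {:error, reason} ->
    IO.puts("epmd not reachable: #{inspect(reason)}")
end

Running this after each restart would show whether the surviving node really has dropped out of epmd's registry or whether only the remote connection is failing.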