Find Elixir process running on a box given its name

We have used https://github.com/BlakeWilliams/Elixir-Slack to write a slack-bot. The question is not about the library but about Elixir. The bot is run as a worker with name :slack_bot under a supervisor.

During tests, the bot was run from multiple nodes (boxes). Now we have a situation where 2 bots respond whenever a message is sent. Unfortunately we don’t know where the second bot is running from.

The question is: given the process name (:slack_bot), can we check whether an Elixir process with that name is running on a box?

:wave:

You can probably use https://hexdocs.pm/elixir/Process.html?#whereis/1.

That wouldn’t work because it needs to run from within iex and it would return nil.

Process.whereis/1 can be called from any function but it does need to be run on the node which it is checking. You could do something like:

:rpc.call(node_name, Process, :whereis, [:slack_bot])

which would tell you if there is a process with the registered name :slack_bot running on it and return its pid. I don’t know of an Elixir equivalent module to :rpc.
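
A minimal sketch of how that call could be wrapped, assuming the nodes can already reach each other. The module name BotLocator and its functions are hypothetical helpers made up for illustration, not part of Elixir-Slack or the standard library. It turns the three possible results of :rpc.call (a pid, nil, or a {:badrpc, reason} tuple) into tagged values, and optionally asks every connected node:

```elixir
defmodule BotLocator do
  # Hypothetical helper module, written only to illustrate the :rpc.call approach.

  # Ask a single node whether `name` is registered there.
  def whereis_on(node_name, name \\ :slack_bot) do
    case :rpc.call(node_name, Process, :whereis, [name]) do
      # The node could not be reached or the call failed remotely.
      {:badrpc, reason} -> {:error, reason}
      # The node answered, but nothing is registered under that name.
      nil -> :not_running
      # The node answered with the pid of the registered process.
      pid when is_pid(pid) -> {:ok, pid}
    end
  end

  # Ask the local node and every node it is currently connected to.
  def scan(name \\ :slack_bot) do
    for n <- [node() | Node.list()], into: %{} do
      {n, whereis_on(n, name)}
    end
  end
end
```

For example, BotLocator.whereis_on(:"nodename@hostname") would check one node, and BotLocator.scan() would return a map from node name to result. Note that scan/1 only sees nodes that are already connected, so it cannot discover a node that has never been contacted.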

I tried :rpc.call using the node name obtained from epmd -names (something like :nodename@hostname), but I get the error {:badrpc, :nodedown}.

However, if I connect to :nodename@hostname and run the command, it shows the pid correctly. And from iex, Node.ping(:nodename@hostname) gives :pang.

So the node is very much up and running.

Not sure if it is something to do with cookies?

Actually, if the node is up and running then Node.ping/1 should return :pong.

Yes, getting the cookies right is critical. I had assumed that this part of the handling of distribution had been solved. How have you started the nodes: with “sname” or with “name”? Both nodes have to be started the same way, and the choice does affect the node names.
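
A rough sketch of what could be compared on each side when a ping fails, run once in each iex session. DistCheck is a hypothetical module name used only for illustration. It gathers the local node name, the local cookie, and the ping result in one map so that the two nodes can be compared side by side:

```elixir
defmodule DistCheck do
  # Hypothetical helper: collects the facts that usually explain a :pang.
  def diagnose(target) do
    %{
      # false means this VM was started without --name / --sname
      alive?: Node.alive?(),
      # the local node name; its form differs between --name and --sname
      self: node(),
      # the local cookie as an atom; it must match the target's cookie
      cookie: Node.get_cookie(),
      # :pong if the target is reachable and accepts the cookie, else :pang
      ping: Node.ping(target)
    }
  end
end
```

Calling DistCheck.diagnose(:"nodename@hostname") on both nodes (with the other node as the target) and comparing the :self and :cookie values character by character should show whether the names or the cookies disagree.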

The image explains it better I think. I still get :pang

Nodes are started with name (not sname)

Are you inside containers or some sort of orchestration system? It’s possible that you need to allow access to the epmd ports, and that 127.0.0.1 doesn’t mean the same thing for each node.

The cookies are different: the right one has a colon at the start, the left one doesn’t.

OMG How did that happen?

I copied the result of Node.get_cookie - including the colon - while starting the other node.
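
For anyone hitting the same thing, a small sketch of why the colon sneaks in. The cookie is an atom, so iex prints it with atom syntax, while the --cookie option takes the bare text. The value :secret below is a made-up stand-in for whatever Node.get_cookie() returns on a real node:

```elixir
# Stand-in for the atom returned by Node.get_cookie() (assumed value).
cookie = :secret

# What iex prints: atom syntax, including the leading colon.
shown_in_iex = inspect(cookie)

# The bare text, which is what should be passed to --cookie.
raw_text = Atom.to_string(cookie)

IO.puts("iex shows:        #{shown_in_iex}")
IO.puts("pass to --cookie: #{raw_text}")
```

Copying the first form into --cookie makes the colon part of the cookie itself, so the two nodes end up with different cookies and refuse to connect.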

It works fine now, and apologies to the good people here who had to rack their brains over a silly mistake.

Well, the colon killed my weekend

And :rpc.call(node_name, Process, :whereis, [:slack_bot]) works just fine