Is -ssl_dist_optfile the way to secure a cluster of BEAM nodes? How to secure connections of publicly exposed BEAM nodes and epmd?

dch · July 24, 2019, 9:45am

A few things:

Never ever ever expose epmd and your node to the internet. Opinions vary about whether your Phoenix/Cowboy/… app can be directly accessible, but personally I leave TLS to haproxy or similar tools, do basic routing and denial-of-service handling there (max connections per second, etc.), and then spread the traffic across my OTP instances. I also use a firewall to block really bad actors where haproxy is not sufficient, and I distribute that blocklist across all nodes.
I restrict the range of inter-node ports that the VM can use, and use the firewall to open that range only to permitted nodes. This may mean that non-trusted nodes have to bridge in via ssh, but I consider direct node access something that users should never get, have, nor need.
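As a rough sketch of the firewall half of that setup, the script below only prints candidate rules; it does not apply them. It assumes `ip6tables` (use `iptables` on IPv4), epmd on its default port 4369, and the 4370-4400 distribution range used elsewhere in this post. The peer addresses are hypothetical, taken from the IPv6 documentation prefix.

```shell
#!/bin/sh
# Print (do not apply) firewall rules that admit only the listed peers to
# epmd and the Erlang distribution port range, and drop everyone else.
EPMD_PORT=4369   # epmd's default port
DIST_MAX=4400    # assumed to match inet_dist_listen_max on the node

print_dist_rules() {
    # One ACCEPT rule per permitted peer, covering epmd plus the range.
    for peer in "$@"; do
        echo "ip6tables -A INPUT -p tcp -s $peer --dport $EPMD_PORT:$DIST_MAX -j ACCEPT"
    done
    # Everything else aimed at those ports is dropped.
    echo "ip6tables -A INPUT -p tcp --dport $EPMD_PORT:$DIST_MAX -j DROP"
}

# Hypothetical peers from the 2001:db8::/32 documentation prefix.
print_dist_rules 2001:db8::10 2001:db8::11
```

Review the printed rules against your existing chain before running them as root; an earlier ACCEPT rule would make the final DROP ineffective.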
Erlang cookies do not have a particularly big key space and can be brute-forced in parallel; there is no rate limiting or any other significant protection. By default the auto-created Erlang cookie is a 20-character sequence of upper-case ASCII letters, so choose your own, and make it longer with a wider key space. Leave your epmd listening only on a VPN or private network if you must.
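One possible way to generate a stronger cookie, as a minimal sketch: read 32 bytes from `/dev/urandom` and hex-encode them, which gives 64 characters and 256 bits, compared with roughly 94 bits for 20 upper-case letters. The function name `gen_cookie` is just an illustration.

```shell
#!/bin/sh
# Generate a 64-character hex cookie (256 bits) from the kernel CSPRNG.
gen_cookie() {
    # od prints 32 random bytes as hex; tr strips spaces and newlines.
    od -An -tx1 -N32 /dev/urandom | tr -dc '0-9a-f'
}

cookie=$(gen_cookie)
echo "generated a ${#cookie}-character cookie"
```

Store the result in the node's `.erlang.cookie` file, readable only by its owner (mode 400), or pass it with `--cookie` at startup.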
In my specific case, all BEAMs are connected to an internal IPv6-only mesh VPN, which means that any user has to authenticate to the VPN first. My haproxy instances can bridge the real world to the VPN world, across any physical node.
Connecting via remsh from a permitted node (see the point above about restricting inter-node ports) then looks like this:

$ ERL_ZFLAGS='-proto_dist inet6_tcp -kernel inet_dist_listen_min 4370 -kernel inet_dist_listen_max 4400' \
    iex --cookie "$(ssh koans@i09 cat .erlang.cookie)" \
    --name console@wintermute.zt.koans \
    --remsh zen@i09.zt.koans

TLS on its own only encrypts the traffic between nodes; it does not authenticate the peer unless you also verify certificates. In my experience, TLS has been a huge pain in the butt to manage. I don’t like it, and use a mesh VPN or spiped to avoid it. See https://www.erlang-solutions.com/blog/erlang-distribution-over-tls.html
Also, there is no particular need to use the same cookie on all nodes if your app doesn’t require the full mesh. You can manually initiate remsh connections from your control node: set the cookie for a specific node with Node.set_cookie/2, and Node.connect/1 will then use that per-node cookie.

Finally, I rarely need console access once apps are deployed to production, so I’m curious what drives this requirement. Do you use the distributed Erlang layer at all? If not, don’t configure it, and just use ssh port forwarding to deal with any nodes that need some remsh love.
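A rough sketch of what that ssh port forwarding could look like: find the node's distribution port in the output of `epmd -names`, then forward both epmd (4369) and that port over ssh. The helper names are made up for illustration, and the script only prints the ssh command. It assumes the remote node is registered as `zen`, that nothing else is bound to port 4369 locally, and that the host part of the node name you connect to resolves to 127.0.0.1 on your machine.

```shell
#!/bin/sh
# Pull one node's distribution port out of `epmd -names` output.
# Lines of interest look like: "name zen at port 4370".
dist_port_for() {
    awk -v node="$1" '$1 == "name" && $2 == node && $4 == "port" { print $5 }'
}

# Print (do not run) an ssh command forwarding epmd and the node's port.
tunnel_cmd() {   # usage: tunnel_cmd user@host dist_port
    echo "ssh -N -L 4369:127.0.0.1:4369 -L $2:127.0.0.1:$2 $1"
}

# Sample output for illustration; on a real host: ssh koans@i09 epmd -names
sample='epmd: up and running on port 4369 with data:
name zen at port 4370'

port=$(printf '%s\n' "$sample" | dist_port_for zen)
tunnel_cmd koans@i09 "$port"
```

With the tunnel up, a local `iex --remsh` reaches the node through 127.0.0.1, and neither epmd nor the distribution port has to be opened on any public interface.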