Is -ssl_dist_optfile the way to secure a cluster of BEAM nodes? How do I secure connections to publicly exposed BEAM nodes and epmd?

bartekupartek · July 24, 2019, 5:47am

I have multiple API instances installed on on-premise servers that are sometimes offline. I’d like to set up one master admin node that can connect to all of them whenever they are online. A further requirement: tenants should connect only to the admin node, not to each other, so that only the admin node knows about all tenants.

To achieve this, I’ve set up an admin project on Elixir 1.9 with a Dockerfile that exposes the following ports:

ENV APP_PORT=4000 BEAM_PORT=9000 ERL_EPMD_PORT=4369
EXPOSE $APP_PORT $BEAM_PORT $ERL_EPMD_PORT

and configured the release to pin the BEAM node’s distribution listen port (normally a random port) to a single fixed port:
env.sh.eex

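# Pin the distribution listener to one fixed port (BEAM_PORT, set in the Dockerfile)
# so only that port has to be published; only the long-running node needs it.
case $RELEASE_COMMAND in
  start*|daemon*)
    ELIXIR_ERL_OPTIONS="-kernel inet_dist_listen_min $BEAM_PORT inet_dist_listen_max $BEAM_PORT"
    export ELIXIR_ERL_OPTIONS
    ;;
  *)
    ;;
esac
# Use long names (-name), so the node name carries a fully qualified host part.
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=<%= @release.name %>@my_brodcast_domain.com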

The Docker image is deployed in a Kubernetes cluster, and this setup works as long as the ports are used only inside the k8s cluster. As I mentioned, most of my tenant applications run outside the k8s cluster on on-premise servers, so I’ve set up the domain my_brodcast_domain.com, which resolves to a Kubernetes LoadBalancer Service that exposes ports 4369, 9000 and 4000 (published as port 80) publicly:

apiVersion: v1
kind: Service
metadata:
  name: admin-api
  namespace: admin-api
  labels:
    app.kubernetes.io/name: admin-api
    app.kubernetes.io/part-of: admin-api
spec:
  selector:
    app.kubernetes.io/name: admin-api
    app.kubernetes.io/part-of: admin-api
  type: LoadBalancer
  ports:
  - port: 4369
    targetPort: 4369
    name: epmd
  - port: 9000
    targetPort: 9000
    name: erlang
  - port: 80
    targetPort: 4000
    protocol: TCP
    name: app

From the tenant server instances I can connect to the admin node:

iex --name "myapp2@my_brodcast_domain.com" --cookie "super_super_secret"

I can also log in to the admin node with a remote console (remsh) and see all connected nodes. Hooray, I was happy for a while:

iex(admin_api@my_brodcast_domain.com)2> Node.list
[:"myapp2@my_brodcast_domain.com", :"myapp1@my_brodcast_domain.com",
 :"myapp3@my_brodcast_domain.com", :"myapp4@my_brodcast_domain.com"]

But now I’m concerned about security: I’ve read that I should never expose the BEAM distribution port or the epmd port on a public network.

Since the Erlang/OTP 20.2 release there is an -ssl_dist_optfile option with the following description:

A new command line option -ssl_dist_optfile has been
added to facilitate specifying the many options needed
when using SSL as the distribution protocol.

I couldn’t find any useful resource that answers my question: is it possible to use this option to secure the cluster described above? If so, how do I configure it in an Elixir 1.9 release?
I need to rule out SSH port tunnels: clients install these apps with docker-compose and a single run command, and they don’t have the technical skills to configure SSH to route traffic through a bastion host, etc.
Are there other options for achieving my goal?
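A minimal, untested sketch of how the env.sh.eex above could switch the node to TLS distribution using the documented -proto_dist inet_tls and -ssl_dist_optfile flags. It assumes the :ssl application is part of the release (for example via extra_applications) and that an optfile already exists at /app/ssl_dist.conf, which is a hypothetical path. EEx tags are left out so the fragment runs as plain sh; RELEASE_NODE would be set as before.

```shell
# Sketch of a TLS-enabled env.sh.eex (EEx tags omitted; set RELEASE_NODE as before).
# Assumption: /app/ssl_dist.conf is a hypothetical path to an -ssl_dist_optfile.
SSL_DIST_OPTFILE="${SSL_DIST_OPTFILE:-/app/ssl_dist.conf}"
BEAM_PORT="${BEAM_PORT:-9000}"

# Both ends of a connection must speak the same distribution protocol, so the
# TLS flags apply to every command (start, daemon, remote, rpc), not just start.
ELIXIR_ERL_OPTIONS="-proto_dist inet_tls -ssl_dist_optfile $SSL_DIST_OPTFILE"

case "${RELEASE_COMMAND:-}" in
  start*|daemon*)
    # Only the long-running node listens, so only it gets the fixed port.
    ELIXIR_ERL_OPTIONS="$ELIXIR_ERL_OPTIONS -kernel inet_dist_listen_min $BEAM_PORT inet_dist_listen_max $BEAM_PORT"
    ;;
  *)
    ;;
esac
export ELIXIR_ERL_OPTIONS
export RELEASE_DISTRIBUTION=name
```

Tenant nodes would need the same flags on their side (for example via iex --erl "..." or their own env.sh.eex), since a plain TCP node cannot connect to a TLS one. This only protects the distribution port; epmd traffic itself stays unencrypted.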

dch · July 24, 2019, 9:45am

A few things:

Never, ever, ever expose epmd or your node to the internet. Opinions vary about whether your Phoenix/Cowboy/… app can be directly accessible, but personally I leave TLS to haproxy or similar tools, do basic routing and denial-of-service handling (max connections/second, etc.) there, and then spread the traffic across my OTP instances. I also use a firewall to block really bad actors where haproxy is not sufficient, and I distribute that blocklist across all nodes.
I restrict the range of inter-node ports the VM can use, and use a firewall to restrict that range to permitted nodes only. This may mean that non-trusted nodes have to bridge in via ssh, but I consider direct node access something users should never get, have, or need.
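The firewall half of this can be sketched as a small shell function that prints iptables rules for review rather than applying them. The peer addresses are hypothetical (documentation ranges), and the 4370:4400 range simply mirrors the inet_dist_listen_min/max values in the remsh example further down.

```shell
# Sketch: allow epmd and a fixed distribution port range only from listed peers.
# Rules are printed, not applied; peers passed as arguments are hypothetical.
EPMD_PORT=4369
DIST_PORTS=4370:4400   # iptables port-range syntax is MIN:MAX

dist_firewall_rules() {
  for peer in "$@"; do
    echo "iptables -A INPUT -p tcp -s $peer --dport $EPMD_PORT -j ACCEPT"
    echo "iptables -A INPUT -p tcp -s $peer --dport $DIST_PORTS -j ACCEPT"
  done
  # Anything on these ports not accepted above is dropped.
  echo "iptables -A INPUT -p tcp --dport $EPMD_PORT -j DROP"
  echo "iptables -A INPUT -p tcp --dport $DIST_PORTS -j DROP"
}
```

For example, dist_firewall_rules 203.0.113.10 198.51.100.0/24 prints six rules that can be reviewed and then piped to sh as root. Because they are appended with -A, an earlier rule in the chain that already accepts this traffic would take precedence, and nothing here persists across reboots.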
Erlang cookies do not come from a particularly big key space, and they can be brute-forced in parallel: there is no rate limiting nor any other significant protection. By default, the auto-created Erlang cookie is a 20-byte sequence of upper-case ASCII letters, so choose your own, and make it longer and drawn from a wider character set. If you must run epmd, leave it listening only on a VPN or private network.
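One way to follow the "choose your own, longer cookie" advice, as a sketch: draw 64 characters from [A-Za-z0-9] using the system's random source. If the characters are uniformly random, that is roughly 381 bits (64 × log2 62) versus roughly 94 bits (20 × log2 26) for 20 random upper-case letters. The function name gen_cookie is hypothetical.

```shell
# Sketch: generate a 64-character alphanumeric cookie (hypothetical helper name).
gen_cookie() {
  # Read a fixed 1 KiB of random bytes, keep only [A-Za-z0-9], take the first 64.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 64
}
```

Generate the value once and distribute it to the nodes that should be able to talk to each other; an Elixir release picks it up from the RELEASE_COOKIE environment variable instead of releases/COOKIE when that variable is set. Regenerating it on every boot would lock out every other node.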
In my specific case, all BEAMs are connected to an internal IPv6-only mesh VPN, which means any user has to authenticate to the VPN first. My haproxy instances bridge the real world to the VPN world, across any physical node.
Connecting via remsh from a permitted node (one allowed through the restricted port range above) then looks like this:

$ ERL_ZFLAGS='-proto_dist inet6_tcp -kernel inet_dist_listen_min 4370 -kernel inet_dist_listen_max 4400' \
    iex --cookie "$(ssh koans@i09 cat .erlang.cookie)" \
    --name console@wintermute.zt.koans \
    --remsh zen@i09.zt.koans

TLS only encrypts the traffic between nodes; it does not authenticate them unless you verify certificates. In my experience, TLS has been a huge pain in the butt to manage. I don’t like it, and use a mesh VPN or spiped to avoid it. https://www.erlang-solutions.com/blog/erlang-distribution-over-tls.html
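To make TLS authenticate nodes and not just encrypt traffic, the file passed to -ssl_dist_optfile can turn on certificate verification for both roles. The sketch below writes such a file in the Erlang term format ({server, Opts} and {client, Opts} tuples in a list). It assumes a private CA and one certificate/key pair per node used for both roles, with certificates matching the host names the nodes use; the directory and file names are hypothetical.

```shell
# Sketch: write an -ssl_dist_optfile with mutual certificate verification.
# Assumptions: $1 holds node.pem, node.key and ca.pem (hypothetical names).
write_ssl_dist_optfile() {
  cert_dir=$1 out=$2
  cat >"$out" <<EOF
[{server,
  [{certfile,   "$cert_dir/node.pem"},
   {keyfile,    "$cert_dir/node.key"},
   {cacertfile, "$cert_dir/ca.pem"},
   {verify, verify_peer},
   {fail_if_no_peer_cert, true}]},
 {client,
  [{certfile,   "$cert_dir/node.pem"},
   {keyfile,    "$cert_dir/node.key"},
   {cacertfile, "$cert_dir/ca.pem"},
   {verify, verify_peer}]}].
EOF
}
```

For example, write_ssl_dist_optfile /etc/beam-tls /app/ssl_dist.conf. With fail_if_no_peer_cert set on the server side, a node without a certificate signed by the CA cannot complete the handshake even if it knows the cookie. The cost is exactly the management burden described above: issuing, distributing and rotating a certificate per node.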
Also, there is no particular need to use the same cookie on all nodes if your app doesn’t require the full mesh. You can manually initiate connections from your control node: set the target node’s cookie with Node.set_cookie/2, and Node.connect/1 will then use that per-node cookie.
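Node.set_cookie/2 plus Node.connect/1 is the in-VM way to do this. A shell-level variant of the same idea keeps one cookie per tenant in a local file and opens each remsh with that tenant’s own cookie, so a leaked tenant cookie does not open the other tenants. The file ~/.tenant_cookies, its two-column format and the console node name are all hypothetical.

```shell
# Hypothetical cookie registry: one "node cookie" pair per line, e.g.
#   myapp1@my_brodcast_domain.com  <cookie>
COOKIE_FILE="${COOKIE_FILE:-$HOME/.tenant_cookies}"
# Hypothetical name for the short-lived console node.
CONSOLE_NODE="${CONSOLE_NODE:-console@my_brodcast_domain.com}"

# Print the cookie registered for node $1 (empty output if unknown).
cookie_for() {
  awk -v node="$1" '$1 == node { print $2; exit }' "$COOKIE_FILE"
}

# Open a remote shell on tenant node $1 with that tenant's own cookie.
# --hidden keeps the console node out of Node.list/0 on the tenant.
tenant_remsh() {
  node=$1
  cookie=$(cookie_for "$node")
  if [ -z "$cookie" ]; then
    echo "no cookie registered for $node" >&2
    return 1
  fi
  iex --name "$CONSOLE_NODE" --hidden --cookie "$cookie" --remsh "$node"
}
```

Usage would be tenant_remsh myapp1@my_brodcast_domain.com. The trade-off against the Node.set_cookie/2 approach is one console session per tenant instead of one control node holding all the connections.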

Finally, I rarely need console access once apps are deployed to production, so I’m curious what drives this requirement. Do you use the distributed Erlang layer at all? If not, don’t configure it, and just use ssh port forwarding for any nodes that need some remsh love.
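A sketch of the ssh port-forwarding route, under explicit assumptions: the remote node was started with a name at 127.0.0.1 (for example admin_api@127.0.0.1) and a fixed listen port of 9000, and no epmd is running locally, because the tunnel needs local port 4369. The function name and arguments are hypothetical.

```shell
# Sketch: remsh into a node whose epmd (4369) and distribution port (9000)
# are reachable only from its own host. Assumes the remote node name uses
# @127.0.0.1 and that nothing listens locally on 4369 or 9000.
remsh_over_ssh() {
  ssh_target=$1 remote_node=$2 cookie=$3
  # -f backgrounds ssh after authentication, -N runs no remote command, and
  # ExitOnForwardFailure makes ssh fail if a local port is already taken.
  ssh -f -N -o ExitOnForwardFailure=yes \
      -L 4369:127.0.0.1:4369 \
      -L 9000:127.0.0.1:9000 \
      "$ssh_target" || return 1
  # Lookups for the @127.0.0.1 node now reach the remote epmd via the tunnel.
  iex --name "console@127.0.0.1" --hidden --cookie "$cookie" --remsh "$remote_node"
}
```

For example, remsh_over_ssh ops@tenant1.example.com admin_api@127.0.0.1 "$COOKIE". The backgrounded ssh process keeps the tunnel open after iex exits, so it has to be stopped separately.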

benwilson512 · July 24, 2019, 3:00pm

You can also solve this at the network layer by using AWS VPCs, or the equivalent in your cloud of choice. We don’t allow public internet traffic to hit our application servers at all. Public internet traffic hits load balancers, which route only HTTP/HTTPS traffic to the nodes.

@bartekupartek is using Kubernetes, so this is probably already how he’s doing things. For remsh, look into kubectl exec.
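A sketch of the kubectl exec approach for the admin node: pick a pod by the labels from the Service manifest in the question and run the release’s remote command inside it, so no Erlang port has to leave the cluster. The release name admin_api is inferred from the node name shown earlier; that bin/admin_api resolves from the container’s working directory is an assumption about the image.

```shell
# Namespace and label selector taken from the Service manifest in the question.
NAMESPACE=admin-api
SELECTOR=app.kubernetes.io/name=admin-api

admin_remote_console() {
  pod=$(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" \
          -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no pod matches $SELECTOR" >&2; return 1; }
  # "remote" is the Elixir 1.9 release command that opens a remote shell.
  # Assumption: bin/admin_api is reachable from the container's working dir.
  kubectl exec -it -n "$NAMESPACE" "$pod" -- bin/admin_api remote
}
```

This reuses whatever cookie and node name the pod already has, so nothing about distribution needs to be exposed through the LoadBalancer for admin access.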

NOTE: this requires that you trust your private network. For a small to medium-sized company this may be reasonable. If you’re large and doing multi-tenant work, you have to drop this assumption and do something more complex.