OTP in Unequal Trust environments

zeroexcuses · January 15, 2022, 1:18am

Defining unequal trust.

Consider three machines:

my_desktop = home desktop machine
ec2_node = node I am renting on EC2
bob_colo = node I am renting at Bob’s colo

My trust in these machines is ordered my_desktop > ec2_node > bob_colo. In particular, my_desktop can ssh into ec2_node and bob_colo, but neither can ssh into my_desktop.

OTP, Distributed Erlang/Elixir

I believe that to take advantage of OTP in a multi-machine environment, we need to use Distributed Erlang/Elixir.

However, it has a different trust model from mine: any node can spawn any code on any other node.

I don’t have that trust model: ec2_node and bob_colo should take commands from my_desktop; but my_desktop definitely should not take commands from ec2_node and bob_colo.

Question:

In an “unequal trust” environment, is it still possible to use OTP ?

hauleth · January 15, 2022, 6:57am

Use OTP: yes. Use Distributed Erlang: not really. The question is whether you distrust the machine or the network. Distrust of the network we can handle; distrust of a machine, not really. You will probably need to use a different distribution protocol that provides “firewalling” of some actions.

zeroexcuses · January 15, 2022, 10:39am

I don’t trust the network, but that is easy to solve (tunnel over ssh, VPC, etc.).

I do not trust ec2_node / bob_colo to issue spawn commands to my_desktop – as stated above, twice.

In theory, you are right. In practice, do you know of any pre-existing library that can be used ?

hauleth · January 15, 2022, 10:58am

I am not aware of any. You may need to write one.

benwilson512 · January 15, 2022, 5:36pm

Depends on what you mean by “use”. Any given node can of course use OTP constructs internally. If you want to do standard Erlang distribution between these nodes, however, then no, that will not be safe. I’d fall back to more standard collaboration / communication protocols.

mpope · January 15, 2022, 7:50pm

Maybe you can do some type of client / server topology with Partisan (GitHub: lasp-lang/partisan)?

zeroexcuses · January 16, 2022, 8:22am

@benwilson512 @mpope : Unfortunately, it does appear the best that can be easily done is:

distributed erlang for all nodes in the same trust level
tcp for erlang nodes in different trust levels

Fundamentally, erlang cookies are a symmetric key, whereas we want an asymmetric trust model.
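
As a minimal sketch of the "tcp for nodes in different trust levels" option, the following Elixir script uses a line-based text protocol over :gen_tcp. The module name TrustBoundary, the commands ping and node_name, and the ok/error reply format are all hypothetical and chosen only for illustration. The lower-trust node listens and dispatches through a fixed table; the higher-trust node only dials out and parses replies as plain strings, so nothing it receives is evaluated as code.

```elixir
defmodule TrustBoundary do
  # Hypothetical command table: anything not listed here is refused.
  @commands %{
    "ping" => :ping,
    "node_name" => :node_name
  }

  # Lower-trust side (e.g. ec2_node): listen on a TCP port and serve requests.
  # Port 0 asks the OS for a free port, which is convenient for local testing.
  # The listen socket stays open only while the calling process is alive.
  def start_worker(port \\ 0) do
    opts = [:binary, packet: :line, active: false, reuseaddr: true]
    {:ok, listen} = :gen_tcp.listen(port, opts)
    {:ok, actual_port} = :inet.port(listen)
    spawn(fn -> accept_loop(listen) end)
    {:ok, actual_port}
  end

  # Serves one connection at a time; enough for a sketch.
  defp accept_loop(listen) do
    case :gen_tcp.accept(listen) do
      {:ok, sock} ->
        serve(sock)
        accept_loop(listen)

      {:error, _reason} ->
        :ok
    end
  end

  defp serve(sock) do
    case :gen_tcp.recv(sock, 0) do
      {:ok, line} ->
        :gen_tcp.send(sock, handle(String.trim(line)) <> "\n")
        serve(sock)

      {:error, _reason} ->
        :gen_tcp.close(sock)
    end
  end

  # Look the request up in the table; never build atoms or code from input.
  def handle(line) do
    case Map.fetch(@commands, line) do
      {:ok, :ping} -> "ok pong"
      {:ok, :node_name} -> "ok " <> Atom.to_string(node())
      :error -> "error unknown_command"
    end
  end

  # Higher-trust side (e.g. my_desktop): outbound connection only.
  def request(port, command, host \\ {127, 0, 0, 1}) do
    opts = [:binary, packet: :line, active: false]
    {:ok, sock} = :gen_tcp.connect(host, port, opts, 5_000)
    :ok = :gen_tcp.send(sock, command <> "\n")
    reply = :gen_tcp.recv(sock, 0, 5_000)
    :gen_tcp.close(sock)

    case reply do
      {:ok, line} -> parse_reply(String.trim(line))
      {:error, reason} -> {:error, inspect(reason)}
    end
  end

  # Replies are treated as inert strings, not as Erlang terms.
  def parse_reply("ok " <> payload), do: {:ok, payload}
  def parse_reply("error " <> reason), do: {:error, reason}
  def parse_reply(_other), do: {:error, "malformed_reply"}
end
```

With this shape, my_desktop never opens a listening port, and the worst a hostile reply can do is fail to parse. Nodes within one trust level could still use Distributed Erlang among themselves.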

hauleth · January 16, 2022, 8:25am

Erlang cookies aren’t a security measure at all, and you can have different cookies for different nodes.

The problem isn’t the cookie; the problem is that as soon as you connect to the cluster, all nodes in the cluster can execute any code on any node within it. And it is not only about executing Erlang code, but any code (as you can always open a port on the remote node).

zeroexcuses · January 16, 2022, 9:14am

This is known. See original post:

======

The point here is: the trust model we want to express is asymmetric. Erlang cookies are symmetric. Therefore Erlang cookies cannot express the trust model we want to express.

slouchpie · January 16, 2022, 12:25pm

In “Programming Erlang” by Joe Armstrong, he writes:

Distributed Erlang applications run in a trusted environment—since any
node can perform any operation on any other Erlang node, a high degree
of trust is involved. Typically distributed Erlang applications will be run
on clusters on the same LAN and behind a firewall, though they can run
in an open network.

slouchpie · January 16, 2022, 12:30pm

You could probably use Whonix/Qubes to run the “home” Erlang node in an isolated VM. Then you wouldn’t have to trust it.

max-au · January 20, 2022, 6:06pm

At the moment, the Erlang Distribution security model is “perimeter-based security”. That is, as soon as a malicious agent gets inside your perimeter (controls any node in the cluster), all nodes are compromised.

Some work has been done to fix this issue. @potatosalad is probably one of the best positioned experts on this topic.

potatosalad · January 31, 2022, 10:59pm

For those interested, I wrote a post about a prototype related to this topic: RFC: Erlang Dist Security Filtering Prototype (Erlang Forums).

@zeroexcuses In short: if the above prototype makes it into Erlang/OTP, it would be possible to have filters in place such that my_desktop can issue commands to ec2_node and bob_colo, but commands issued to my_desktop would be rejected.
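
To make the intended policy concrete, here is a toy model of the decision such a filter would have to make. This is not the prototype's API: the module FilterSketch, the function decide/3, the numeric trust levels, and the grouping of operations are all assumptions made for illustration. The operation names (send, alias_send, reg_send, spawn_request) are borrowed from the discussion in this thread.

```elixir
defmodule FilterSketch do
  # Hypothetical trust levels mirroring my_desktop > ec2_node > bob_colo.
  @trust %{my_desktop: 3, ec2_node: 2, bob_colo: 1}

  # Assumption: the only traffic allowed to flow "upward" is a reply
  # addressed to an alias. Everything else going upward is refused.
  @reply_ops [:alias_send]

  # Returns :accept or :reject for an operation sent from one node to another.
  def decide(from, to, op) do
    with {:ok, from_level} <- Map.fetch(@trust, from),
         {:ok, to_level} <- Map.fetch(@trust, to) do
      if from_level >= to_level or op in @reply_ops do
        :accept
      else
        :reject
      end
    else
      # Unknown nodes are never trusted.
      :error -> :reject
    end
  end
end
```

Under this model, FilterSketch.decide(:my_desktop, :ec2_node, :spawn_request) is accepted, while the same operation from bob_colo to my_desktop is rejected. Whether replies should be let through at all is a separate policy question.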

al2o3cr · February 1, 2022, 1:40am

How do things like GenServer.reply from low → high trust work in that setup? My very naive reading of the docs is that you’d need to set the filters on the trusted side to allow the reply through, but what stops the untrusted server from sending something evil (like a code_change system message) followed by the expected {[:alias | ref], payload} message?

potatosalad · February 1, 2022, 3:55pm

@al2o3cr It will depend on the specific requirements in a given setup, but here are a few potential solutions:

1. Trusted proxy per call (Reject send, Accept alias_send): This might involve using either {via, Module, ViaName} (or writing your own version of call) to spawn a proxying process on the Trusted node, send the request to the Untrusted node, and receive the reply from the alias within the proxy process (ignoring any non-matching messages received, including system-style messages). This can also be accomplished by using something like gen_statem:call(UntrustedServer, Request, {clean_timeout, Timeout}). The Trusted node might reject all send from the Untrusted node and only accept alias_send in this setup.
2. Trusted registered process, Untrusted proxy (Reject all, Accept specific reg_send): Depending on how this is set up, this could break vanilla use of GenServer.reply, but would make it so all replies from the Untrusted node would need to be sent via a registered process on the Trusted node. A variation of this might involve the Trusted node using spawn_request to spawn a proxy on the Untrusted node and then redirect calls through this proxy, which would handle the local calls on the Untrusted node. This would allow GenServer.reply to continue functioning like normal on the Untrusted node, but would require some extra steps on the Trusted node to get everything set up correctly.
3. Trusted spawn_request handler, Untrusted proxy (Reject all, Accept specific spawn_request): This might be considered a variation of (2), especially when paired with the idea of proxying calls using a process running on the Untrusted node. The idea is that the only form of communication Untrusted is allowed to have with Trusted is calling spawn_request on the Trusted node, which may be filtered or have a custom handler in place to reject anything unwanted.
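
The first option, a trusted proxy per call, can be sketched locally in Elixir. This is a simplified stand-in and not GenServer's real call protocol: the module ProxyCall, the {:request, reply_alias, request} message shape, and the hostile_server/0 function that simulates an Untrusted server are all hypothetical. It assumes OTP 24 or later for :erlang.alias/0. The proxy is a short-lived plain process that matches only the reply tagged with its own alias, so a stray system-style message is never interpreted and disappears when the proxy exits.

```elixir
defmodule ProxyCall do
  # Run each call in a short-lived proxy process on the Trusted side.
  def call(server, request, timeout \\ 5_000) do
    task =
      Task.async(fn ->
        # The alias is the only address handed to the Untrusted server.
        reply_alias = :erlang.alias()
        send(server, {:request, reply_alias, request})

        result =
          receive do
            # Selective receive: only a reply tagged with our alias matches.
            {^reply_alias, reply} -> {:ok, reply}
          after
            timeout -> {:error, :timeout}
          end

        # Deactivate the alias so late messages sent to it are dropped.
        :erlang.unalias(reply_alias)
        result
      end)

    Task.await(task, timeout + 1_000)
  end

  # Simulated Untrusted server: sends a system-style message before the
  # expected reply, as in the scenario raised earlier in this thread.
  def hostile_server do
    receive do
      {:request, reply_alias, request} ->
        send(reply_alias, {:system, {self(), make_ref()}, :get_state})
        send(reply_alias, {reply_alias, {:echo, request}})
        hostile_server()
    end
  end
end
```

For example, after server = spawn(&ProxyCall.hostile_server/0), a call such as ProxyCall.call(server, :hello) returns only the echoed reply; the system-style message stays in the proxy's mailbox and is discarded with it. In a real deployment the dist filter on the Trusted node would additionally need to reject plain send from the Untrusted node.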