Relying on :erpc for networked nodes

drewble · January 12, 2024, 6:41pm

With several Phoenix apps clustered with libcluster, it’s easy to make calls to remote nodes via :erpc.

Say I have two different public-facing apps, and one needs to fetch data from the other to serve requests. Is it a reliable pattern to use :erpc for this relationship? Are there known limitations or performance hits for concurrent requests over :erpc?

D4no0 · January 12, 2024, 7:07pm

To be fair, I’m not a big fan of doing rpc calls without a predefined contract that can be enforced on both ends; it is just too easy to break the contract by accident.

IMO a monolith with good separation of code will be many times easier to manage than those 2 separate services trying to communicate via rpc.

drewble · January 12, 2024, 7:34pm

it is just too easy to break the contract by accident

The plan is to define a tested and documented API for use by :erpc calls for exactly that reason. We have domain reasons for maintaining separate apps.
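A minimal sketch of what such a contract could look like, assuming a hypothetical `Inventory.RemoteAPI` module on the serving app and a hypothetical `Storefront.InventoryClient` wrapper on the calling app (neither name comes from a real project). The idea is that remote nodes only ever call the one documented module, and the caller always passes an explicit timeout and translates `:erpc` failures into tagged tuples:

```elixir
defmodule Inventory.RemoteAPI do
  @moduledoc """
  Hypothetical public contract: the only module other nodes may call via :erpc.
  The data here is hard-coded purely for illustration.
  """

  @type item :: %{id: pos_integer(), name: String.t()}

  @spec fetch_item(pos_integer()) :: {:ok, item()} | {:error, :not_found}
  def fetch_item(1), do: {:ok, %{id: 1, name: "widget"}}
  def fetch_item(id) when is_integer(id) and id > 0, do: {:error, :not_found}
end

defmodule Storefront.InventoryClient do
  @moduledoc """
  Hypothetical client wrapper that lives in the calling app.
  """

  # Assumed timeout in milliseconds; tune for your own workload.
  @timeout 5_000

  def fetch_item(node, id) do
    :erpc.call(node, Inventory.RemoteAPI, :fetch_item, [id], @timeout)
  catch
    # :erpc raises errors of the form {:erpc, reason} for things like
    # an unreachable node or a timeout.
    :error, {:erpc, reason} -> {:error, {:rpc_failed, reason}}
  end
end
```

Calling `Storefront.InventoryClient.fetch_item(node, 1)` with the name of a node that has `Inventory.RemoteAPI` loaded should return the `{:ok, item}` tuple. Exceptions raised inside the remote function are deliberately not caught here, so contract violations stay loud.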

Architectural concerns aside, are there known limitations or performance hits for concurrent requests over :erpc?

D4no0 · January 12, 2024, 7:39pm

Have you read the introduction paragraph for erpc? I guess this is an important pointer that is presented there: Erlang -- Processes

tj0 · January 12, 2024, 8:54pm

I’ve only used :rpc (or software that used rpc, erpc looks like a newer version) and I’ve never really hit limits on this. The situation was:

Everything was in the same DC (no WAN).
Everything was trusted.
Fewer than 15 nodes were connected (other network topologies exist that can connect more, but it was unnecessary as the boxes were beefy).

I don’t think there’s any reason not to use it? The alternatives aren’t that great either. Spending CPU serializing/deserializing JSON over HTTP? After seeing 25% of the CPU going to parsing, deciding to go to a binary format like MessagePack? As long as everything is trusted, in a single datacenter, and there is no reason to talk to non-BEAM languages, why not use the standard protocol? For instance, I never had any issues with Riak clusters, and I can’t think of any application that would be more chatty.

Then on modern hardware, I don’t think it will be very easy to hit the limits. Anyway, for further information:

http://erlang.org/pipermail/erlang-questions/2016-February/087698.html

Personally, I would be more concerned about security and the introduction of other languages than the scale in most situations.

drewble · January 16, 2024, 10:20pm

@D4no0 I have read the introduction paragraph. I have also read the note about blocking signals, although I can’t say I totally understand it.

@tj0 there is a line in the introduction to :erpc that says it “has better performance and scalability than the original rpc implementation” but with no implementation details. Based on the paper you linked, it appears even the original :rpc would be scalable enough for our current needs. It’s good to know that Node.spawn/4 is there if we do hit some of those bottlenecks.

drewble · April 3, 2024, 7:25pm

Looking into this more, erpc is significantly different than rpc. The issue that is commonly raised is that the original rpc implementation relied on a single GenServer process named rex to handle rpc calls. This causes performance degradation under high call volumes due to mailbox flooding. The charts in the DE-Bench PDF above indicate that using spawn instead of rpc is not subject to the same limits, as it spins up a new process per-call.

erpc basically builds the alternative approach into the Erlang kernel, using spawn_request to make the actual request to the remote node, as mentioned in the Kernel 7.0 release notes and found in the Erlang erpc implementation. rex is entirely absent from the :erpc implementation, and rpc itself now relies on erpc for many operations, likely for this reason.
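To illustrate the process-per-call behaviour, here is a small sketch using `:erpc.send_request/4` and `:erpc.receive_response/2`. The `ErpcFanout` module name is made up for this example. Each request asks the target node for `self()`, so the returned pids show which process actually ran each call:

```elixir
defmodule ErpcFanout do
  @moduledoc """
  Hypothetical helper: fires `count` asynchronous :erpc requests at `node`
  and collects the pid of the process that executed each one.
  """

  # Assumed per-response timeout in milliseconds.
  @timeout 5_000

  def remote_pids(node, count) do
    1..count
    # Send all requests first so they are in flight concurrently.
    |> Enum.map(fn _ -> :erpc.send_request(node, :erlang, :self, []) end)
    # Then collect the responses in the order the requests were sent.
    |> Enum.map(fn request_id -> :erpc.receive_response(request_id, @timeout) end)
  end
end
```

If you run `ErpcFanout.remote_pids(node, 5)` against a connected node, you should see a distinct pid for each request rather than one shared server pid, which matches the description above.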

The theoretical limitations of erpc may be the port blocking described in the gen_rpc library README. For the purposes of using erpc to make intra-cluster API calls with reasonably sized response payloads, this is probably not an issue.
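If you want a rough guard for "reasonably sized", one option is to measure a term's external (wire) size before returning it from the remote API. This is only a sketch: the `PayloadGuard` module and the 1 MB threshold are arbitrary assumptions for illustration, not a recommendation from the erpc or gen_rpc docs.

```elixir
defmodule PayloadGuard do
  @moduledoc """
  Hypothetical helper that checks how large a term would be when encoded
  in the external term format.
  """

  # Arbitrary illustrative threshold in bytes.
  @max_bytes 1_000_000

  def check(term, max_bytes \\ @max_bytes) do
    # :erlang.external_size/1 returns the maximum encoded size in bytes
    # without actually encoding the term.
    size = :erlang.external_size(term)

    if size <= max_bytes do
      {:ok, size}
    else
      {:error, {:too_large, size}}
    end
  end
end
```

A remote API function could call `PayloadGuard.check/2` on its result and return an error or a paginated response instead of shipping an oversized term across the distribution channel.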

julismz · February 18, 2025, 4:14pm

You could face this with structs or even with Apache AVRO’s schemas, couldn’t you?