Sharing single API connection from multiple nodes w/ failover

Hi,

I have three mostly identical nodes in an embedded system. Roughly every minute, each of them has to make an API call to fetch some more information, and they usually do this pretty much simultaneously with the same query parameters (i.e. the exact same HTTP GET request). To save bandwidth, I’d like to designate a single node responsible for performing the HTTP API calls and cache the responses for a short period of time (in case another node makes the same call 10 seconds later), so that I fetch the data only once and not three times.
Unlike with a centralized HTTP caching proxy, I also want another node to be selected for this job when the responsible node goes down.

How would you do that?

Would you use a library like Elector or swarm for that purpose?

These API calls are done once every minute, so performance really doesn’t matter that much.

How about this simple scheme:

When a node needs to make an API call, it first sends an RPC to node 1. If node 1 does not acknowledge within a short period of time, let’s say 1 second, that it has received the request, the node sends the request to node 2, and so on.

So if I remove one node from the network, the RPC call will just time out and the next node is tried. Not sure if the Erlang VM will buffer the RPCs and retry once the network is up again…
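The try-each-node-in-order scheme could look roughly like this, using `:rpc.call/5` with a timeout. The node names and the `ApiFetcher` module are assumptions, just for illustration:

```elixir
defmodule ApiClient do
  # hypothetical node names; adjust to your cluster
  @nodes [:"node1@host", :"node2@host", :"node3@host"]
  @timeout 1_000

  # Try each node in order; a timeout or an unreachable node
  # returns {:badrpc, reason}, in which case we move on to the next.
  def fetch(request) do
    Enum.find_value(@nodes, {:error, :all_nodes_down}, fn node ->
      case :rpc.call(node, ApiFetcher, :fetch, [request], @timeout) do
        {:badrpc, _reason} -> nil   # try the next node
        result -> {:ok, result}
      end
    end)
  end
end
```

Note that `:rpc.call` does not buffer or retry on its own; a call made while a node is unreachable simply fails, which matches the fallback-to-the-next-node idea.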

Or would you use Erlang’s :global module? Each node making an API call would first check whether the global name “fetch_api” exists. If it doesn’t exist, which means either that we just started up the system or that the node that had previously registered the name went down,
it would call :global.register_name to register itself, and then just send a message to the registered “fetch_api” pid.

I think the simplest solution would really be that each node has its own supervised gen_server that can fetch and cache the API responses, regardless of whether it’s used or not, just to keep the code simpler. The client code of the gen_server would get the pid from a globally registered name OR register its own supervised gen_server in case none is found.
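A minimal sketch of that idea, assuming a module name, a 30-second TTL, and a placeholder for the actual HTTP client (all made up here):

```elixir
defmodule MyApp.ApiFetcher do
  use GenServer

  @ttl_ms 30_000   # serve cached responses for 30 seconds

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  def init(cache), do: {:ok, cache}

  # Client side: use the globally registered fetcher, or claim the
  # name with the local server if nobody holds it yet.
  def fetch(request) do
    GenServer.call(fetcher(), {:fetch, request})
  end

  defp fetcher do
    case :global.whereis_name(:api_fetch) do
      :undefined ->
        :global.register_name(:api_fetch, Process.whereis(__MODULE__))
        # another node may have won the registration race,
        # so look the name up again instead of assuming we got it
        :global.whereis_name(:api_fetch)

      pid ->
        pid
    end
  end

  def handle_call({:fetch, request}, _from, cache) do
    now = System.monotonic_time(:millisecond)

    case cache do
      %{^request => {response, at}} when now - at < @ttl_ms ->
        # fresh enough: answer from the cache
        {:reply, response, cache}

      _ ->
        response = do_http_get(request)   # your real HTTP client goes here
        {:reply, response, Map.put(cache, request, {response, now})}
    end
  end

  defp do_http_get(_request), do: :placeholder
end
```

Since every node supervises its own instance anyway, any node can take over the global name after the current owner dies.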

That really sounds like 2-3 lines of Elixir code :slight_smile: :slight_smile:

Anyone tried similar things? Are there some pitfalls with that approach? When I just power off a node by pulling the plug or the network cable, how fast would Erlang notice that the node is down?
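For a pulled cable (no TCP close), detection depends on the kernel’s net_ticktime, which defaults to 60 seconds, so it can take on the order of a minute before the node is considered down. If that’s too slow, it can be lowered, e.g. in vm.args (value in seconds; all nodes should use the same setting):

```
-kernel net_ticktime 10
```

A clean shutdown is noticed almost immediately, because the TCP connection is closed properly.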

Note that the API calls only GET data. I am also not concerned about netsplits. Normally, all three nodes should be up and running. In case one node is down for longer periods, which is not a realistic scenario atm, I accept a slight increase in latency.

Sorry to answer my own question… the :global module is truly amazing!

I can run the following two lines to make an API call regardless of which node they run on:

```elixir
# if the global name is already registered, this returns :no (just ignore it)
:global.register_name(:api_fetch, pid_of_local_gen_server)

:global.whereis_name(:api_fetch) |> send(request)
```

Plus, I need some Node.ping code running in the background in order to establish the connections between the nodes. Of course this does not retry a request in case the node that is currently processing a request goes down. In my case, this does not matter, as it happens very seldom and one minute later everything works as normal.
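That background reconnect loop can be a small GenServer along these lines (the peer node names and the 5-second interval are assumptions):

```elixir
defmodule MyApp.Reconnector do
  use GenServer

  # hypothetical node names; adjust to your cluster
  @peers [:"node1@host", :"node2@host", :"node3@host"]
  @interval_ms 5_000

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(nil) do
    schedule()
    {:ok, nil}
  end

  def handle_info(:ping, state) do
    # ping only the peers we are not already connected to;
    # Node.ping/1 returns :pong on success and :pang on failure
    for node <- @peers -- [Node.self() | Node.list()] do
      Node.ping(node)
    end

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :ping, @interval_ms)
end
```

A nice side effect of reconnecting like this is that :global re-synchronizes its name registry once the connection is re-established.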