Again and again, you’re only saying what you are aiming to do and never once said why you are doing it.
Why were you using distributed Mnesia? What was wrong with a traditional database?
And what kind of app has 100s of millions of users? Are you working for Telegram?
You don’t seem to understand – we need to know what exact problem are you trying to solve. You already decided you’ll use 1000+ nodes and are asking how to do it. But we want to know why do you think you need those nodes.
Until you try to explain your needs and NOT repeat how you imagine your problem must be solved then nobody can help you.
see i am working for someone so i can’t say for whom i am working.
but i will explain what i am trying to do, see this application have group which contain something like 2 lakh people and and we are thinking to increase it. we are using mnesia to keep user session pid with user detail in mnesia to broadcast to all online users. but if we increase group size, then we know mnesia will not work, write and sync in mnesia is wrost when size grow so we are thinking of using ets table because it is fast.
I’m going to sidestep the political discussion, assume that 1000 nodes is a good approach, ignore that the ring connection strategy sounds questionable, and just address the question:
It seems to me you don’t need to maintain a PID lookup table on every node, you just need one central place that knows all the PIDs. This is like a “service discovery” strategy. So each node registers itself (or the PID of its ets-managing GenServer) at startup with this central registry, which could be an additional node, a standalone service/database, or whatever. It wouldn’t require anything exotic to handle requests from 1000 “clients” (the worker nodes), especially if requests are only fired at node startup.
Though it’s not clear to me whether you need to look up PIDs only to recover a failed node or if you have to look them up every time you want to sync any changed local data. If it’s the latter, you might overwhelm the service registry with lookups.
Well that node would have to be connected to the 999 others, not sure if itis prnctical.
Now @Erynn what I do not understand is why you want to keep all the PIDs on all nodes if they are not connected together. Do you connect and disconnect if they need to talk?
Plus if you have 1 unique genserver per node that should always be here, you can register it with a local name, so you do not need the pid, you always know that on each node there is that process.
Maybe you should describe how you want to work with the node ring and how data transits along the ring.
Notice @Erynn that a GenServer started with start_link(..., name: MyLocalAtomName) is addressable with GenServer.call({MyLocalAtomName, :"my_node_999@my_host"}, ...) as long as the node names are known ahead of time.
You’re right, it shouldn’t be part of the BEAM cluster at all, but a standalone service, e.g., HTTP REST API.
that’s the thing i am talking about but GenServer.call will increase overhead so to reduce overhead we can use GenServer.cast and and receiving end they use Kernel.send to send data back.
you are i can yse GenServer.cast to send directly but when you have to sync big table then it become costly, to reduce this, we will store all genserver node pid in each ets table and then it become easy to sync group table data in another node by using Kernel.send.
in simple language, Kernel.send is cheap than GenServer.
It is fine to connect 1000 nodes to each other. We benchmarked Phoenix with 1 million websocket connections and those are much heavier than just plain TCP connections.
IMO, focusing solely on the number of connections is the wrong way to approach the problem. The big question is which data will be sent over the connection and how often. So depending on your application and its communication patterns, getting up to 1000 nodes may be very easy or it may be very hard. For example: the trouble with large Erlang clusters is not really the number of connections but using modules like global, which requires all 1000 nodes to agree. I believe this is one of the reasons why the WhatsApp folks rewrote pg to not rely on global and it has much better scalability properties now.
Honestly, I personally find it hard to predict how a system with 1000 nodes will behave, without any detailed information or any real usage. Sometimes there are obvious bottlenecks, but most often they come from unexpected places that you only learn once you get to a certain scale. I’d prefer to first scale the system to 20 nodes and benchmark. Then 100 nodes and benchmark. The 500 nodes and then 1000. After all, you need real data, real use, and real traffic to make sure a system with 1000 nodes is behaving as expected.