Storing all genserver pid in ets table in each node

Erynn · January 8, 2024, 9:00am

if i have a thousands of nodes forming cluster where nodes are connected manually forming a ring(node1 to 2, 2 to 3, 3 to 4… last to 1node).

if all nodes have 1 genserver registered with their module. and if any genserver fail then they again restart and in handle_continue, they use Enum.each(Node.list, fn node → GenServer.cast({MODULE,node}, (:presence, {Node.self, self})) and when each node receive in hansle_info they store this data in ets table in each node. then they send their Node name with pid by using Keenel.send. by doing this each table contain all nodes genserver pid?

is it good idea and better in performance?

Awlexus · January 8, 2024, 9:16am

It sounds to me like you are reimplementing process groups. Using those should be easier and possibly more performant, since not every process in the group would need to create its own ETS table

Erynn · January 8, 2024, 9:22am

will my implimentation work for 1000 nodes?

Awlexus · January 8, 2024, 10:21am

It should “work” in theory, but I don’t have 1000 nodes to test this on. Ideally, you’d benchmark this for a realistic scenario you want to handle and then decide for yourself if you need to improve performance to ensure the system operates within your operations.

benwilson512 · January 8, 2024, 2:13pm

You’ve made a few posts about this node structure, can you elaborate about this? This is a very large number of nodes, have you looked at Partisan? Can you discuss why you chose that architecture?

Erynn · January 8, 2024, 5:31pm

because by default, it(distributed erlang) had created a fully meshed network. so i set connect_all false and then use a key value database to assemble meshed network.

benwilson512 · January 8, 2024, 6:11pm

Hey @Erynn my goal was to understand why you have 1000 nodes at all, I understand that distributed erlang doesn’t work for that. Partisan however isn’t distributed erlang, so I’m curious if you evaluated that since it is designed for large erlang clusters.

Erynn · January 8, 2024, 6:51pm

1000 nodes for a chat system.

benwilson512 · January 8, 2024, 6:56pm

@Erynn I don’t understand how you expect to get quality answers without asking a quality question. WhatsApp ran a multi billion user chat app on < 50 servers so you’re gonna need to use multiple sentences and expand on your thinking here.

Erynn · January 8, 2024, 7:05pm

1000 nodes for a chat system. now i am trying to scale it for more nodes, so now i am trying to sync ets table to another node but for that i have to know that node pid, for that i am registering 1 genserver per node. and i am trying to keep 1 ets table per node where all nodes genserver pid is there. for that i am thinking, if 1 genserver fail means that ets table gone, then genserver registered and use Enum.each(Node.list, fn node → GenServer.cast({MODULE , node}, {:presence, {Node.self, self}}) and when each genserver module receive in handle_info and insert in that node ets table and then they use Kernel.send to send their {Node.self, self} to back. and if all nodes pid is there in each ets table then we can use another genserver to sync ets table in that node?

benwilson512 · January 8, 2024, 10:13pm

This is a restatement of your question, not an explanation for why a 1000 node cluster was the best way to solve your problem.

To be more direct: I think using 1000 erlang nodes is a bad idea for almost any system, and particularly when structured in a loop. I don’t think there are satisfying answers to the questions you are asking, because the questions you are asking involve a cluster architecture that is fundamentally bad.

If you can talk to us about the larger picture problems you are trying to face then perhaps we could provide some input. Otherwise I think this thread is at an impasse.

See: https://xyproblem.info/

lud · January 8, 2024, 10:46pm

Are your trying to build it so each user has its own node?

RudManusachi · January 8, 2024, 10:56pm

The only scenario of 1000 nodes I could think of that would make sense is some sort of an IoT… with 1000 devices each running erlang and all connected to a cluster for some reason

sbuttgereit · January 8, 2024, 10:59pm

Kinda what I was thinking, too. The the asker may well be conflating the roles of “node” and “process”.

Erynn · January 8, 2024, 11:00pm

ok i am explaining,

if i do not use 1000 nodes in ring then it can lead to increased overhead and complexity.
i choose this architecture for fault tolerance and scalability reason. because i am lazy sorry😁

when cluster have less nodes, we are using mnesia and it is working but when we started to add more nodes, we started to see mnesia do nothing but sit silently, write become very very difficult, now i am thinking to migrate to ets table because it is fast and very fast to write but the main problem is, ets table is for local nodes. so, we have to sync ets table in another node. so we are using consistant hashing to find nodes to sync. and we are using genserver to sync. the main problem is how we know pid of that node genserver to send ets table by using Kernel.send.

so i thought to create a local ets table in each node and keep all node genserver pid.

my main problem is, if one genserver crash and restart genserver again then it get new pid. will this GenServer.cast can send 1000 nodes in one go?

Enum.each(Node.list, fn node → GenServer.cast({Sync.GenServer, node}, {:presence, {Node.self, self}}))

Erynn · January 8, 2024, 11:02pm

100 of millions of users is very hard to handle with less node

sbuttgereit · January 8, 2024, 11:03pm

And yet…

Erynn · January 8, 2024, 11:05pm

whatsapp use 16000 nodes.
correct your data.

Erynn · January 8, 2024, 11:07pm

https://twitter.com/colrack/status/1192408832623947776

zoom image and see.

sbuttgereit · January 8, 2024, 11:14pm

This is better link to the same sort of thing (and same subject) that you’re linking.

The reality is if I find myself needing those sorts of node counts, I probably need someone already highly versed, skilled, and experienced with BEAM internals and tuning to get to the right architecture.

I wish you the best.