Storing all GenServer PIDs in an ETS table on each node

Again and again, you’re only saying what you are aiming to do and never once said why you are doing it.

Why were you using distributed Mnesia? What was wrong with a traditional database?

And what kind of app has 100s of millions of users? Are you working for Telegram?

You don’t seem to understand – we need to know what exact problem are you trying to solve. You already decided you’ll use 1000+ nodes and are asking how to do it. But we want to know why do you think you need those nodes.

Until you try to explain your needs and NOT repeat how you imagine your problem must be solved then nobody can help you.


You can’t expect high skill and experience from a 17-year-old boy.

See, I am working for someone, so I can’t say for whom I am working.

But I will explain what I am trying to do. This application has groups that contain something like 200,000 (2 lakh) people, and we are thinking of increasing that. We are using Mnesia to keep each user’s session PID together with their user details, so we can broadcast to all online users. But if we increase the group size, we know Mnesia will not work: write and sync performance in Mnesia gets worse as the data grows. So we are thinking of using an ETS table because it is fast.

The main problem is sync.
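For reference, the pattern described (session PID plus user details in a fast in-memory table, then broadcasting) looks roughly like this with ETS. Note that an ETS table lives only on the node that created it, which is where the sync problem comes from. Table, key, and message names here are made up for illustration:

```elixir
# One session table per node: user_id => {session_pid, user_details}.
:ets.new(:sessions, [:set, :public, :named_table])

:ets.insert(:sessions, {"user_1", {self(), %{name: "A"}}})

# Broadcast to every online user known to *this* node:
for {_id, {pid, _details}} <- :ets.tab2list(:sessions) do
  send(pid, {:group_message, "hello"})
end
```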

Then ETS is not going to help you; as you discovered, it is local to a single node.

(Deleted by me)

I’m going to sidestep the political discussion, assume that 1000 nodes is a good approach, ignore that the ring connection strategy sounds questionable, and just address the question:

It seems to me you don’t need to maintain a PID lookup table on every node, you just need one central place that knows all the PIDs. This is like a “service discovery” strategy. So each node registers itself (or the PID of its ets-managing GenServer) at startup with this central registry, which could be an additional node, a standalone service/database, or whatever. It wouldn’t require anything exotic to handle requests from 1000 “clients” (the worker nodes), especially if requests are only fired at node startup.
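A minimal sketch of such a central registry as a single GenServer (all names here are made up; in production this role could equally be played by etcd, a database table, or similar):

```elixir
defmodule NodeRegistry do
  @moduledoc "Minimal central registry: maps node names to their manager PIDs."
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # Each worker node calls this once at startup.
  def register(node_name, pid), do: GenServer.call(__MODULE__, {:register, node_name, pid})

  # Returns {:ok, pid} or :error if the node never registered.
  def lookup(node_name), do: GenServer.call(__MODULE__, {:lookup, node_name})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:register, node_name, pid}, _from, state),
    do: {:reply, :ok, Map.put(state, node_name, pid)}

  def handle_call({:lookup, node_name}, _from, state),
    do: {:reply, Map.fetch(state, node_name), state}
end
```

If lookups only happen at node startup or on recovery, a single process like this is unlikely to be a bottleneck even with 1000 clients.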

Though it’s not clear to me whether you need to look up PIDs only to recover a failed node or if you have to look them up every time you want to sync any changed local data. If it’s the latter, you might overwhelm the service registry with lookups.


Well, that node would have to be connected to the 999 others; I am not sure if that is practical.

Now @Erynn, what I do not understand is why you want to keep all the PIDs on all nodes if the nodes are not connected together. Do you connect and disconnect them when they need to talk?

Plus, if you have a single GenServer per node that should always be there, you can register it under a local name; then you do not need the PID, since you always know that process exists on each node.

Maybe you should describe how you want to work with the node ring and how data transits along the ring.

Was just coming back to mention this 🙂

Notice @Erynn that a GenServer started with start_link(..., name: MyLocalAtomName) is addressable as {MyLocalAtomName, :"my_node_999@my_host"} (e.g. GenServer.cast({MyLocalAtomName, :"my_node_999@my_host"}, msg)) as long as the node names are known ahead of time.
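A sketch of that pattern (module and message names are made up): each node runs one locally registered worker, and any connected node can address it by {name, node} without storing a single PID:

```elixir
defmodule RingWorker do
  use GenServer

  # One of these runs on every node, registered under a well-known local name.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  def items, do: GenServer.call(__MODULE__, :items)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:sync, data}, state), do: {:noreply, [data | state]}

  @impl true
  def handle_call(:items, _from, state), do: {:reply, state, state}
end

# From any connected node, address the worker by {name, node} -- no PID needed:
# GenServer.cast({RingWorker, :"my_node_999@my_host"}, {:sync, data})
```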

You’re right, it shouldn’t be part of the BEAM cluster at all, but a standalone service, e.g., HTTP REST API.


You’re right, it shouldn’t be part of the BEAM cluster at all, but a standalone service, e.g., HTTP REST API.

Ah yes.

In that case, a tool like ZooKeeper is already made for that exact use case, I believe.

Yes, by using etcd.

That’s the thing I am talking about, but it will increase overhead. So to reduce overhead, we can use GenServer.cast, and on the receiving end they use Kernel.send to send data back.

Ah, I see you did figure that out already. But then why bother updating PIDs on other nodes? You don’t need them if you can send messages this way.

If it were me, I would want to gather some benchmarking/profiling data before assuming I can outperform GenServer.cast with bare sends.


You are right, I can use GenServer.cast to send directly, but when you have to sync a big table it becomes costly. To reduce this, we will store the PID of each node’s GenServer in every node’s ETS table, and then it becomes easy to sync group table data to another node by using Kernel.send.

In simple language: Kernel.send is cheaper than GenServer.cast.
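For context, GenServer.cast is itself implemented as a Kernel.send of a tagged message, so the cost difference between the two is small; the real distinction is where the message is handled. A bare send arrives in handle_info/2 rather than handle_cast/2. A minimal sketch (module and message names are made up):

```elixir
defmodule SyncReceiver do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  def rows, do: GenServer.call(__MODULE__, :rows)

  @impl true
  def init(state), do: {:ok, state}

  # Raw Kernel.send messages arrive here, not in handle_cast/2:
  @impl true
  def handle_info({:sync_rows, rows}, state), do: {:noreply, rows ++ state}

  @impl true
  def handle_call(:rows, _from, state), do: {:reply, state, state}
end

# A registered name also works with send across connected nodes:
# send({SyncReceiver, :"node_42@my_host"}, {:sync_rows, rows})
```

Either way, the relative cost is worth benchmarking before building the design around it.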

You are correct, I should benchmark all of this first.

It is fine to connect 1000 nodes to each other. We benchmarked Phoenix with 1 million websocket connections and those are much heavier than just plain TCP connections.

IMO, focusing solely on the number of connections is the wrong way to approach the problem. The big question is which data will be sent over the connection and how often. So depending on your application and its communication patterns, getting up to 1000 nodes may be very easy or it may be very hard. For example: the trouble with large Erlang clusters is not really the number of connections but using modules like global, which requires all 1000 nodes to agree. I believe this is one of the reasons why the WhatsApp folks rewrote pg to not rely on global and it has much better scalability properties now.
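The rewritten pg mentioned above ships as the :pg module in OTP 23+. A minimal usage sketch (group name made up): members join a group, reads are local, and broadcasting is just messaging the members:

```elixir
# :pg tracks process-group membership without cluster-wide locks:
# membership is replicated and read locally on each node.
{:ok, _} = :pg.start_link()             # normally started in a supervision tree
:ok = :pg.join(:chat_group, self())     # each member process joins the group
members = :pg.get_members(:chat_group)  # fast, local read of the membership

# Broadcast by messaging every member directly:
for pid <- members, do: send(pid, {:broadcast, "hello"})
```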

Honestly, I personally find it hard to predict how a system with 1000 nodes will behave, without any detailed information or any real usage. Sometimes there are obvious bottlenecks, but most often they come from unexpected places that you only learn about once you get to a certain scale. I’d prefer to first scale the system to 20 nodes and benchmark. Then 100 nodes and benchmark. Then 500 nodes, and then 1000. After all, you need real data, real use, and real traffic to make sure a system with 1000 nodes is behaving as expected.



Nebulex might be a possible solution for you.

I use it to keep an in-memory cache synced between BEAM nodes without any issues, but I don’t have nearly 1000 nodes, lol.
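For a rough idea of what that looks like, here is a configuration sketch assuming Nebulex’s replicated adapter; the module and app names are placeholders, and the right adapter depends on your topology (check the Nebulex docs):

```elixir
# Hypothetical cache module; MyApp/:my_app are placeholder names.
defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated
end

# Started under the application's supervision tree:
#   children = [MyApp.Cache]
#
# Then reads are local and writes replicate to connected nodes:
#   MyApp.Cache.put("user:1", %{name: "A"})
#   MyApp.Cache.get("user:1")
```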

The only thing missing from this thread is someone saying “But Erlang was invented 100 years ago when it ran inside the same rack.”

I say give it a shot, see how it goes.
For mnesia there is GitHub - emqx/mria: Asynchronously replicated Mnesia-like database for Erlang/Elixir, maybe that could scale better.

Also maybe of interest - GitHub - ostinelli/syn: A scalable global Process Registry and Process Group manager for Erlang and Elixir.

Discord is using Elixir, they have some great blog articles on how they are scaling things.