suppose a case where single ets is storing users and users start to grow too much in ets table. now i want to shard ets table in the same node means local node.
is it possible? and if it is possible, how?
note - Don’t use 3rd party library. just custom built.
it is just an assumption, do you know, discord also store their guild members in ets table, and when it starts to too much like 250k then he have distribute into small chunk of tables. now i am asking same scenaruo what if users start to grow then it is possible to create small small table of same ets table?
yay, Registry or GenRegistry also create local ets table in each node if we start it in main supervisor. but here i am telling about a chat group like discord which handle that much guild members in each ets table. i am talking about breaking the big ets table into small small ets table in same node by sharding not in different node.
ets sharding is used not to increase size, it is used to increase speed of parallel lookups/updates to reduce lock contention. Do not use ets sharding unless you have a reason to do so.
i saying the same thing i want to increase the speed of parallel lookup because hypothetically if one ets table contains 250k users data which contains many complex fields then each time if i have to lookup or update then it will become problematic. so, i want to use sharding for breaking ets table(250k users) into small small ets table in same node.
Did you measure your lock contention on ets table? Did you measure your access time? Did you optimize your access patterns? Did you tune *_concurrency flags? Unless you do all of this, you have no reason to shard tables. ETS is build with concurrency in mind and it has pretty smart lock system which allows parallel access to separate chunks of the table, so sharding is required only when you have some exotic access pattern and you can guarantee on your level that the chunks you’ll split the table into will increase performance
This question doesn’t really work as a hypothetical. Sharping ets is a tactical performance optimization, the details have to be married to how it is actually used.
i measure acess time and read_concurrency and it gets slower when more process like 50k process tried to connect 1 genserver for lookup from 1 ets table which contains 250k users. by using update_counte to limit number then aldo it gets slower and slower.
now i think i have no option but to use sharding, what do you think?
client sending request to genserver but i know if this much try to send request to 1 genserver then it’s queue will explode so i am using update_counter to limit client to send request at time. but problem lies here when ets table grow to 250k(i am giving a number but actually it is 1 million) users details as tuple. then it is very slow to lookup if i set read_concurrent true then also it is very slow to lookup or update.
I think you should take a step back and ask yourself why you’re having this problem to begin with.
Do you require that both reads and updates are executed sequentially?
If you don’t require that at all, then the server process is pointless and you’re much better off letting the clients access the table directly, which will drastically improve performance.
If you only require that updates run sequentially, then the server process is only necessary for updates and reads can access the table directly, which will greatly improve performance.
If you require that both reads and updates run sequentially, then I’d love to hear why. Either way you need to take a long hard look at your application’s architecture as routing this much traffic through a single process is not normal.
The concurrency options generally make the table faster for concurrent operations at the cost of making each individual operation slower. If there are no concurrent operations, such as in your case with a single server process being the only one to touch the table, then they make things slower for nothing.
here we are talking about distributed system.
suppose here A is client genserver and B is ets table genserver.
we require B because for B ets table is in local node and he can retreive data but if i allow all client to direct acess table by using :rpc.call then it will increase overhead. so for that i am using two genserver and rereive data or update data.