Sharding an ETS table

Suppose a single ETS table is storing users, and the number of users grows very large. Now I want to shard that ETS table within the same node, i.e. the local node.

Is it possible? And if it is possible, how?

Note: no third-party libraries, please. Just a custom-built solution.

1 Like

How many users do you have? What are your access patterns? What problem are you running into with a single ets table?

This is just a hypothetical. As you may know, Discord also stores guild members in an ETS table, and when a guild grows too large (around 250k members) they distribute it into smaller chunks of tables. I’m asking about the same scenario: if users keep growing, is it possible to split one ETS table into several smaller tables?

Good question.

I don’t know, but here is some inspiration toward an answer:

Registry also uses sharded ets tables for scaling purposes.
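For illustration, Registry exposes this through its `:partitions` option; when it is greater than one, entries are spread across multiple internal ETS tables. The registry name below is just an example:

```elixir
# Partitioned Registry: spreads entries across several internal ETS tables
# to reduce lock contention under registration-heavy workloads.
{:ok, _pid} =
  Registry.start_link(
    keys: :unique,
    name: MyApp.Registry,
    partitions: System.schedulers_online()
  )
```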

1 Like

Yes, Registry (or GenRegistry) also creates local ETS tables on each node if we start it under the main supervisor. But here I’m talking about a chat application like Discord, which handles that many guild members per ETS table. I’m talking about breaking the big ETS table into several smaller ETS tables on the same node by sharding, not across different nodes.

Is it possible? I know it is possible, but how?

ETS sharding is not used to increase capacity; it is used to speed up parallel lookups/updates by reducing lock contention. Do not use ETS sharding unless you have a reason to do so.

1 Like

I’m saying the same thing: I want to increase the speed of parallel lookups, because hypothetically, if one ETS table contains data for 250k users with many complex fields, then every lookup or update becomes problematic. So I want to use sharding to break that ETS table (250k users) into several smaller ETS tables on the same node.

Did you measure the lock contention on your ETS table? Did you measure your access time? Did you optimize your access patterns? Did you tune the *_concurrency flags? Unless you have done all of this, you have no reason to shard tables. ETS is built with concurrency in mind and has a pretty smart locking system that allows parallel access to separate chunks of the table, so sharding is required only when you have some exotic access pattern and you can guarantee, at your level, that the chunks you split the table into will actually increase performance.
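For reference, a minimal sketch of the flags being referred to; the table name and sample data are illustrative, not from this thread:

```elixir
# Tune ETS concurrency flags before reaching for sharding.
table =
  :ets.new(:users, [
    :set,
    :public,
    :named_table,
    # optimizes the table for many concurrent readers
    read_concurrency: true,
    # finer-grained internal locks so writers contend less with each other
    write_concurrency: true,
    # avoids a single contended counter for table size/memory (OTP 23+)
    decentralized_counters: true
  ])

:ets.insert(table, {42, %{name: "example user"}})
:ets.lookup(table, 42)
```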

1 Like

This question doesn’t really work as a hypothetical. Sharding ETS is a tactical performance optimization; the details have to be married to how it is actually used.

1 Like

I measured the access time and read_concurrency, and it gets slower when many processes (say 50k) try to hit one GenServer for lookups from one ETS table containing 250k users. Even using update_counter to limit that number, it still gets slower and slower.
Now I think I have no option but to use sharding. What do you think?

Are you routing all ETS table access through a GenServer, or are the clients allowed to access the ETS table directly?

Clients send requests to the GenServer, but I know that if this many clients send requests to one GenServer, its queue will explode, so I’m using update_counter to limit how many clients can send requests at a time. The problem is that when the ETS table grows to 250k users (I’m giving a number, but actually it is 1 million) stored as tuples, lookups become very slow. Even with read_concurrency set to true, lookups and updates are very slow.

I think you should take a step back and ask yourself why you’re having this problem to begin with.

Do you require that both reads and updates are executed sequentially?

If you don’t require that at all, then the server process is pointless and you’re much better off letting the clients access the table directly, which will drastically improve performance.

If you only require that updates run sequentially, then the server process is only necessary for updates and reads can access the table directly, which will greatly improve performance.

If you require that both reads and updates run sequentially, then I’d love to hear why. Either way you need to take a long hard look at your application’s architecture as routing this much traffic through a single process is not normal.

The concurrency options generally make the table faster for concurrent operations at the cost of making each individual operation slower. If there are no concurrent operations, such as in your case with a single server process being the only one to touch the table, then they make things slower for nothing.
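To make the second case concrete, here is a rough sketch of that layout (module and table names are hypothetical): the GenServer owns a `:public` table and serializes writes, while readers call `:ets.lookup/2` directly and never queue behind the server.

```elixir
defmodule UserStore do
  use GenServer

  @table :user_store

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # Owned by this process, but :public so readers can access it directly.
    :ets.new(@table, [:set, :named_table, :public, read_concurrency: true])
    {:ok, %{}}
  end

  # Reads bypass the server entirely, so they never wait in its mailbox.
  def lookup(user_id) do
    case :ets.lookup(@table, user_id) do
      [{^user_id, data}] -> {:ok, data}
      [] -> :error
    end
  end

  # Writes are funneled through the server so they are applied one at a time.
  def put(user_id, data) do
    GenServer.call(__MODULE__, {:put, user_id, data})
  end

  @impl true
  def handle_call({:put, user_id, data}, _from, state) do
    :ets.insert(@table, {user_id, data})
    {:reply, :ok, state}
  end
end
```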

3 Likes

Are you conducting a synthetic benchmark or do you actually have a real application with 1 million users?

Here we are talking about a distributed system.
Suppose A is the client GenServer and B is the GenServer that owns the ETS table.
We need B because the ETS table is local to B’s node, so B can retrieve the data. If I let every client access the table directly via :rpc.call, the overhead would increase. That’s why I’m using two GenServers to retrieve or update data.
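For illustration, the A-calls-B setup described above can also be a plain GenServer.call/3 with a `{name, node}` tuple rather than :rpc.call; the server and node names below are hypothetical:

```elixir
# Call the named GenServer that owns the ETS table on its own node.
GenServer.call({UserTableServer, :"app@node-b"}, {:lookup, 42})
```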

Is it possible to shard one ETS table (which contains 1 million users’ data as tuples) into multiple smaller tables on the same node?

Again, please take a step back and ask yourself: why have you designed the application this way?

Yes, it is the only way to handle this much traffic; there is a lot of backend logic involved, like a hash ring, etc.

My only simple question is this:
Q. Is it possible to shard one big ETS table into smaller ETS tables on the same node? If yes, then how?

I doubt that, but I’m not going to argue.

@Nicoo already posted a link that should be more than enough to get you started, so I don’t see what you’re on about. We’re just trying to help you.
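For completeness, since the thread never spells it out: yes, splitting one logical table into several local ETS tables is possible, and the usual pattern is to pre-create N tables and route each key to one of them with a hash. A minimal sketch, where the module name, table names, and shard count are all assumptions rather than anything from this thread:

```elixir
defmodule ShardedUsers do
  @shards 8  # assumed shard count; tune to your workload

  # Create all shard tables up front. Named tables are owned by the process
  # that creates them, so call this from a long-lived process (e.g. a GenServer).
  def create_tables do
    for i <- 0..(@shards - 1) do
      :ets.new(shard_name(i), [
        :set,
        :named_table,
        :public,
        read_concurrency: true,
        write_concurrency: true
      ])
    end

    :ok
  end

  def insert(user_id, data) do
    :ets.insert(table_for(user_id), {user_id, data})
  end

  def lookup(user_id) do
    case :ets.lookup(table_for(user_id), user_id) do
      [{^user_id, data}] -> {:ok, data}
      [] -> :error
    end
  end

  # Pick a shard deterministically from the key.
  defp table_for(user_id), do: shard_name(:erlang.phash2(user_id, @shards))

  defp shard_name(i), do: :"sharded_users_#{i}"
end
```

Whether this actually helps is a matter of measurement, as pointed out earlier in the thread, since ETS already uses fairly fine-grained locking when write_concurrency is enabled.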