GenServer vs ETS

Context:

I was reading through DockYard’s article on ETS. The blog post builds a rate limiter using a GenServer first and then replaces it with ETS.

I was curious to see if I could replicate the same rate limiter using DynamicSupervisor + GenServer. I have gone ahead and implemented both versions (DynamicSupervisor and ETS).

Question:

  1. I’m curious to know how I can benchmark both implementations.
  2. What’s the general guidance on GenServer vs ETS?

Hi,

I ran this search on Google, https://www.google.com/search?client=firefox-b-d&q=benchmark+genserver, and found these:

https://hexdocs.pm/gen_metrics_bench/GenMetricsBench.html - a library for benchmarking GenServers

https://thoughtbot.com/blog/make-phoenix-even-faster-with-a-genserver-backed-key-value-store - an article on how to use a GenServer-backed cache and test it


The ETS docs and LYSE (Learn You Some Erlang) are good resources. The main benefit of ETS is that it allows shared, concurrent access to data. If a process owns the data, then other processes have to go through the owner (by sending it a message) to access that data, so the data-holding process becomes a bottleneck in the system.
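To make the bottleneck concrete, here is a minimal sketch of the counting core of a rate limiter written both ways. The module names CounterServer and CounterTable and the table name :rate_counts are hypothetical, window expiry is left out, and this is not the code from the DockYard article.

```elixir
defmodule CounterServer do
  # Hypothetical: every caller goes through this one process,
  # so all hits are handled one message at a time.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Increments the count for `key` and returns the new value.
  def hit(server, key), do: GenServer.call(server, {:hit, key})

  @impl true
  def init(counts), do: {:ok, counts}

  @impl true
  def handle_call({:hit, key}, _from, counts) do
    count = Map.get(counts, key, 0) + 1
    {:reply, count, Map.put(counts, key, count)}
  end
end

defmodule CounterTable do
  # Hypothetical: callers update a public ETS table directly,
  # with no round-trip through an owning process.
  @table :rate_counts

  # The table belongs to the process that calls create/0.
  def create do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true, read_concurrency: true])
  end

  # Inserts {key, 0} if the key is missing, then atomically adds 1
  # to element 2 and returns the new value.
  def hit(key), do: :ets.update_counter(@table, key, {2, 1}, {key, 0})
end
```

With the GenServer every hit waits its turn in one mailbox; with the public table each caller runs :ets.update_counter/4 itself. Keep in mind that the ETS table disappears if the process that created it exits.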

Standard benchmarks aren’t going to show you much here. What you’ll want to do is generate contention on the data by creating a lot of concurrent callers. Under enough load, the process-based solution will begin to back up (its mailbox grows and every caller waits in line), while the ETS solution’s latency should remain relatively constant.
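A rough sketch of such a contention test, assuming each implementation's call is wrapped in a zero-arity function: it starts many concurrent callers with Task.async_stream, times every call with :timer.tc, and reports a nearest-rank percentile of the latencies. The module name Contention and its functions are hypothetical.

```elixir
defmodule Contention do
  # Runs `fun` from `callers` concurrent processes, `calls` times each,
  # and returns every individual call latency in microseconds.
  def run(fun, callers, calls) do
    1..callers
    |> Task.async_stream(
      fn _caller ->
        for _ <- 1..calls do
          {micros, _result} = :timer.tc(fun)
          micros
        end
      end,
      max_concurrency: callers,
      timeout: :infinity
    )
    |> Enum.flat_map(fn {:ok, latencies} -> latencies end)
  end

  # Nearest-rank percentile, e.g. percentile(latencies, 95)
  # gives the boundary of the worst 5% of calls.
  def percentile(latencies, p) do
    sorted = Enum.sort(latencies)
    rank = max(ceil(p / 100 * length(sorted)), 1)
    Enum.at(sorted, rank - 1)
  end
end
```

One way to use it is to run the same workload against the process version and the ETS version at, say, 1, 100 and 1_000 callers, and compare percentile(latencies, 95) at each level to see where the two start to diverge.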


“If a process owns the data then other processes have to go through the owner to access that data.” Instead of having a single process responsible for all of the data, I decided to create multiple processes, each responsible for a separate data entry. That is what I want to test against ETS.
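A minimal sketch of the one-process-per-key design described above, assuming a unique Registry to look processes up by key and a DynamicSupervisor to start them on demand. KeyCounter and its registry and supervisor names are hypothetical; this is not the poster's implementation.

```elixir
defmodule KeyCounter do
  # Hypothetical: one GenServer per key, started lazily under a
  # DynamicSupervisor and found through a unique Registry.
  use GenServer

  @registry KeyCounter.Registry
  @supervisor KeyCounter.Supervisor

  # Starts the Registry and DynamicSupervisor this sketch relies on.
  def start_infrastructure do
    {:ok, _} = Registry.start_link(keys: :unique, name: @registry)
    {:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: @supervisor)
    :ok
  end

  # Increments the counter owned by `key`'s process and returns the new value.
  def hit(key), do: GenServer.call(lookup_or_start(key), :hit)

  defp lookup_or_start(key) do
    case Registry.lookup(@registry, key) do
      [{pid, _value}] ->
        pid

      [] ->
        case DynamicSupervisor.start_child(@supervisor, {__MODULE__, key}) do
          {:ok, pid} -> pid
          # Another caller registered this key first; reuse its process.
          {:error, {:already_started, pid}} -> pid
        end
    end
  end

  def start_link(key) do
    GenServer.start_link(__MODULE__, 0, name: {:via, Registry, {@registry, key}})
  end

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:hit, _from, count), do: {:reply, count + 1, count + 1}
end
```

Every hit still costs a Registry.lookup/2 (Registry is itself backed by ETS) plus a GenServer.call, and all hits for one key are still serialized through that key's single process.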

I hope that makes sense :confused:

Yeah, that makes sense. But let’s assume that your data access follows a power law: one piece of data is accessed much more often than everything else. You’ll still end up with contention on that single piece of data. In either scenario the benchmark is the same: have multiple concurrent readers access the data, once held in a process and once in ETS, and compare their tail latencies (the worst 5% is typical).
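To reproduce the power-law case, each benchmark caller can draw its key from a skewed distribution instead of uniformly. The sketch below uses a Zipf-style weighting, where key i is chosen with probability proportional to 1/i^s; the module name SkewedKeys and the default exponent s = 1.0 are assumptions for illustration.

```elixir
defmodule SkewedKeys do
  # Hypothetical helper: normalized weights for keys 1..n, where key i
  # has weight proportional to 1 / i^s, so key 1 is the "hot" key.
  def weights(n, s \\ 1.0) do
    raw = for i <- 1..n, do: 1 / :math.pow(i, s)
    total = Enum.sum(raw)
    Enum.map(raw, &(&1 / total))
  end

  # Samples one key (1-based) by walking the cumulative distribution.
  def pick(weights) do
    r = :rand.uniform()
    cumulative = Enum.scan(weights, &+/2)
    # Fall back to the last key if float rounding leaves the total just under r.
    index = Enum.find_index(cumulative, &(r <= &1)) || length(weights) - 1
    index + 1
  end
end
```

With n = 1_000 and s = 1.0, key 1 alone receives roughly 13% of all draws, so in a one-process-per-key design that single process sees far more traffic than any other.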

If you assume that data access is uniform, then using multiple processes will help to spread out the load. But most data access doesn’t follow a uniform distribution. Even if it did, accessing an ETS table is going to be faster. I threw together this gist to demonstrate. You can call the time/0 function on both of those modules and compare the results.


Hi there! Be aware that reading data from ETS has copy-on-read semantics. Combined with large binaries, this can make for some ‘interesting’ behavior.

This isn’t any different from reading from a process, though: the data still gets copied. Persistent term (:persistent_term) avoids the copy, but it isn’t appropriate for all use cases.
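For read-heavy data that rarely changes, a minimal sketch of the :persistent_term alternative; the Limits module and its key are hypothetical. Reads return the stored term without copying it into the caller's heap. The trade-off is that updating or erasing a term can trigger a global garbage-collection pass over processes that may still reference the old value, so it suits configuration-like data rather than per-request counters like a rate limiter's.

```elixir
defmodule Limits do
  # Hypothetical key; namespacing it with the module name avoids clashes.
  @key {__MODULE__, :limits}

  # Store once (e.g. at application start); updates are expensive.
  def put(limits) when is_map(limits), do: :persistent_term.put(@key, limits)

  # Cheap read with no copy onto the caller's heap.
  def get, do: :persistent_term.get(@key)
end

Limits.put(%{max_requests: 100, window_ms: 60_000})
%{max_requests: 100} = Limits.get()
```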

Right, reading state from another process will copy that data into the message it sends back. However, in situations where other processes don’t need to get hold of all the data (or perhaps any of it), it can make a difference, since data that stays local in a process is copy-on-write only.