Why does Registry use set and duplicate bag

qzg · November 23, 2021, 11:53pm

This is just a general question. I didn’t want to post an issue on the Elixir GitHub, but why does Registry use the set and duplicate bag datatypes in ETS and not ordered_set. I was reading this article which looks like ordered_set is pretty useful, especially for high-concurrency structures.

I’m interested in using the Registry for a high-concurrency use case so this came up. I’m not complaining about the current performance about Registry, but I don’t think I’ve really pushed it yet either.

Thanks!

ityonemo · November 24, 2021, 12:41am

I think registry needs bag functionality for it’s use as a PubSub system. My guess about set vs ordered_set is for compatibility with people using older OTPs… It might be interesting to copy/paste the Registry code and do a performance analysis on more recent OTP, it would certainly be possible to make a compile-time choice in Elixir registry to default to one or the other depending on OTP version and if there is a difference it would likely be a welcome PR to elixir STDLIB

al2o3cr · November 24, 2021, 12:54pm

One thing to keep in mind: the performance improvements described in that article are between the old ordered_set and the new ordered_set.

I haven’t benchmarked it, but I’d guess that un-ordered ETS tables are still way faster; maintaining order in the presence of concurrent writes is usually tricky.

ruslandoga · November 24, 2021, 1:03pm

I’m sure elixir team benchmarked different approaches and picked the fastest one for their use-case.

Below is a naive benchmark for inserts and lookups, I’m not taking into account concurrent reads/writes and each function is benchmarked in a single process (AFAIK). I’m also not starting a table per scheduler (in my case I’d use 8 tables of each type) and :erlang.phash2/2 the key to pick the table to insert into / lookup from.

Personally I mostly use ordered_set for range-readable indexes (since ordered_set is a tree) that point to a record in a bag that actually holds the data. Since Registry doesn’t need to iterate the keys in order, it doesn’t need to use ordered_set.

bench_ets.exs

Mix.install([:benchee])

set = :ets.new(:set, [:set, :public])
ordered_set = :ets.new(:ordered_set, [:ordered_set, :public])
bag = :ets.new(:bag, [:bag, :public])
duplicate_bag = :ets.new(:duplicate_bag, [:duplicate_bag, :public])

Benchee.run(%{
  "set insert" => fn -> :ets.insert(set, {:crypto.strong_rand_bytes(3), <<38, 142, 150, 5, 201, 91, 240, 163, 2, 62>>}) end,
  "ordered_set insert" => fn -> :ets.insert(ordered_set, {:crypto.strong_rand_bytes(3), <<38, 142, 150, 5, 201, 91, 240, 163, 2, 62>>}) end,
  "bag insert" => fn -> :ets.insert(bag, {:crypto.strong_rand_bytes(3), <<38, 142, 150, 5, 201, 91, 240, 163, 2, 62>>}) end,
  "duplicate_bag insert" => fn -> :ets.insert(duplicate_bag, {:crypto.strong_rand_bytes(3), <<38, 142, 150, 5, 201, 91, 240, 163, 2, 62>>}) end,
  "control" => fn -> {:crypto.strong_rand_bytes(3), <<38, 142, 150, 5, 201, 91, 240, 163, 2, 62>>} end
})

ms = [{{:_, :_}, [], [true]}]

IO.puts("Counts:")
IO.inspect([
  set: :ets.select_count(set, ms),
  ordered_set: :ets.select_count(ordered_set, ms),
  bag: :ets.select_count(bag, ms),
  duplicate_bag: :ets.select_count(duplicate_bag, ms)
])

:ets.delete_all_objects(set)
:ets.delete_all_objects(ordered_set)
:ets.delete_all_objects(bag)
:ets.delete_all_objects(duplicate_bag)

for tab <- [set, ordered_set, bag, duplicate_bag] do
  Enum.each(1..100_000, fn i -> :ets.insert(tab, {i, i}) end)
end

Benchee.run(%{
  "set lookup" => fn -> Enum.each(1..100_000, fn i -> :ets.lookup(set, i) end) end,
  "ordered_set lookup" => fn -> Enum.each(1..100_000, fn i -> :ets.lookup(ordered_set, i) end) end,
  "bag lookup" => fn -> Enum.each(1..100_000, fn i -> :ets.lookup(bag, i) end) end,
  "duplicate_bag lookup" => fn -> Enum.each(1..100_000, fn i -> :ets.lookup(duplicate_bag, i) end) end,
  "control" => fn -> Enum.each(1..100_000, fn i -> {i, i} end) end
})

> elixir bench_ets.exs

Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 8 GB
Elixir 1.12.3
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking bag insert...
Benchmarking control...
Benchmarking duplicate_bag insert...
Benchmarking ordered_set insert...
Benchmarking set insert...

Name                           ips        average  deviation         median         99th %
control                  1685.13 K        0.59 μs  ±1066.71%           1 μs           1 μs
bag insert                880.97 K        1.14 μs   ±657.97%           1 μs           2 μs
duplicate_bag insert      844.37 K        1.18 μs   ±480.45%           1 μs           2 μs
set insert                841.37 K        1.19 μs   ±511.06%           1 μs           2 μs
ordered_set insert        477.47 K        2.09 μs   ±463.39%           2 μs           3 μs

Comparison:
control                  1685.13 K
bag insert                880.97 K - 1.91x slower +0.54 μs
duplicate_bag insert      844.37 K - 2.00x slower +0.59 μs
set insert                841.37 K - 2.00x slower +0.60 μs
ordered_set insert        477.47 K - 3.53x slower +1.50 μs

Counts:
[set: 4210616, ordered_set: 2842773, bag: 4343634, duplicate_bag: 4919200]

Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 8 GB
Elixir 1.12.3
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking bag lookup...
Benchmarking control...
Benchmarking duplicate_bag lookup...
Benchmarking ordered_set lookup...
Benchmarking set lookup...

Name                           ips        average  deviation         median         99th %
control                     353.35        2.83 ms     ±3.51%        2.81 ms        3.03 ms
set lookup                  146.83        6.81 ms     ±3.15%        6.77 ms        7.75 ms
duplicate_bag lookup        133.99        7.46 ms     ±1.40%        7.46 ms        7.84 ms
bag lookup                  132.82        7.53 ms    ±12.33%        7.35 ms        9.93 ms
ordered_set lookup           76.62       13.05 ms     ±2.41%       13.02 ms       14.22 ms

Comparison:
control                     353.35
set lookup                  146.83 - 2.41x slower +3.98 ms
duplicate_bag lookup        133.99 - 2.64x slower +4.63 ms
bag lookup                  132.82 - 2.66x slower +4.70 ms
ordered_set lookup           76.62 - 4.61x slower +10.22 ms