:ets.select_count is very slow

:ets.select_count seems much slower than manual counting using :ets.next.
I filled the table (ordered set) with a million records with the keys {i, j}, where i and j are integers in the range from 1 to 1000 inclusive.
Then I counted the number of records with keys greater than {500, 0} and less than {599,: infinity} (100k records).

defmodule Test do
  def bench() do
    set = :ets.new(:set, [:set])
    ord = :ets.new(:ord, [:ordered_set])

    Enum.each(1..1000, fn i ->
      Enum.each(1..1000, fn j ->
        r = {{i, j}, "#{i}-#{j}"}
        :ets.insert(set, r)
        :ets.insert(ord, r)
      end)
    end)

    ms = [
      {
        {:"$1", :_},
        [{:andalso, {:>, :"$1", {{500, 0}}}, {:<, :"$1", {{599, :infinity}}}}],
        [true]
      }
    ]

    {set_time, 100_000} = :timer.tc(&:ets.select_count/2, [set, ms])
    {ord_time, 100_000} = :timer.tc(&:ets.select_count/2, [ord, ms])
    {ord_time_man, 100_000} = :timer.tc(&select_count/3, [ord, {500, 0}, {599, :infinity}])

    IO.puts("set         (select) - #{set_time}")
    IO.puts("ordered_set (select) - #{ord_time}")
    IO.puts("ordered_set (manual) - #{ord_time_man}")
  end

  def select_count(table, from, to) do
    select_count(table, :ets.next(table, from), to, 0)
  end

  def select_count(table, from, to, res) when from < to do
    select_count(table, :ets.next(table, from), to, res + 1)
  end

  def select_count(_table, _from, _to, res), do: res
end

Test.bench()

Results on my machine:

set         (select) - 208715
ordered_set (select) - 121615
ordered_set (manual) - 15120

Is there a way to speed up :etc.select_count?

Can you post the manual select_count code? Does it do something clever to not have to traverse the entire table? When I did a naive implementation of this the numbers turned out about the same for ets:select_count and my manual version:

select_count(Table, Min, Max) ->
    select_count_(Table, ets:first(Table), Min, Max, 0).

select_count_(_T, '$end_of_table', _Min, _Max, Cnt) ->
    Cnt;
select_count_(T, Key, Min, Max, Cnt) ->
    if Key > Min andalso Key < Max ->
            select_count_(T, ets:next(T, Key), Min, Max, Cnt+1);
       true ->
            select_count_(T, ets:next(T, Key), Min, Max, Cnt)
    end.
3 Likes

I already posted it. Take a closer look.
And yes it does. It uses :ets.next to jump directly to the first element you need.
It doesn’t work with a set, but perfectly works with an ordered set.

FYI, you’ll have to scroll within the snippet to see the select_count implementation. Depending on your OS you may not see a scroll bar (although I haven’t tested that yet since I configure my OS to always show a scroll bar).

But this is what it looks like for reference:


  def select_count(table, from, to) do
    select_count(table, :ets.next(table, from), to, 0)
  end

  def select_count(table, from, to, res) when from < to do
    select_count(table, :ets.next(table, from), to, res + 1)
  end

  def select_count(_table, _from, _to, res), do: res
2 Likes

My reading of the ETS docs is that :ets.select_count provides guarantees about traversal behavior that repeatedly using :ets.next (especially without safe_fixtable in a :set) does not.

Additional wild speculation (I have not measured this): calling :ets.next involves copying the key into the ETS table’s process and then copying the next key back into the caller’s process, while handling a match specification happens entirely inside the ETS process. Maybe there’s a performance win if keys are sufficiently large? (EDIT: now imagine the ETS process and the caller are not on the same node :scream: Then the keys are eating network bandwidth!)

ETS is not location transparent. So if you are reading across nodes you need to use explicit RPC calls.

1 Like

Ah, so there you go. The matchspec compiler is not clever enough to figure that out, nor do I think that it ever will be.

1 Like

That was indeed why I did not see it… what an odd UI design

2 Likes