Is a singleton ETS a good idea?

Using libraries like singleton you can create applications which are the only one in the whole cluster. This application may be a GenServer which initializes an ETS table and delegates the requests to this ETS. It is like a thin wrapper to convert a local ETS into a global one.

I have this doubt because I read the following in the hammer documentation:

There may come a time when ETS just doesn’t cut it, for example if we end up load-balancing across many nodes and want to keep our rate-limiter state in one central store. Redis is ideal for this use-case, and fortunately Hammer supports a Redis backend.

If you need a shared cache for an elixir cluster, what do you prefer to use a Redis server or a singleton ETS or even a singleton cachex?

I can think advantages and disadvantages of both methods.

  • With singleton ETS or Cachex you can save Elixir structs or any erlang term, in redis you have to cast data from database to memory, unless you use very basic data structures like strings or numbers.
  • Singleton ETS won’t be as performant as a local one, but I think in most of the case could be enough. Maybe is it faster than Redis anyway?
  • With Redis you have to create a bigger wrapper to use it like cache.
  • You lose your cache if the node dies. This can be acceptable in some scenarios.
  • Less architectural dependencies.
  • Redis has built-in some interesting features like TTL.

I didn’t test it, because of that my question is: Is a singleton ETS a good idea or it has his own drawbacks? Or would you prefer Redis?

Sounds like it would create a single process bottleneck for reads.

Maybe mnesia would be a good fit for distributed cache state?

Yes, you are right, but If I’m not wrong, Redis is mono thread. I don’t know the performance of this bottleneck vs Redis, maybe it is still better, maybe not. Other question could be how to avoid this bottleneck and still making it global.

It seems like a more complex solution: Caching: ETS, Mnesia, Redis

The singleton linked is just a wrapper around the global module. It has its own set of problems, mostly that it can’t handle split brain scenarios and it is comparatively slow.

I don’t think you should write off mnesia based on that, it is not written to discourage but to show some things you may need to handle depending on use case. It is a great solution for distributed caching and the reasons behind these points is that there are well known “quirks” you may want to be aware of. Better the devil you know as they say. Most other distributed setups will have some set of problems but you may not be aware of them. If you go down the global over ets you may end up re-inventing mnesia.

This externalizes the problem. Instead of dealing with net-splits, high availability and performance in your BEAM cluster you needs an OPs team handling in externally. You may need to setup multiple redis nodes and either have a client being able to swap over to another node in case the master node goes down or setup a virtual IP to handle it which requires more infrastructure.

The question is why you need the distributed cache. Normally it is either because of performance or high availability. If it is for performance then anything cached on the local node will beat redis. If it is for high availability then it is either internal or external complexity.

If you don’t require the HA setup for redis you can just run a cache on a single node (with mnesia for example) and connect directly to the node-name. Then you have a similar setup as a single instance of redis.

2 Likes

Agree.

This has other problems. It is known that cache is a complex matter, if each node has its own cache could be even more complex:

  • Imagine a rate limiter like Hammer.Plug, you need the data to be centralized.
  • Or if you have “bearer token” sessions in this cache and an user deletes one. You should delete in every cache. (You can have an intelligent load balancer to distribute the request by user but this is OP complexity and it is not always possible)

HA is not my case.

That’s actually a very good idea. I’ll take a look. Anyway, mnesia seems more complex than ETS for a key value store.

My point is distributed cache is a complicated matter and there are many situations where a single cache for the whole cluster it is enough and much simpler.

1 Like

I was thinking about this, so I created a very simple benchmark where the GenServer reply to call asynchronously:

defmodule PerformanceTest do
  use GenServer

  @table :performance_test

  def benchmark(processes \\ 1_000) do
    list = Enum.to_list(1..processes)

    sync_stream = Task.async_stream(list, fn i ->
      {:ok, PerformanceTest.sync_lookup(i)}
    end, ordered: false, max_concurrency: 10)

    async_stream = Task.async_stream(list, fn i ->
      {:ok, PerformanceTest.sync_lookup(i)}
    end, ordered: false, max_concurrency: 10)

    Benchee.run(%{
      "sync" => fn -> Enum.to_list(sync_stream) end,
      "async" => fn -> Enum.to_list(async_stream) end
    })
  end

  def start_link() do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def sync_lookup(key) do
    GenServer.call(__MODULE__, {:sync_lookup, key})
  end

  def async_lookup(key) do
    GenServer.call(__MODULE__, {:async_lookup, key})
  end

  @impl true
  def init(_) do
    :ets.new(@table, [:named_table, read_concurrency: true])

    for i <- 1..10_000, do: :ets.insert(@table, {i, i*i})

    :ets.insert(@table, {:foo, "lorem_ipsum"})

    {:ok, nil}
  end

  @impl true
  def handle_call({:sync_lookup, key}, _from, _) do
    {:reply, :ets.lookup(@table, key), nil}
  end

  def handle_call({:async_lookup, key}, from, _) do
    spawn(fn ->
      GenServer.reply(from, :ets.lookup(@table, key))
    end)

    {:noreply, nil}
  end
end

However, the results are almost identical, so I suppose the bottleneck still exists, or the benchmark is not similar to a real situation:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Number of Available Cores: 4
Available memory: 7.69 GB
Elixir 1.6.4
Erlang 20.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s


Benchmarking async...
Benchmarking sync...

Name            ips        average  deviation         median         99th %
async         84.69       11.81 ms     ±8.80%       11.50 ms       15.41 ms
sync          82.09       12.18 ms    ±12.31%       11.97 ms       18.64 ms

Comparison:
async         84.69
sync          82.09 - 1.03x slower

This is very similar to the built-in :rpc mechanism, which also uses a single proxy process.

If you need to scale beyond that you can spawn processes directly on the remote node via :erlang.spawn/4.

The bottleneck isn’t in performing the ETS lookup, it’s in queueing all the requests in the mailbox of the GenServer process.

Compare with doing an ETS lookup directly in the client code and you should see a difference.

I have personally had some success with distributing ets actions across nodes using rpc.multicall and then all reads just happen on the local node handling the call. This obviously has many areas where it is not great like handling netsplit or servers going down and then coming back(missing writes/deletes) but depending on what you are using the ETS store for could be acceptable(a cache that falls back to hitting db if key not in cache).

This way you do not have a read bottleneck but your writes are slightly slower. I have also recently just not “cared” about the replication and ran it in a Task.start. If it works out great. If not it will be built up when that cache is hit.

For what its worth I originally used mnesia for distribution…but honestly its an absolute pain for dynamic node membership and elegantly handling netsplit so I looked for something simpler in my use case.

I actually added this pattern in yesterday to handle distribution in my auth solution.

Relevant distribution file:

You are right, just for the record I paste my results:

Name             ips        average  deviation         median         99th %
direct        105.27        9.50 ms     ±5.76%        9.40 ms       12.73 ms
sync           84.15       11.88 ms     ±3.55%       11.79 ms       14.10 ms
async          82.81       12.08 ms     ±4.19%       12.00 ms       14.42 ms

Comparison:
direct        105.27
sync           84.15 - 1.25x slower
async          82.81 - 1.27x slower

That’s what I thought at least for my case.

Very nice example, it seems much simpler that I thought at first.However, If I’m not wrong, your code provokes a loop infinite loop of calls:

Node A inserts and then it replicates to Node B. Node B inserts and calls again to Node A. If you have more than two nodes the thing is going to be worse, you multiply the calls every time :confounded:

It can be solved with a no replicate version of each function and being called only replicating (or something similar):

defmodule AccessPass.EtsDistributed do
  def insert(name, obj) do
    AccessPass.Ets.insert(name, obj)
    replicate(:insert, name, obj)
  end

  def insert_no_replicate(name, obj) do
    AccessPass.Ets.insert(name, obj)
  end

  defp replicate(:insert, name, obj) do
    Task.start(fn ->
      Node.list() |> :rpc.multicall(AccessPass.Ets, :insert_no_replicate, [name, obj])
    end)
  end
end

It looks like it at first if you quickly glance but you can see that the rpc calls forward to AccessPass.Ets. This file does not include the replication on calls.

^ is the file containing the functions RPC calls.

Yeah, you are absolutely right :smiley: I thought it was weird… :stuck_out_tongue:

1 Like

I found this benchmark which answers my own question, very interesting:

2 Likes

I do not think this is a good benchmark at all. It has all ets writes going through a single genserver process that basically becomes the bottleneck. If it did not send everything through a genserver I would be much more interested.

That’s exactly the point :slight_smile:

A “Singleton ETS” would be a GenServer for a whole cluster where data is saved only once. It is good because it is memory efficient and much simpler to update. It is bad because it is less fault tolerant and slower. This benchmark tells us how slow it is. I think, for a lot of people, it will be fast enough.

The problem is that this ignores some other things like the fact that you copy things twice - once from ETS and once between GenServer and the target process. Probably a better approach to testing things when running single-threaded would be to use ETS how you would normally use it, but start the Erlang VM with just one scheduler using --erl "+S 1:1".

@michalmuskala Is possible to retrieve data from a remote node ETS avoiding the copy duplication that you mention?

I understand that ETS is usually used for local node data, but I wanted to avoid the complexity of writing in multiple nodes and any incoherences that could happen in the process.

Sorry for the late reply.

I think what @jordiee tried to say is that in the benckmark, all ETS calls are going through the GenServer’s callbacks, effectively generating a bottleneck.

If instead of calling the GenServer you just call the ETS directly (eg. RPC call to the node who owns the ETS) you’ll get better performance, the GenServer will only be used to start and own the ETS.

I wrote a blog post about a toy distributed rate limiter in Elixir that better explains it: http://pggalaviz.com/2018/11/26/simple-distributed-rate-limiter-with-elixir/

2 Likes