Why use ETS? When not to use it?

My first question is what does ETS add on top of just storing state in a GenServer?

In the documentation I see it is used as a “cache”, does this mean it won’t always return the latest data?

There’s a warning in the Elixir guide about using it prematurely:

Warning! Don’t use ETS as a cache prematurely! Log and analyze your application performance and identify which parts are bottlenecks, so you know whether you should cache, and what you should cache. This chapter is merely an example of how ETS can be used, once you’ve determined the need.

I don’t have any performance issues at this time. Does this mean ETS should not be used? What are the downsides of using ETS?

7 Likes

No expert here – but from the reading I’ve done 90% of the time you’re better off just storing state in the GenServer.

There are a couple of interesting use cases, since ETS tables have some access control:

public — Read/Write available to all processes.
protected — Read available to all processes. Only writable by owner process. This is the default.
private — Read/Write limited to owner process.

so you can have shared state across processes. I believe the Elixir Registry uses ETS tables under the hood.
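To make the access modes above concrete, here is a minimal sketch of the default :protected mode. The table name :settings_demo and the stored key are made up for illustration; the point is only that another process can read but not write.

```elixir
# The calling process owns the table; :protected is the default access mode.
table = :ets.new(:settings_demo, [:set, :protected])
:ets.insert(table, {:theme, "dark"})

# Any other process can read a protected table...
read_result =
  Task.async(fn -> :ets.lookup(table, :theme) end)
  |> Task.await()

# ...but a write from a process that is not the owner raises ArgumentError.
write_result =
  Task.async(fn ->
    try do
      :ets.insert(table, {:theme, "light"})
    rescue
      ArgumentError -> :write_rejected
    end
  end)
  |> Task.await()

IO.inspect(read_result, label: "read from another process")
IO.inspect(write_result, label: "write from another process")
```

Switching the option to :public would let the second task write as well, and :private would make the read fail too.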

The other use case I’ve seen is when you are highly concerned about garbage collection (GC) cycles: an ETS table is not stored in the owning process’s heap. This Erlangelist blog article talks about saving GC latency when you have large data buffered in a process.

But these are all pretty big edge cases I think. Hope this helps!

10 Likes

Addressing your points directly,

what does ETS add on top of just storing state in a GenServer?

ETS is usually used as a way to share state between processes/GenServers.

does this mean it won’t always return the latest data?

No. ETS is a key-value store; a read returns whatever was most recently written, so the data is exactly as current as your last update.
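A tiny sketch of that point, assuming a throwaway :set table called :latest_demo (the name and key are hypothetical): a lookup after an insert sees the new value, there is no stale copy in between.

```elixir
table = :ets.new(:latest_demo, [:set])

:ets.insert(table, {:counter, 1})
first = :ets.lookup(table, :counter)

# In a :set table, inserting the same key again overwrites the old row.
:ets.insert(table, {:counter, 2})
second = :ets.lookup(table, :counter)

IO.inspect({first, second})
```

Staleness only enters the picture if you use the table as a cache of some other source (a database, say) and that source changes without the table being updated.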

I don’t have any performance issues at this time. Does this mean ETS should not be used?

Yes and no. It means that you don’t need a cache; it doesn’t mean that you don’t need ETS. Caching data is only one of the many use cases for ETS.

4 Likes

Using ETS as a cache is only one of its many uses. A major use is for storing LARGE amounts of data in memory. This is a typical use; for example, it is how Mnesia stores data in memory. So if your gen_server stores large amounts of data, then maybe using ETS is a better alternative.

As has already been pointed out, ETS tables can be shared between processes. One thing to watch out for is that ETS is a datastore, not a database, and the support for transactions is VERY limited. If you need to implement transactions, you will probably need a process in front of the ETS tables.
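A rough sketch of what "a process in front of the ETS tables" could look like. The module AccountStore, its functions, and the table name are all invented for this example: reads go straight to the table, while updates that touch several rows are funnelled through one GenServer so they cannot interleave.

```elixir
defmodule AccountStore do
  use GenServer

  @table :account_store_demo

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Reads hit the table directly, without messaging the server.
  def balance(id) do
    case :ets.lookup(@table, id) do
      [{^id, amount}] -> amount
      [] -> 0
    end
  end

  # Writes are serialized by the server process.
  def deposit(id, amount), do: GenServer.call(__MODULE__, {:deposit, id, amount})
  def transfer(from, to, amount), do: GenServer.call(__MODULE__, {:transfer, from, to, amount})

  @impl true
  def init(:ok) do
    # :protected: everyone may read, only this server may write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:deposit, id, amount}, _from, state) do
    :ets.insert(@table, {id, balance(id) + amount})
    {:reply, :ok, state}
  end

  def handle_call({:transfer, from, to, amount}, _from, state) do
    if balance(from) >= amount do
      # Inserting a list of rows is a single atomic operation in ETS.
      :ets.insert(@table, [{from, balance(from) - amount}, {to, balance(to) + amount}])
      {:reply, :ok, state}
    else
      {:reply, {:error, :insufficient_funds}, state}
    end
  end
end

{:ok, _pid} = AccountStore.start_link()
:ok = AccountStore.deposit(:alice, 100)
:ok = AccountStore.transfer(:alice, :bob, 30)
IO.inspect({AccountStore.balance(:alice), AccountStore.balance(:bob)})
```

The trade-off is that writes are serialized again, so this only pays off when reads dominate.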

There are many other uses of ETS. Another one, for example, is having a supervisor use ETS tables to share data between its children, both current and future ones.

So in Erlang ETS is used quite often.

23 Likes

You can use ETS in many places where you would otherwise use your database, depending on your use case. One easy win is putting all your lookup data into ETS tables; ETS is very fast for key lookups. I do a little bit of ETL and use ETS tables extensively to store lookup data, as an intermediate store to write results out to, and so on. I also use them in an image_downloader app to keep a list of URLs which have been downloaded, with their checksums and headers, so that I don’t need to download them the next time I see them. I use :ets.file2tab('./data/myetsdb.ets') as soon as I start the app and do a :ets.tab2file(:mydb, './data/myetsdb.ets') before I shut down or run into an error.
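Here is a small sketch of that dump-and-restore round trip. The table name, the URL, and the temp-file path are placeholders I made up; note that both functions expect the file name as a charlist.

```elixir
# Hypothetical file location; the :ets functions take a charlist path.
path = Path.join(System.tmp_dir!(), "ets_demo_#{System.unique_integer([:positive])}.ets")
file = String.to_charlist(path)

:ets.new(:downloads_demo, [:named_table, :set, :public])
:ets.insert(:downloads_demo, {"https://example.com/a.png", "checksum-a"})

# Before shutting down: dump the table to disk.
:ok = :ets.tab2file(:downloads_demo, file)

# Simulate a restart by dropping the in-memory table.
:ets.delete(:downloads_demo)

# On startup: recreate the table (name and options included) from the file.
{:ok, restored} = :ets.file2tab(file)
rows = :ets.lookup(restored, "https://example.com/a.png")
IO.inspect(rows)

File.rm(path)
```

Since the dump only happens when you call tab2file, anything written after the last dump is lost on a hard crash.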

8 Likes

Hi, please check the order of this … should it not be the other way round?

1 Like

Yeah 🙂. Fixed it.

1 Like

You could simulate the complete interface of the :ets module with a GenServer. However, such an implementation would be very inefficient in some cases.

A typical example is an in-memory key-value store. If you implement it with a single process (GenServer or Agent), then all access is serialized. If you have thousands of client processes interacting with such a store, operations are still performed one at a time.

In contrast, an ETS-powered table with the proper knobs turned on (:public, :read_concurrency, :write_concurrency) allows different clients to interact with the same table simultaneously. Multiple processes can issue reads and writes to the same table at the same time, and the operations can be performed simultaneously, unless both processes are writing to the same row.
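A minimal sketch of such a table, with many tasks bumping counters at once. The table name :hits_demo, the key scheme, and the concurrency level are arbitrary choices for the example.

```elixir
table =
  :ets.new(:hits_demo, [:set, :public, read_concurrency: true, write_concurrency: true])

# 1_000 increments spread over 10 keys, issued from up to 50 tasks at a time.
1..1_000
|> Task.async_stream(
  fn i ->
    key = rem(i, 10)
    # Atomically add 1 to element 2 of the row; create {key, 0} first if missing.
    :ets.update_counter(table, key, {2, 1}, {key, 0})
  end,
  max_concurrency: 50
)
|> Stream.run()

total =
  table
  |> :ets.tab2list()
  |> Enum.map(fn {_key, count} -> count end)
  |> Enum.sum()

IO.inspect(total, label: "total increments")
```

No process sits in front of the table here; each task talks to ETS directly, and update_counter keeps the individual increments atomic.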

An example of this in practice is Registry, which is powered by ETS tables. This ensures that client processes can quickly find desired processes, and even get some of their properties, without needing to message some process.
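For illustration, a short sketch of a lookup through Registry from a different process. DemoRegistry, the key "worker-1", and the metadata map are hypothetical names.

```elixir
{:ok, _registry} = Registry.start_link(keys: :unique, name: DemoRegistry)

# The calling process registers itself under a key, with some metadata attached.
{:ok, _owner} = Registry.register(DemoRegistry, "worker-1", %{role: :downloader})

# A different process can find it, and read the metadata, via Registry.lookup/2.
found =
  Task.async(fn -> Registry.lookup(DemoRegistry, "worker-1") end)
  |> Task.await()

IO.inspect(found)
```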

The thing @rvirding and @Nilithus mentioned about avoiding a large process heap is also an interesting feature. ETS data is “off-heap”, so it doesn’t put any pressure on GC. If you have a single process with a large and frequently changing active memory set, an ETS table might improve your performance significantly.

So basically, ETS is mostly an optimization technique. If your needs are simple, you can probably start with a GenServer, and consider ETS if you find performance problems.

26 Likes

Please correct me if I’m wrong, but it turns out that, e.g., info about Elixir modules is stored in ETS, according to the source code.

1 Like

You are right (I am not sure what info about the modules is stored). There is a lot of information stored in ETS tables. Simply opening iex will create ~19 ETS tables.

iex(1)> :ets.all
[Logger.Config, IEx.Config, :elixir_modules, :elixir_config, :file_io_servers,
 :inet_hosts_file_byaddr, :inet_hosts_file_byname, :inet_hosts_byaddr,
 :inet_hosts_byname, :inet_cache, :inet_db, :global_pid_ids, :global_pid_names,
 :global_names_ext, :global_names, :global_locks, 4098, 1, :ac_tab]
iex(2)> :ets.all |> Enum.count
19
iex(1)> :ets.tab2list :elixir_modules
[]
iex(2)> :ets.tab2list :elixir_config 
[{:home, "/home/minhajuddin"}, {:erl_compiler_options, []},
 {:compiler_options,
  %{debug_info: true, docs: true, ignore_module_conflict: false,
    relative_paths: true, warnings_as_errors: false}}, {{:uri, "http"}, 80},
 {{:uri, "tftp"}, 69}, {:argv, []}, {:at_exit, []}, {{:uri, "sftp"}, 22},
 {{:uri, "ftp"}, 21}, {{:uri, "https"}, 443}, {{:uri, "ldap"}, 389}]

5 Likes

I choose ETS whenever I need to keep data independently of a GenServer process’s state, that is, when a GenServer termination would mean losing relevant data. Especially given the let-it-crash approach 😉

Hey @jyeshe, do note that ETS tables are tied to a specific process, so they will go away when that process does. ETS tables are not more permanent than GenServer state.

3 Likes

Hi Ben, yes. My approach is to create the ETS table outside of the GenServer that is more vulnerable. Currently, most of the ETS tables I use are created in the Application initialization process. What do you think about that?
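A sketch of that idea, under the assumption that the current process stands in for the long-lived application process and a spawned function plays the vulnerable worker (the table name and row are made up):

```elixir
# The long-lived process creates and owns the table.
table = :ets.new(:durable_demo, [:set, :public])

# The worker writes to the table and then crashes.
{pid, ref} =
  spawn_monitor(fn ->
    :ets.insert(table, {:job, :half_done})
    exit(:boom)
  end)

# Wait until the worker is really gone.
receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.inspect(reason, label: "worker exited with")
end

# The row is still there, because the owner of the table is still alive.
surviving = :ets.lookup(table, :job)
IO.inspect(surviving)
```

The data now lives exactly as long as the owning process does, so this moves the risk to the owner instead of removing it.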

Here is an old article about handling ETS tables. It is in Erlang, but the principles apply:
https://steve.vinoski.net/blog/2011/03/23/dont-lose-your-ets-tables/
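One ETS feature relevant to not losing tables is the :heir option, which hands the table to another process when the owner terminates. A minimal sketch in Elixir, with a made-up table name and the current process acting as heir:

```elixir
parent = self()

owner =
  spawn(fn ->
    # If this process terminates, ownership passes to `parent`.
    :ets.new(:heir_demo, [:named_table, :set, :public, {:heir, parent, :handed_over}])
    :ets.insert(:heir_demo, {:key, "value"})
    # The function returns here, so the owner process exits.
  end)

# The heir is notified with an 'ETS-TRANSFER' message carrying the heir data.
inherited =
  receive do
    {:"ETS-TRANSFER", _table, ^owner, :handed_over} ->
      :ets.lookup(:heir_demo, :key)
  after
    1_000 -> :timeout
  end

new_owner = :ets.info(:heir_demo, :owner)
IO.inspect({inherited, new_owner == self()})
```

A restarted worker could then get the table back with :ets.give_away/3; that part is left out here.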

4 Likes

Personally, I have used it as a cache, and I can say there is a big advantage to that. The first request for a piece of data hit the database directly, and the result was then stored in the ETS table. If I remember correctly, retrieval from the database took about 200ms, while retrieval from ETS took about 7ms.
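Roughly, that read-through pattern could be sketched like this. DemoCache, its functions, and the sleeping slow_lookup function (a stand-in for the database call) are all invented for the example, and nothing here handles invalidation or expiry.

```elixir
defmodule DemoCache do
  @table :demo_cache

  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  # Return the cached value if present; otherwise call `loader` and store the result.
  def fetch(key, loader) when is_function(loader, 1) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        {:hit, value}

      [] ->
        value = loader.(key)
        :ets.insert(@table, {key, value})
        {:miss, value}
    end
  end
end

DemoCache.init()

# Stand-in for a slow database query.
slow_lookup = fn id ->
  Process.sleep(50)
  %{id: id, name: "user-#{id}"}
end

first = DemoCache.fetch(42, slow_lookup)
second = DemoCache.fetch(42, slow_lookup)
IO.inspect({first, second})
```

Two processes that miss on the same key at the same time will both run the loader; whether that matters depends on how expensive the lookup is.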

2 Likes