Private ets tables and ERL_MAX_ETS_TABLES

I have processes (I almost dare not say, but they are agents, not genservers). In these agents I create two private ets tables:
bpmn_object = :ets.new(:bpmn_object, [:private, read_concurrency: true])
dmn_object = :ets.new(:dmn_object, [:private, read_concurrency: true])
I define them like this so that I can run the agents on different nodes when the need arises. The problem is that when I run a lot of agents on one node, I could exceed ERL_MAX_ETS_TABLES. Am I correct that this maximum is per node and not per process, even for private ets tables? It may be a dumb question, but if so, wouldn’t it be an enhancement to make this maximum per process for private ets tables?

A few things:

ETS tables are local to the node they are created on.
An ETS table that is only accessed by a single process (like an Agent) is entirely useless; just have the agent keep the information itself in its own state.
The ETS table limit is per node, but it is configurable via command line parameters (though if you reach 1400 tables, you probably have a redesign issue first).
If you want multi-node pure data storage, you should look at the built-in mnesia (or a database).
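To see how close a node is to the limit discussed above, something like the following sketch could be used. The module name `EtsUsage` is made up for illustration; `:erlang.system_info(:ets_limit)` and `:ets.all/0` are the standard calls.

```elixir
defmodule EtsUsage do
  # Hypothetical helper: reports the node-wide table ceiling and the number
  # of tables currently alive on this node, whatever their owner or access mode.
  def report do
    %{
      limit: :erlang.system_info(:ets_limit),
      in_use: length(:ets.all())
    }
  end
end

IO.inspect(EtsUsage.report(), label: "ETS tables")
```

The ceiling is fixed when the VM starts. As far as I know it can be raised with the `+e` emulator flag (for example `elixir --erl "+e 20000"`, where 20000 is an arbitrary number) or the ERL_MAX_ETS_TABLES environment variable; check the `erl` documentation for your OTP version.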

So no performance gains with ets lookups over searching through lists of maps within a process?
Why is the “issue” set to solved by overmind btw? I did not set this toggle.

Not really, if you want performance gains outside of a primary key and it is modelable with an index, then you still want mnesia or a database. :slight_smile:

ETS really is just like a map. A linear search over an utterly massive set of data might be faster in ETS, but I doubt it is by much (and if you are reaching that point, then a database should be used instead).
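For illustration, here is a minimal sketch of an Agent that keeps its objects in a plain map as its state instead of a private ETS table. The module name `BpmnStore` and the shape of the stored object are assumptions, not anything from the original code.

```elixir
defmodule BpmnStore do
  # Hypothetical wrapper: the Agent's state is a plain map keyed by object id.
  def start_link, do: Agent.start_link(fn -> %{} end)

  # Store or overwrite an object under its id.
  def put(agent, id, object), do: Agent.update(agent, &Map.put(&1, id, object))

  # Fetch an object by id, or nil when it is missing.
  def get(agent, id), do: Agent.get(agent, &Map.get(&1, id))
end

{:ok, agent} = BpmnStore.start_link()
:ok = BpmnStore.put(agent, :task_1, %{type: :user_task, name: "Review order"})
BpmnStore.get(agent, :task_1)
```

No table is created here, so this approach does not count against the ETS table limit at all.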

EDIT: Ah hah, I knew I remembered a thread where someone benchmarked ETS against maps and showed that ETS was about 1.93x slower than maps at a size of 100_000 elements (so quite huge, and that difference held pretty much true across all the sizes tested), so not even an order of magnitude apart: Benchmarking lookup time for map vs ets
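A rough way to repeat that kind of comparison on your own machine is sketched below. This is not the benchmark from the linked thread; the module name `LookupTiming` and the data shape are assumptions, and `:timer.tc/1` gives only a crude single-run timing.

```elixir
defmodule LookupTiming do
  # Returns the microseconds spent doing `n` key lookups against a map
  # and against a private ETS table holding the same data.
  def run(n \\ 100_000) do
    keys = Enum.to_list(1..n)
    map = Map.new(keys, fn k -> {k, k * 2} end)

    table = :ets.new(:lookup_timing, [:private, :set])
    :ets.insert(table, Enum.map(keys, fn k -> {k, k * 2} end))

    {map_us, _} = :timer.tc(fn -> Enum.each(keys, &Map.fetch!(map, &1)) end)
    {ets_us, _} = :timer.tc(fn -> Enum.each(keys, &:ets.lookup(table, &1)) end)

    :ets.delete(table)
    %{map_us: map_us, ets_us: ets_us}
  end
end

IO.inspect(LookupTiming.run(), label: "lookup time (microseconds)")
```

Numbers from a single run vary a lot, so for anything you intend to act on, a proper benchmarking tool such as Benchee is a better choice.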

There are some situations where a private ETS might improve performance of a single process. If the process has a large active working set which changes frequently, ETS might work better than in-proc data, because ETS is an off-heap structure, so it doesn’t cause GC pressure. I’ve blogged about it here.

But I do agree that in the vast majority of the cases private ETS is likely an anti-pattern.

Ah, ok. But I do not like that someone other than me set this “solved” toggle to true with your first reply. That does not fit with the idea of a friendly forum. I think a good rule of conduct in forums would be: only the person who asked the question may mark an answer as “problem solved”.

Hmmm I’m not seeing where this is done, I see nothing marked as solved?

Sec, let me check the moderator log. Nothing there shows that a moderator marked it themselves, but I do not have access to the admin log, so I am unsure beyond that. You should be able to mark anything as solved if you want, as far as I can see? I wonder if it is a Discourse bug.


Seconding @OvermindDL1, no answer in this thread has been marked as the answer. I liked Overmind’s first response, but that isn’t the same thing.

This discussion is not shown as being solved for me either.

I definitely saw it as “solved” by the reply from @OvermindDL1 this morning. It’s not that way now.

I thought one of the differences with maps was the possibility to add an index to the ets table and use that in the lookup, and that this would result in quicker lookups, at least for ‘non-trivial’ amounts of records. The test results in the link you provide are highly surprising. I don’t know if the results could have been influenced by the ets table overflowing to disk after exceeding some memory limit (no idea if there is such a mechanism)? Disappointing. So I have two dumb enhancement requests then. But those would be for Erlang, I suppose.

It definitely was marked. It has been changed.

Mnesia adds indexing on top of ETS. But if the table is private, it’s trivial to maintain the index yourself using a second ETS table, since you don’t have to worry about concurrent access.
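As a sketch of what maintaining the index yourself could look like: one table holds the records, a second `:bag` table maps an indexed field back to the primary keys. The module name `IndexedObjects` and the `{id, type, name}` record layout are assumptions made for this example.

```elixir
defmodule IndexedObjects do
  # Hypothetical layout: objects are {id, type, name} tuples keyed by id;
  # the index table holds {type, id} pairs so lookups by type avoid a full scan.
  def new do
    %{
      objects: :ets.new(:objects, [:private, :set]),
      by_type: :ets.new(:objects_by_type, [:private, :bag])
    }
  end

  def put(%{objects: objects, by_type: by_type}, {id, type, _name} = object) do
    # Drop the stale index entry if this id was stored before with another type.
    case :ets.lookup(objects, id) do
      [{^id, old_type, _old_name}] -> :ets.delete_object(by_type, {old_type, id})
      [] -> :ok
    end

    :ets.insert(objects, object)
    :ets.insert(by_type, {type, id})
    :ok
  end

  # Resolve the index entries for `type` back to the full records.
  def by_type(%{objects: objects, by_type: by_type}, type) do
    for {_type, id} <- :ets.lookup(by_type, type),
        object <- :ets.lookup(objects, id),
        do: object
  end
end
```

Because both tables are private, only the owning process ever touches them, so the two writes in `put/2` need no extra coordination.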

There is one case where private ETS tables are definitely the way to go and that is for large amounts of data. As ETS tables are not stored on the process heap they will not affect the garbage collection time for the process. Storing large amounts of data inside a process, say using a map, will lengthen the time for GC and increase the number of GCs which will affect the performance of the system. Worst case these longer times can become “noticeable”.
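One way to see the off-heap effect for yourself is to compare a process's heap size with the memory of the table it owns. This is only a sketch; the module name `OffHeapDemo` and the payload shape are made up, and both figures are reported in machine words.

```elixir
defmodule OffHeapDemo do
  # Runs in a separate process that keeps `n` rows in a private ETS table,
  # then reports that process's heap size and the table's memory use.
  def measure(n \\ 100_000) do
    Task.async(fn ->
      table = :ets.new(:off_heap_demo, [:private, :set])
      Enum.each(1..n, fn i -> :ets.insert(table, {i, {:payload, i, i * 2}}) end)

      # Collect the garbage left over from building the rows before measuring.
      :erlang.garbage_collect()
      {:total_heap_size, heap_words} = Process.info(self(), :total_heap_size)
      ets_words = :ets.info(table, :memory)

      %{heap_words: heap_words, ets_words: ets_words}
    end)
    |> Task.await()
  end
end

IO.inspect(OffHeapDemo.measure(), label: "memory in words")
```

The rows live in the table's own memory, so the garbage collector has nothing to traverse for them; the trade-off is that every read copies the row back onto the process heap.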

Another more subtle use is that it can make it easier to preserve data of a process when it crashes so it can be passed on to the restarted process.
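The post does not say which mechanism it has in mind, but one way to do this is the `:heir` table option: when the owner dies, the table is handed to the heir process instead of being deleted. A minimal sketch, with the module name `HeirDemo` and the heir data `:bpmn_backup` chosen for illustration:

```elixir
defmodule HeirDemo do
  # Spawns a table owner that crashes, then returns the rows that the heir
  # (the calling process) recovers from the inherited table.
  def run do
    heir = self()

    owner =
      spawn(fn ->
        table = :ets.new(:bpmn_object, [:private, {:heir, heir, :bpmn_backup}])
        :ets.insert(table, {:task_1, "Review order"})
        exit(:simulated_crash)
      end)

    receive do
      # The runtime sends this message to the heir when the owner terminates.
      {:"ETS-TRANSFER", table, ^owner, :bpmn_backup} ->
        rows = :ets.tab2list(table)
        :ets.delete(table)
        rows
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(HeirDemo.run(), label: "recovered rows")
```

In a real system the heir would typically be a long-lived process that passes the table on to the restarted worker with `:ets.give_away/3` rather than reading and deleting it.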

Apart from the GC argument: shouldn’t the availability of the index make ets lookups faster (given a non-trivial number of records) than extracting records from maps? And what about this enhancement idea?:

Isn’t that what I said in my post, which you responded to? If not, that’s precisely what I wanted to say :slight_smile:

ETS only has an index on the primary key. If you want more indexes, then you have to keep them yourself or use Mnesia.

However, maps are also keyed by the ‘primary key’, and since they are in-process they tend to be faster. :slight_smile:

Except ETS tables are global state optimized for access from many processes, and there is a limit on certain physical resources in the system that makes this hard to increase too much. If you are storing that much data in ETS, then it really should not be in ETS; it should probably be in GenServers instead, or in a database if it is large.

I know that.
I’m talking about a list of maps (each map is a record) and performing a lookup within that list versus an ets lookup. What do you mean by “maps are also keyed by the ‘primary key’” and what relation does that have to my question?

That is good then, and it indeed would build faster as each map overall is smaller than a single giant map. :slight_smile:

Access might be a tiny bit slower than a single giant map, but still faster than ETS I’d imagine.
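To make the two shapes being discussed concrete, here is a small sketch of a lookup in a list of maps next to a lookup in a single map keyed by id. The module name `RecordLookup` and the `:id` field are assumptions for this example.

```elixir
defmodule RecordLookup do
  # Linear scan: walks the list until a record with the given id is found.
  def scan(records, id), do: Enum.find(records, fn record -> record.id == id end)

  # Build a map keyed by id once; later lookups go straight to the key.
  def index(records), do: Map.new(records, fn record -> {record.id, record} end)
end

records = [
  %{id: 1, name: "start"},
  %{id: 2, name: "approve"},
  %{id: 3, name: "end"}
]

RecordLookup.scan(records, 2)

by_id = RecordLookup.index(records)
Map.fetch(by_id, 2)
```

The scan costs time proportional to the length of the list, while the keyed map (like an ETS lookup on its key) does not, which is what “keyed by the primary key” is getting at.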

I understand less and less of your answers. Maybe I should just stop with elixir and do something really stupid. :wink: