Private ets tables and ERL_MAX_ETS_TABLES

StefanHoutzager · April 12, 2017, 3:05pm

I have processes (I almost dare not say it, but they are Agents, not GenServers). In these Agents I create two private ETS tables:
bpmn_object = :ets.new(:bpmn_object, [:private, read_concurrency: true])
dmn_object = :ets.new(:dmn_object, [:private, read_concurrency: true])
I define them this way so that I can run the agents on different nodes when the need arises. The problem is that when I run a lot of agents on one node, I could exceed ERL_MAX_ETS_TABLES. Am I correct that this maximum is per node and not per process for private ETS tables? It may be a dumb question, but in that case wouldn’t it be an enhancement if the maximum were per process for private ETS tables?

OvermindDL1 · April 12, 2017, 5:05pm

A few things:

ETS tables are local node only.
An ETS table that is only accessed by a single process (like an Agent) is entirely useless; just have the Agent keep the information in its own state.
The ETS table limit is per node, but it is configurable via command-line parameters (though if you reach the default of 1400, you probably have a design issue to address first).
If you want multi-node pure data storage, you should look at the built-in mnesia (or a database).
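
A minimal sketch of the per-node limit described above, assuming a stock Erlang/OTP install. `EtsLimitSketch` and its functions are hypothetical names made up for illustration; `:erlang.system_info(:ets_limit)` and `:ets.all/0` are the real calls. The limit can be raised with the `+e` emulator flag (for example `elixir --erl "+e 10000"`) or the `ERL_MAX_ETS_TABLES` environment variable.

```elixir
defmodule EtsLimitSketch do
  # Hypothetical helper module, not part of any library.

  # Node-wide ceiling on the number of ETS tables.
  def limit, do: :erlang.system_info(:ets_limit)

  # Tables currently alive on this node, across all processes.
  def in_use, do: length(:ets.all())

  # Starts `count` processes that each own one private table, mimicking
  # many agents on one node. Returns a list of {pid, table} pairs.
  def start_owners(count) do
    parent = self()

    for _ <- 1..count do
      pid =
        spawn(fn ->
          tab = :ets.new(:bpmn_object, [:private])
          send(parent, {:table_ready, self(), tab})

          # Keep the owner (and therefore the table) alive until told to stop.
          receive do
            :stop -> :ok
          end
        end)

      receive do
        {:table_ready, ^pid, tab} -> {pid, tab}
      end
    end
  end
end

owners = EtsLimitSketch.start_owners(3)
IO.puts("tables in use: #{EtsLimitSketch.in_use()} of #{EtsLimitSketch.limit()}")
Enum.each(owners, fn {pid, _tab} -> send(pid, :stop) end)
```

Every private table shows up in the node-wide count, regardless of which process owns it, so a large number of agents each creating two tables counts against the same limit.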

StefanHoutzager · April 12, 2017, 5:43pm

So no performance gains with ets lookups over searching through lists of maps within a process?
Why is the “issue” set to solved by overmind btw? I did not set this toggle.

OvermindDL1 · April 12, 2017, 5:45pm

Not really. If you want performance gains for lookups on anything other than the primary key, and the data can be modeled with an index, then you still want Mnesia or a database.

ETS really is just like a map. A linear search over an utterly massive data set might be faster in ETS, but I doubt it is by much (and if you are reaching that point, a database should be used instead).

EDIT: Aha, I knew I remembered a thread where someone benchmarked ETS against maps and showed that ETS was about 1.93x slower than maps at a size of 100_000 elements (so quite huge, and that difference held pretty much true across all the sizes tested), so not even an order of magnitude apart in speed: Benchmarking lookup time for map vs ets
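
For readers who want to try the comparison themselves, here is a rough sketch of a keyed lookup in a map versus a private ETS table. It is not the benchmark from the linked thread; `LookupSketch`, the record shape, and the crude `:timer.tc/1` timing loop are all assumptions made for illustration, and the numbers will vary by machine and OTP version.

```elixir
defmodule LookupSketch do
  # Hypothetical helper module for comparing keyed lookups.

  def build_map(n) do
    Map.new(1..n, fn id -> {id, %{id: id, name: "object_#{id}"}} end)
  end

  # Must be called from the process that will do the lookups,
  # because the table is private.
  def build_ets(n) do
    tab = :ets.new(:objects, [:private, :set])
    Enum.each(1..n, fn id -> :ets.insert(tab, {id, %{id: id, name: "object_#{id}"}}) end)
    tab
  end

  def from_map(map, id), do: Map.fetch(map, id)

  def from_ets(tab, id) do
    case :ets.lookup(tab, id) do
      [{^id, record}] -> {:ok, record}
      [] -> :error
    end
  end

  # Microseconds spent calling `fun` once for each key in 1..rounds.
  def time(fun, rounds) do
    {micros, _result} = :timer.tc(fn -> Enum.each(1..rounds, fun) end)
    micros
  end
end

map = LookupSketch.build_map(100_000)
tab = LookupSketch.build_ets(100_000)

IO.puts("map: #{LookupSketch.time(fn id -> LookupSketch.from_map(map, id) end, 100_000)} µs")
IO.puts("ets: #{LookupSketch.time(fn id -> LookupSketch.from_ets(tab, id) end, 100_000)} µs")
```

Both structures are looked up by key here; part of the cost on the ETS side is that each lookup copies the stored term onto the caller's heap.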

sasajuric · April 12, 2017, 6:42pm

There are some situations where a private ETS might improve performance of a single process. If the process has a large active working set which changes frequently, ETS might work better than in-proc data, because ETS is an off-heap structure, so it doesn’t cause GC pressure. I’ve blogged about it here.

But I do agree that in the vast majority of the cases private ETS is likely an anti-pattern.
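
A small sketch of the off-heap point, assuming nothing beyond the standard library. `HeapSketch` is a hypothetical helper: it starts one process that keeps a large map in its state and one that keeps the same data in a private ETS table, then reports each process's heap size in words. The exact figures depend on the VM, but the map holder's heap should be far larger, and that heap is what the garbage collector has to traverse.

```elixir
defmodule HeapSketch do
  # Hypothetical helper: spawns a process that builds its data with `build`,
  # reports back, and then idles while keeping the data alive.
  def start_holder(build) do
    parent = self()

    pid =
      spawn(fn ->
        data = build.()
        send(parent, {:ready, self()})
        hold(data)
      end)

    receive do
      {:ready, ^pid} -> pid
    end
  end

  defp hold(data) do
    receive do
      :stop -> data
    end
  end

  # Total heap size of `pid` in machine words.
  def heap_words(pid) do
    {:total_heap_size, words} = Process.info(pid, :total_heap_size)
    words
  end
end

n = 100_000

map_holder =
  HeapSketch.start_holder(fn ->
    Map.new(1..n, fn id -> {id, {:record, id}} end)
  end)

ets_holder =
  HeapSketch.start_holder(fn ->
    tab = :ets.new(:working_set, [:private])
    Enum.each(1..n, fn id -> :ets.insert(tab, {id, {:record, id}}) end)
    tab
  end)

IO.puts("map holder heap: #{HeapSketch.heap_words(map_holder)} words")
IO.puts("ets holder heap: #{HeapSketch.heap_words(ets_holder)} words")
```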

StefanHoutzager · April 12, 2017, 6:46pm

Ah, ok. But I do not like it that someone other than me set this “solved” toggle on your first reply. That does not fit a forum that calls itself friendly. I think a good rule of conduct in fora would be: only the person who asked the question may mark an answer as “problem solved”.

OvermindDL1 · April 12, 2017, 6:51pm

Hmmm I’m not seeing where this is done, I see nothing marked as solved?

Sec, let me check the moderator log. Nothing there shows that a moderator marked it, but I do not have access to the admin log, so I am unsure beyond that. You should be able to mark anything as solved if you want, as far as I can see? I wonder if it is a Discourse bug…

benwilson512 · April 12, 2017, 8:06pm

Seconding @OvermindDL1, no answer in this thread has been marked as the answer. I liked Overmind’s first response, but that isn’t the same thing.

BrightEyesDavid · April 12, 2017, 10:43pm

This discussion is not shown as being solved for me either.

bbense · April 12, 2017, 11:27pm

I definitely saw it as “solved” by the reply from @OvermindDL1 this morning. It’s not that way now.

StefanHoutzager · April 13, 2017, 4:02am

I thought one of the differences with maps was the possibility to add an index to the ETS table and use that in the lookup, and that this would result in quicker lookups, at least for ‘non-trivial’ amounts of records. The test results in the link you provide are highly surprising. I don’t know whether the results could have been influenced by the ETS table overflowing to disk after exceeding some memory limit (I have no idea if there is such a mechanism). Disappointing. So I have two dummy enhancement requests then. But those would be for Erlang, I suppose.

StefanHoutzager · April 13, 2017, 4:04am

It definitely was marked. It has been changed.

dom · April 13, 2017, 6:45am

Mnesia adds indexing on top of ETS. But if the table is private, it’s trivial to maintain the index yourself using a second ETS table, since you don’t have to worry about concurrent access.
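
One way the hand-maintained index could look, as a sketch only. `IndexedStoreSketch`, the `{id, type, payload}` record shape, and the function names are assumptions for illustration; the ETS calls (`:ets.new/2`, `:ets.insert/2`, `:ets.lookup/2`, `:ets.delete_object/2`) are the standard ones. All functions must be called from the process that created the store, since both tables are private.

```elixir
defmodule IndexedStoreSketch do
  # Hypothetical module: a primary table keyed by id plus a second
  # table that acts as an index from type to id.
  def new do
    %{
      records: :ets.new(:records, [:private, :set]),
      by_type: :ets.new(:records_by_type, [:private, :bag])
    }
  end

  def put(store, id, type, payload) do
    # If the record already exists, drop its stale index entry first.
    case :ets.lookup(store.records, id) do
      [{_id, old_type, _old_payload}] ->
        :ets.delete_object(store.by_type, {old_type, id})

      [] ->
        true
    end

    :ets.insert(store.records, {id, type, payload})
    :ets.insert(store.by_type, {type, id})
    store
  end

  # Index lookup: all ids stored under `type`.
  def ids_by_type(store, type) do
    for {_type, id} <- :ets.lookup(store.by_type, type), do: id
  end

  # Follows the index back to the primary table.
  def by_type(store, type) do
    for id <- ids_by_type(store, type),
        {_id, _type, payload} <- :ets.lookup(store.records, id),
        do: {id, payload}
  end
end

store = IndexedStoreSketch.new()
IndexedStoreSketch.put(store, 1, :task, "Review order")
IndexedStoreSketch.put(store, 2, :gateway, "Approved?")
IndexedStoreSketch.put(store, 3, :task, "Ship order")
IO.inspect(IndexedStoreSketch.by_type(store, :task))
```

Because only one process touches the two tables, the two writes in `put/4` cannot be observed half-done by anyone else, which is what makes this simple to keep consistent.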

rvirding · April 13, 2017, 1:07pm

There is one case where private ETS tables are definitely the way to go and that is for large amounts of data. As ETS tables are not stored on the process heap they will not affect the garbage collection time for the process. Storing large amounts of data inside a process, say using a map, will lengthen the time for GC and increase the number of GCs which will affect the performance of the system. Worst case these longer times can become “noticeable”.

Another more subtle use is that it can make it easier to preserve data of a process when it crashes so it can be passed on to the restarted process.
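
A sketch of how a table might survive its owner, assuming the `:heir` table option is the mechanism meant here (the post does not name one). `HeirSketch` and the `:worker_state` tag are hypothetical. When the owner exits, the heir receives an `{:"ETS-TRANSFER", table, from_pid, heir_data}` message and becomes the new owner, so it can read the table even though it is private.

```elixir
defmodule HeirSketch do
  # Hypothetical worker: creates a private table, names `heir` as its heir,
  # stores some progress, and then crashes.
  def start_worker(heir) do
    spawn(fn ->
      tab = :ets.new(:worker_state, [:private, {:heir, heir, :worker_state}])
      :ets.insert(tab, {:progress, 42})
      exit(:boom)
    end)
  end

  # Waits for the table that `worker` leaves behind when it dies.
  def await_table(worker, timeout \\ 1_000) do
    receive do
      {:"ETS-TRANSFER", tab, ^worker, :worker_state} -> {:ok, tab}
    after
      timeout -> :timeout
    end
  end
end

worker = HeirSketch.start_worker(self())
{:ok, tab} = HeirSketch.await_table(worker)
IO.inspect(:ets.lookup(tab, :progress))
```

In a supervised setup the heir would typically be a long-lived process that hands the table to the restarted worker with `:ets.give_away/3`.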

StefanHoutzager · April 13, 2017, 1:24pm

Apart from the GC argument: shouldn’t the availability of the index make ets lookups faster (provided the non-trivial amounts of records) than extracting records from maps? And what about this enhancement idea?:

sasajuric · April 13, 2017, 1:28pm

Isn’t that what I said in my post, which you responded to? If not, that’s precisely what I wanted to say

OvermindDL1 · April 13, 2017, 4:01pm

ETS only has an index on the primary key; if you want more indexes, then you have to maintain them yourself or use Mnesia.

However, maps are also keyed by the ‘primary key’, and since they are in-process they tend to be faster.

Except that ETS tables are global state optimized for access from many processes, and there is a limit on certain physical resources in the system that makes the table limit hard to increase too much. If you are storing that much data in ETS, then it really should not be in ETS; it should probably live in GenServers instead, or in a database if it is large.

StefanHoutzager · April 13, 2017, 4:54pm

I know that.
I’m talking about a list of maps (each map is a record) and performing a lookup within that list versus an ETS lookup. What do you mean by “maps are also keyed by the ‘primary key’”, and how does that relate to my question?
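
To make the two shapes under discussion concrete, here is a sketch under assumed names (`RecordListSketch` and the `:id`/`:name` fields are hypothetical). Searching a list of maps walks the list element by element, while a map keyed by the record's id (or an ETS table keyed the same way) goes straight to the entry.

```elixir
defmodule RecordListSketch do
  # Hypothetical helpers contrasting a list scan with a keyed lookup.

  # Linear scan: checks records one by one until the id matches.
  def find_in_list(records, id) do
    Enum.find(records, fn record -> record.id == id end)
  end

  # Builds a map keyed by id, i.e. the "primary key" shape.
  def index_by_id(records) do
    Map.new(records, fn record -> {record.id, record} end)
  end

  # Keyed lookup: no scan over the other records.
  def find_in_index(index, id), do: Map.get(index, id)
end

records = for id <- 1..5, do: %{id: id, name: "object_#{id}"}
index = RecordListSketch.index_by_id(records)

IO.inspect(RecordListSketch.find_in_list(records, 4))
IO.inspect(RecordListSketch.find_in_index(index, 4))
```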

OvermindDL1 · April 13, 2017, 4:57pm

That is good then, and it indeed would build faster as each map overall is smaller than a single giant map.

Access might be a tiny bit slower than a single giant map, but still faster than ETS I’d imagine.

StefanHoutzager · April 13, 2017, 5:05pm

I understand less and less of your answers. Maybe I should just stop with elixir and do something really stupid.