Best performance for simple map (multicore etc.)

stiegi · May 16, 2023, 6:28am

Let’s say we have an Elixir app whose whole purpose is to get, hold, and provide data in a simple format like this:

```elixir
%{
  "1" => ["a", "b", "c"],
  "2" => ["a", "b", "c"],
  ....
  "289237428" => ["x", "y", "z"]
}
```

Keys and their lists of strings are added dynamically at runtime, so it is basically a very simple in-memory database.

Now I am wondering how to implement an app like this to be as performant as possible. If I just keep a simple map in a single process, every read and write is serialized through that process, so it effectively runs on one core. If the app is used by a lot of processes, this might become a bottleneck.
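To make the bottleneck concrete, here is a minimal sketch of the "one simple map" approach, assuming the map is held by a single Agent (the module name `AgentStore` is hypothetical). Every `get` and `put` is a message to that one process, which handles requests one at a time.

```elixir
defmodule AgentStore do
  # Hypothetical single-process store: one Agent owns the whole map,
  # so all callers queue up behind this one process.
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Replace (or add) the list stored under `key`.
  def put(key, list) when is_list(list) do
    Agent.update(__MODULE__, &Map.put(&1, key, list))
  end

  # Returns the list for `key`, or nil if the key is unknown.
  def get(key), do: Agent.get(__MODULE__, &Map.get(&1, key))
end

{:ok, _pid} = AgentStore.start_link()
:ok = AgentStore.put("1", ["a", "b", "c"])
AgentStore.get("1")
```

This is perfectly fine at low load; the concern is only that throughput is capped by what a single process can handle.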

So I was thinking of creating a supervised app that creates processes dynamically, and instead of a key in the map I’d use the process id as the identifier, so each list would be its own process. Do you think that this is a valid solution? Are there other ways or maybe you know a library that solves this?
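One way to sketch the process-per-list idea, assuming Elixir's `Registry` is used to map each key to the pid of its process (so callers never have to hold pids themselves) and a `DynamicSupervisor` starts one `Agent` per key. The module `ListStore` and its registry/supervisor names are hypothetical.

```elixir
defmodule ListStore do
  # Hypothetical names for the registry and the dynamic supervisor.
  @registry ListStore.Registry
  @supervisor ListStore.Supervisor

  def start do
    {:ok, _} = Registry.start_link(keys: :unique, name: @registry)
    {:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: @supervisor)
    :ok
  end

  # Starts one Agent per key, registered under that key via the Registry.
  # If the key already has a process, its list is replaced instead.
  def put(key, list) do
    name = {:via, Registry, {@registry, key}}
    spec = %{id: Agent, start: {Agent, :start_link, [fn -> list end, [name: name]]}}

    case DynamicSupervisor.start_child(@supervisor, spec) do
      {:ok, _pid} -> :ok
      {:error, {:already_started, pid}} -> Agent.update(pid, fn _old -> list end)
    end
  end

  # Looks up the key's process and asks it for its list; nil if none exists.
  def get(key) do
    case Registry.lookup(@registry, key) do
      [{pid, _value}] -> Agent.get(pid, & &1)
      [] -> nil
    end
  end
end
```

Note the trade-off: work is spread across many processes, but every read is still a message round-trip to some process, and each key costs a process plus a registry entry.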

Thank you very much in advance for your input.

D4no0 · May 16, 2023, 6:31am

Take a look at ETS; it should fit your requirements perfectly.
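A minimal sketch of the ETS approach, assuming one `:set` table with `{key, list}` tuples. The table name `:list_store` and the module `EtsStore` are hypothetical. With `:public` access, any process can read and write the table directly, without going through a single owner process, and `read_concurrency`/`write_concurrency` tune the table for concurrent access.

```elixir
defmodule EtsStore do
  # Hypothetical table name.
  @table :list_store

  # The calling process becomes the table owner; the table is deleted
  # if that process exits, so in a real app a supervised long-lived
  # process (e.g. a GenServer) should create it.
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true, write_concurrency: true])
    :ok
  end

  # Inserting an existing key into a :set replaces the old entry.
  def put(key, list) when is_list(list) do
    true = :ets.insert(@table, {key, list})
    :ok
  end

  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, list}] -> list
      [] -> nil
    end
  end
end

:ok = EtsStore.init()
:ok = EtsStore.put("1", ["a", "b", "c"])

# Many processes can read concurrently without a bottleneck process.
1..100
|> Task.async_stream(fn _ -> EtsStore.get("1") end)
|> Enum.map(fn {:ok, list} -> list end)
```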

stiegi · May 16, 2023, 9:04am

Thank you very much! One question though: if you have really large lists of strings, do you think it is better to use one table with a lot of keys, or to give every list of strings its own ETS table?

D4no0 · May 16, 2023, 9:07am

This depends on what you want to achieve. From a performance perspective, I think both approaches should perform about the same. The only reason I would split the data into multiple ETS tables is if you want to load only part of it.
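To illustrate the multi-table case, here is a minimal sketch assuming a hypothetical grouping of keys where each group lives in its own unnamed ETS table, so a group can be loaded or dropped as a unit with `:ets.delete/1`. The module `GroupedStore` and the idea of "groups" are assumptions, not something from the thread.

```elixir
defmodule GroupedStore do
  # Hypothetical: each group of {key, list} entries gets its own unnamed
  # table; the caller keeps the returned table identifiers.
  def load_group(entries) when is_list(entries) do
    table = :ets.new(:group, [:set, :public, read_concurrency: true])
    true = :ets.insert(table, entries)
    table
  end

  def get(table, key) do
    case :ets.lookup(table, key) do
      [{^key, list}] -> list
      [] -> nil
    end
  end

  # Drops the whole group at once.
  def drop_group(table), do: :ets.delete(table)
end

groups = %{
  first: GroupedStore.load_group([{"1", ["a", "b", "c"]}, {"2", ["a", "b", "c"]}]),
  last: GroupedStore.load_group([{"289237428", ["x", "y", "z"]}])
}

GroupedStore.get(groups.first, "1")
GroupedStore.drop_group(groups.last)
```

With a single table, the same per-key operations apply; the split only pays off when whole groups come and go together.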

stevensonmt · May 16, 2023, 5:58pm

If you anticipate a large number of ETS tables, this note from the Erlang docs may be worth pointing out:

The number of tables stored at one Erlang node used to be limited. This is no longer the case (except by memory usage). The previous default limit was about 1400 tables and could be increased by setting the environment variable ERL_MAX_ETS_TABLES or the command line option +e before starting the Erlang runtime system. This hard limit has been removed, but it is currently useful to set the ERL_MAX_ETS_TABLES anyway. It should be set to an approximate of the maximum amount of tables used since an internal table for named tables is sized using this value. If large amounts of named tables are used and ERL_MAX_ETS_TABLES hasn’t been increased, the performance of named table lookup will degrade.
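If you want to check how close a running node is to the value you sized it for, a small sketch could count the named tables and compare against `:erlang.system_info(:ets_limit)`, which should reflect the `ERL_MAX_ETS_TABLES`/`+e` setting. The module `EtsTableStats` is hypothetical. Counting atoms works because `:ets.all/0` returns named tables by name and unnamed tables by their table identifier.

```elixir
defmodule EtsTableStats do
  # Named tables appear in :ets.all/0 as atoms, unnamed ones as identifiers.
  def named_table_count do
    Enum.count(:ets.all(), &is_atom/1)
  end

  # Reported ETS table limit; per the note above, this is now mainly a
  # sizing hint for the internal named-table index, not a hard cap.
  def configured_limit do
    :erlang.system_info(:ets_limit)
  end
end

IO.puts("named tables: #{EtsTableStats.named_table_count()}, limit: #{EtsTableStats.configured_limit()}")
```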

stiegi · May 17, 2023, 6:55am

Interesting. The question is what happens if you go beyond that limit: does ETS raise an error, or is an old entry overwritten? Anyway, I think I will try using one table.

jhogberg · May 17, 2023, 9:27am

Nothing happens. It’s just that if the number of named ETS tables exceeds that figure, looking those tables up by name will degrade a little (the table operations themselves keep the same performance).

There will be no difference in performance for unnamed tables, and you can always mitigate the lookup cost by caching the result of ets:whereis/1.
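A minimal sketch of that mitigation: resolve the name once with `:ets.whereis/1` (available since OTP 21) and reuse the returned table identifier for every subsequent operation, so the name lookup happens only once. The module `CachedLookup` is hypothetical; in a long-running process you would typically keep the identifier in its state.

```elixir
defmodule CachedLookup do
  # Resolve the named table once, then pass the identifier around
  # instead of the name for all following lookups.
  def lookup_many(table_name, keys) do
    case :ets.whereis(table_name) do
      :undefined -> {:error, :no_table}
      tid -> {:ok, Enum.map(keys, &lookup(tid, &1))}
    end
  end

  defp lookup(tid, key) do
    case :ets.lookup(tid, key) do
      [{^key, list}] -> list
      [] -> nil
    end
  end
end

:ets.new(:list_store_cached, [:set, :public, :named_table])
:ets.insert(:list_store_cached, {"1", ["a", "b", "c"]})
CachedLookup.lookup_many(:list_store_cached, ["1", "2"])
```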