Performance impact of passing complex state around in map vs a lot of arguments

axelson · January 6, 2022, 7:06am

I wanted to highlight this as a good question! I think people know this as “received community wisdom”. I know that there’s one or two instances I’ve heard of people switching from structs to records for tight inner loops, but I’m having trouble finding an example of one! Actually I think Postgrex might be a good example, I think I’ve heard that Postgrex uses records for improved performance (instead of some other reason), but you should probably consider this heresy until you have some sort of confirmation. Here is one place where Postgrex uses records: postgrex/lib/postgrex/messages.ex at master · elixir-ecto/postgrex · GitHub

I do have a feeling that the core team would be willing to accept a PR that adds a section of performance to the Record docs.

BartOtten · January 6, 2022, 10:40am

I started working on a SAT solver and it greatly benefited from switching to records; for the mentioned reason. The project was abandoned, no public code.

Records are just pretty tuples. And as such, are crazy performant when you read them. However, the trade offs should be taken into account.

Less convenient than maps: less helper functions, less people familiar with them.
Full copy when updating a field. Depending on the size of the record, it may or may not be faster than a struct/map.

And that’s where the ‘tight loops’ come from: when you only read a few times, the performance increased is likely negligible compared to the trade off you introduce. But many reads = big gain = worth it (ps. depending on the application, as maxing performance in favor of productivity might not be required)

josevalim · January 6, 2022, 11:22am

I don’t think the goal is runtime performance per-se, but rather that the equivalent would be to define several modules+structs when none of them actually have to be public (and that would be compile-time expensive). So to me records work great when you need to define several data types where all of them are private.

dch · January 6, 2022, 11:40am

This hasn’t been mentioned yet, the OTP Efficiency Guide specifically on maps. There are a number of other docs here that are super useful, too.

5.5 How Maps are Implemented

Internally, maps have two distinct representations depending on the number of elements in the map. The representation changes when a map grows beyond 32 elements, or when it shrinks to 32 elements or less.

A map with at most 32 elements has a compact representation, making it suitable as an alternative to records.
A map with more than 32 elements is represented as a tree that can be efficiently searched and updated regardless of how many elements there are.

gut feel is that erlang records will get you a long way here as already mentioned. tuples / records / arrays are fast.

process dictionary

For max speed, consider the process dictionary, a private per-process key-value store. The massive upside is it’s very very fast, but the massive downside is that it’s not possible to debug it – the state is lost if your process crashes, not like a normal stack trace would be.

One way to mitigate this is to have a function wrapping the larger transformation (a bit like a transaction), so you can at least see the arguments that caused your process to crash, and then you can debug at your leisure offline. As long as your recursive function isn’t calling out to other functions, your whole state can live in the process dictionary, and you can keep the data shuffling in your wrapper function.

Finally, OTP has a pile of alternative data structures GB trees (and GB sets) or arrays which are implemented as partially nested tuples, and a number of set related higher level functions, if your PEG can be expressed that way, including digraph which is again “just” ets. Maybe they’re more useful, maybe not. But they’re often not consulted.

dch · January 6, 2022, 12:58pm

And TIL a thing about the process dictionary, from Alvaro Videla - when (if) you do process_info/1 not surprisingly, the PD needs to be copied into your system. Now, running process_info/1 is not so common in production, so I’d not worry too much about it, but it’s definitely worth bearing in mind.

There are at several other good posts on PD out there:

James Hague’s How I Learned to Stop Worrying and Love Erlang’s Process Dictionary
Fred Hebert’s On the Use of the Process Dictionary in Erlang
Mark Ericksen’s Abusing Elixir Processes and State!

zevv · January 10, 2022, 8:18pm

I spent some good time optimizing my code, and the fastest solution is - not surprising - just passing all individual parameters around to all the individual functions: no ets, no process dictionary, no maps - just functions and arguments. I learned another thing or two about optimizing elixir code, which was a fun ride.

I am pretty happy with the current performance: as a test I generate JSON parser from the official JSON grammar, and my generated parser runs only about 2.5 times slower than poison - well within my original goal of reaching at least 25% of poison’s performance!

Thank you all for your ideas and input!

dimitarvp · January 11, 2022, 2:13pm

A more detailed write-up will be hugely appreciated, if you have the time and energy for it one day.