Performance impact of passing complex state around in map vs a lot of arguments

zevv · December 30, 2021, 6:46pm

Hi, I’m looking for some generic advice on best-practices with regard to performance vs complexity.

I have a recursive function that gets called a few million times, it passes some complex state around in a map with a handful of members which typically get mutated during some passes of the function. After some measurements and experiments I found that a significant amount of the run time is spent in looking up and updating the map members.

I have now optimized some members the map access away by passing the individual members as function arguments - this considerably speeds up but significantly increases noise and complexity of my code, as there are a lot of places where I need to modify and pass all these arguments.

What is the idiomatic way to handle this kind of problem? Are there any other alternative ways to pass around a largish number of state parameters, other then putting them in a map? Ideally I would like some kind of fixed data structure which still allows acces to the members by name, but I’m not aware of anything like this for Elixir?

Thanks,

(The actual function is generated at compile time, but typically looks something like this: http://ix.io/3K67/elixir)

LostKobrakai · December 30, 2021, 7:30pm

You could look at records.

al2o3cr · December 30, 2021, 7:31pm

I can’t speak to the exact performance characteristics, but this sounds like Records

zevv · December 30, 2021, 9:13pm

Right, thank you @LostKobrakai and @al2o3cr, records seem to be the thing I was looking for. The notation is a bit cumbersome, but the end result is good: I’m gaining about 25% speed on my code.

dimitarvp · December 30, 2021, 9:45pm

What’s the rough ratio of reads vs. writes? You could always also use :ets for this.

zevv · December 30, 2021, 10:00pm

Thanks, but I don’t think :ets is suitable here: I really need raw performance, as this is the inner loop of a generated parser, with a state consisting of a number of stacks and pointers.

BartOtten · December 31, 2021, 12:07am

Records it is indeed. The syntax will quickly become familiar; as it’s just pattern matching.

I guess Records are not used that much as the need for raw speed isn’t always necessary, but when it is they do just fine.

christhekeele · December 31, 2021, 4:31am

:ets is definitely not the fit for really tight inner loops, as it has to copy data into the process every invocation. so probably slower than map lookups.

bartblast · December 31, 2021, 9:53am

I suggest to make a benchmark of :ets vs map passing. Once I had a case that I thought :ets wouldn’t fit well, but actually it sped up things x50. And it was a recursive parser passing a huge map. It’s always worth to confirm your assumptions with actual benchmarks.

bartblast · December 31, 2021, 9:57am

Another idea: is there any possibility of parallelization?

zevv · December 31, 2021, 12:32pm

Parallelization is not easy, as this is a parser generator that works on a linear input string of a possible recursive nature.

The grammar allows for an “ordered choice” operation though, where the parser will backtrack on the first option if it fails and continue on the next; it might be possible to do some kind of opportunistic parallelization here by parsing the different branches of the choice, but that would likely add a lot more complexity than I’m willing to put in for now.

aegatlin · January 4, 2022, 12:57am

From the docs on Records:

In Elixir, records are used mostly in two situations:

to work with short, internal data

to interface with Erlang records

I don’t see any mention elsewhere in the docs about how Records are more performant than maps in certain situations. How did people know to suggest this as performant alternative?

dimitarvp · January 4, 2022, 1:07am

Because “records” are just tuples, which are statically allocated low-level arrays with fixed size. You really can’t get any faster than making your CPU fetch data from e.g. address 0x0100 plus an offset (say element number 3), especially if the tuple / array is allocated on the stack and not on the heap.

Maps on the other hand are higher-level structures that do either a bit more complex lookups or are walking a singly-linked list (which is what Erlang’s / Elixir’s lists are), which has O(n) complexity. Hash map lookup theoretically has little smaller complexity and is thus a bit faster but it still can’t beat direct pointer access.

al2o3cr · January 4, 2022, 1:39am

OP stated that map updates & lookups were showing up as hotspots:

Records are close to structs ergonomically (versus bare tuples accessed by elem/2 or pattern-matching) and (as @dimitarvp points out) they involve simpler data structures that are likely to be fast-ish.

They aren’t guaranteed to be faster - imagine a corner-case like a struct with 10000 fields; copying the whole tuple to update a record could get expensive versus updating a map with structural sharing.

ambareesha7 · January 4, 2022, 2:08am

gist.github.com

https://gist.github.com/PJUllrich/4cd3d8e2e8c9170b560e5a501c13c9f3

big-o.md

# Big-O Time Complexities for Elixir data structures

## Map [[1]](https://hexdocs.pm/elixir/1.12/Map.html)

| Operation | Time Complexity |
| --------- | ---  |
| Access    | `O(log n)` |
| Search    | `O(log n)` |
| Insertion | `O(n)` for < 32 elements, `O(log n)` for >= 32 elements [[2]](https://stackoverflow.com/questions/38386314/why-elixirs-mapset-becomes-unordered-after-32-elements/38387520#38387520) |
| Deletion  | `O(n)` for < 32 elements, `O(log n)` for >= 32 elements |

This file has been truncated. show original

This may help you to understand time complexity for certain data structure

buro · January 4, 2022, 9:30am

Hello everybody,

here a post about Elixir optimization, the author, in one of many steps, quote and use :ets to improve performance. Hope it will be useful.

https://blog.jola.dev/elixir-string-processing-optimization

msz · January 4, 2022, 1:05pm

The comments so far are all about performance, and I think the complexity question is also worth addressing.

If you used a map of things and easily unpacked them into separate arguments of the function, then you just decreased the complexity of your code, not increased it. It meant that they really were separate arguments all along, and packing them into a map was an unnecessary layer of indirection, which increased complexity.

Yes, the code might now be looking uglier, but that’s only because you exposed a problem, not introduced it. It might be a bit more cumbersome working with it, and that’s good - it applies pressure to solve this problem properly by understanding and refining your arguments into a smaller set of better defined concepts.

LostKobrakai · January 4, 2022, 1:28pm

If that would be as universally true as you make it sound nobody on elixir should use structs, given they’re a fixed length/keyed container. We could just pass around their individual field values everywhere. I wouldn’t want to work in such a codebase.

msz · January 4, 2022, 2:34pm

I am actually suggesting using structs! Structs are the best way to represent well thought out concepts in Elixir and create modules around them.

What I’m suggesting against is packing several distinct things passed into a function into an ad-hoc map just to make passing them easier. From the fact that OP seemed to have unpacked his map into distinct arguments pretty easily, it sounded that was the case. Though I might be wrong of course!

dimitarvp · January 4, 2022, 3:37pm

Agreed! People put a ton of stuff in random maps and pretend it’s how it should be. 90% of the places I seen it (in Elixir codebases) is just somebody saying “feck it, I don’t want big function argument lists”.