Hi, I’m looking for some generic advice on best-practices with regard to performance vs complexity.
I have a recursive function that gets called a few million times, it passes some complex state around in a map with a handful of members which typically get mutated during some passes of the function. After some measurements and experiments I found that a significant amount of the run time is spent in looking up and updating the map members.
I have now optimized some members the map access away by passing the individual members as function arguments - this considerably speeds up but significantly increases noise and complexity of my code, as there are a lot of places where I need to modify and pass all these arguments.
What is the idiomatic way to handle this kind of problem? Are there any other alternative ways to pass around a largish number of state parameters, other then putting them in a map? Ideally I would like some kind of fixed data structure which still allows acces to the members by name, but I’m not aware of anything like this for Elixir?
Thanks,
(The actual function is generated at compile time, but typically looks something like this: http://ix.io/3K67/elixir)
Right, thank you @LostKobrakai and @al2o3cr, records seem to be the thing I was looking for. The notation is a bit cumbersome, but the end result is good: I’m gaining about 25% speed on my code.
Thanks, but I don’t think :ets is suitable here: I really need raw performance, as this is the inner loop of a generated parser, with a state consisting of a number of stacks and pointers.
:ets is definitely not the fit for really tight inner loops, as it has to copy data into the process every invocation. so probably slower than map lookups.
I suggest to make a benchmark of :ets vs map passing. Once I had a case that I thought :ets wouldn’t fit well, but actually it sped up things x50. And it was a recursive parser passing a huge map. It’s always worth to confirm your assumptions with actual benchmarks.
Parallelization is not easy, as this is a parser generator that works on a linear input string of a possible recursive nature.
The grammar allows for an “ordered choice” operation though, where the parser will backtrack on the first option if it fails and continue on the next; it might be possible to do some kind of opportunistic parallelization here by parsing the different branches of the choice, but that would likely add a lot more complexity than I’m willing to put in for now.
In Elixir, records are used mostly in two situations:
to work with short, internal data
to interface with Erlang records
I don’t see any mention elsewhere in the docs about how Records are more performant than maps in certain situations. How did people know to suggest this as performant alternative?
Because “records” are just tuples, which are statically allocated low-level arrays with fixed size. You really can’t get any faster than making your CPU fetch data from e.g. address 0x0100 plus an offset (say element number 3), especially if the tuple / array is allocated on the stack and not on the heap.
Maps on the other hand are higher-level structures that do either a bit more complex lookups or are walking a singly-linked list (which is what Erlang’s / Elixir’s lists are), which has O(n) complexity. Hash map lookup theoretically has little smaller complexity and is thus a bit faster but it still can’t beat direct pointer access.
OP stated that map updates & lookups were showing up as hotspots:
Records are close to structs ergonomically (versus bare tuples accessed by elem/2 or pattern-matching) and (as @dimitarvp points out) they involve simpler data structures that are likely to be fast-ish.
They aren’t guaranteed to be faster - imagine a corner-case like a struct with 10000 fields; copying the whole tuple to update a record could get expensive versus updating a map with structural sharing.
The comments so far are all about performance, and I think the complexity question is also worth addressing.
If you used a map of things and easily unpacked them into separate arguments of the function, then you just decreased the complexity of your code, not increased it. It meant that they really were separate arguments all along, and packing them into a map was an unnecessary layer of indirection, which increased complexity.
Yes, the code might now be looking uglier, but that’s only because you exposed a problem, not introduced it. It might be a bit more cumbersome working with it, and that’s good - it applies pressure to solve this problem properly by understanding and refining your arguments into a smaller set of better defined concepts.
If that would be as universally true as you make it sound nobody on elixir should use structs, given they’re a fixed length/keyed container. We could just pass around their individual field values everywhere. I wouldn’t want to work in such a codebase.
I am actually suggesting using structs! Structs are the best way to represent well thought out concepts in Elixir and create modules around them.
What I’m suggesting against is packing several distinct things passed into a function into an ad-hoc map just to make passing them easier. From the fact that OP seemed to have unpacked his map into distinct arguments pretty easily, it sounded that was the case. Though I might be wrong of course!
Agreed! People put a ton of stuff in random maps and pretend it’s how it should be. 90% of the places I seen it (in Elixir codebases) is just somebody saying “feck it, I don’t want big function argument lists”.