Enum fusion?

Qqwy · December 13, 2020, 11:38pm

Hi all!

As you may know, it is very common in idiomatic Elixir code to work with transformations on datastructures, especially those implementing the Enumerable protocol. Usually we use the functions from Enum (and sometimes from Stream) to manipulate these, often in a pipeline of multiple steps.

Since the Enumerable protocol is an implementation of the Foldable concept from category theory, whose main fundamental operation is reduce(in other languages also known as ‘fold’), which always outputs a list, we end up using lists virtually everywhere.

This means that in an Enum pipeline, a lot of intermediate lists are being generated. I seem to remember José explaining on the mailinglist back in the day(note: I was unable to find it; if you know where this was mentioned, let me know and I’ll link to it!) that Enum's functions were intentionally not implemented as macros to make it easier to follow stack traces when something broke.

I think this is definitely the right choice, especially since a lot of Elixir code is written with “IO-bound” operations in mind, in which sheer computing speed is less important.

However, it did start making me wonder: What about creating a FastEnum drop-in replacement, where map, reduce, etc. would be implemented as macros that would fuse consecutive operations together to improve performance?
In many cases, a pipeline of Enum-functions could be transformed into a single for-comprehension. Besides the added benefit of fusing consecutive calls, for is also extremely well optimized by the BEAM.

Now, why did I start this topic? I essentially have two questions/topics for discussion:

Do you think a library like this would be worthwhile?
Do you happen to know whether someone already performed any exploratory work in this direction? (The ideas presented here are, after all, far from novel.)

ityonemo · December 14, 2020, 12:32am

Are you encountering a speed bottleneck somewhere?

lpil · December 14, 2020, 12:39am

What a cool idea! I would love to see how far this idea could go

Qqwy · December 14, 2020, 12:51am

No. This is an ‘how far could we take it and would it be worth it’ rather than a ‘my code is too slow’ kind of situation.

ityonemo · December 14, 2020, 1:12am

My suggestion is write it and measure. My instinct is that when you find a domain where there’s a performance improvement over Enum, (let’s say, 10% better than enum) you are in a regime where the Stream penalty is worth it, and you’re say… only 2% better than Stream.

Ultimately, you will be hamstrung by speed of the erlang datatypes. If you need better, you should write a nif or something. But really I don’t understand the obsession with performance. For most domains where you’re using Elixir, other things, like network or database, dominate your performance concerns.

mindok · December 14, 2020, 1:20am

I think you are right - most Elixir apps will have bottlenecks elsewhere (as OP pointed out), but it is very healthy, I think, to try and squeeze more out of a platform where possible. Each little tweak contributes to keeping hosting costs down, reducing energy consumption and improving user experience. It’s also fun!

hauleth · December 14, 2020, 1:26am

You mean Stream?

Qqwy · December 14, 2020, 1:52am

No. Stream is lazy.

What I mean is that something like

mycollection
|> Enum.map(&foo/1)
|> Enum.map(&bar/1)
|> Enum.map(&baz/1)
|> Enum.reduce(starting_accumulator, &qux/2)

can be turned into

for x <- mycollection, reduce: starting_accumulator do
  acc -> 
    x2 = x |> foo() |> bar() |> baz()
    qux(x2, acc)
end

(and this can be expanded further to e.g. also work with filter, flat_map, into, etc.)

hauleth · December 14, 2020, 3:32am

Which is lazy evaluation with forced computation at the end of pipeline:

mycollection
|> Stream.map(&foo/1)
|> Stream.map(&bar/1)
|> Stream.map(&baz/1)
|> Enum.reduce(starting_accumulator, &qux/2)

Does (almost) exactly what you want.

NobbZ · December 14, 2020, 9:21am

That’s not quite the same.

The benefit of proper fusion is to get the memory characteristics of the Stream (which avoids building and intermediate lists) while retaining the speed of a regular Enum.

In combination a fused Enum is therefore faster in theory as it avoids allocations and takes stress from the GC.

Will-W · December 17, 2020, 6:32pm

Performance is not a topic that be completely ignored. Elixir is not the language to choose for a number crunching application, but sometimes you need to do some number crunching as part of a larger application and the benefits of not having to go down the NIF or Port route are enormous. There’s a reason the VM team are implementing a JIT for the next version.

In the Elixir project that I have just finished (telecoms hardware), the vast majority of the code is not performance sensitive at all (and sometimes latency sensitive like a Web app is). There is one (key) bit of functionality that allocates time slots on the link. It’s a difficult algorithmic problem (basically the “knapsack problem” with some extra constraints). The implementation in Elixir is a little slow (3 to 30 seconds per link). We did a bit of optimisation and we were able to simplify the problem a bit to ensure that that it ran within the available time. The simplification might not have been possible and we would have needed to optimise further and would have been happy to be non-idiomatic. Something like this library might have helped. I would be curious to see how much speed be fit this approach would actually provide.

josevalim · December 17, 2020, 8:52pm

My concern about implementing this as macros is because it only works as long you don’t break the calls apart. If you refactor the code to use functions, then macros won’t be able to see across that, and that seems brittle.

Therefore, a library would need to impose a guideline to keep as many functions calls together as possible, which is precisely what you would get with for. So part of me wants to say that for is precisely the fusion that you want, and there is even an issue to add :limit to it.

Qqwy · December 17, 2020, 10:46pm

This is definitely true. In essence, you are giving the users the choice between:

Keeping the pipeline long;
splitting the code into multiple macros rather than functions.

Both have obvious drawbacks.

But maybe there still are situations in which it is useful. After all, a pipeline of enumeration functions is usually more readable than a couple of hand-crafted for-loops.

I did some preliminary benchmarks which seemed to indicate that for a simple chain of multiple map operations with a reduce at the end, a single for is roughly ~20% faster. This seem promising, but more bench-marking is necessary to find out whether this still holds in slightly different situations.

sasajuric · December 18, 2020, 9:54am

I think this is a very interesting idea which is worth giving a shot. In the worst case scenario you’ll bump into some hard limitations, but even then something good can come of it, for example a blog post which documents the experiment, explains why it failed, and presents the existing options (like for and plain recursion).

I can see this being useful when I want to quickly see what would be the effect of fusing a couple of Enum operations. Presently this requires a more complex rewrite, while FastEnum could turn this into a simple search & replace. I actually had the need for something like this on a couple of occasions (I think it was mostly AoC challenges). I’m not sure I’d use FastEnum in the final version though. Most probably I’d turn the hot pipeline into a plain recursion, which should be the fastest option.

sabiwara · July 31, 2021, 8:33am

Hi all!

I remember reading this thread a while ago, but I had no idea how one would implement such a macro back then. The idea resurfaced lately as I was finding myself rewriting some bottleneck Enum code as specialized recursive functions to speed it up. Out of curiosity, I started to play with the idea of a defenum macro to automate the process.

The result looks quite promising in terms of performance / flexibility, although this is still highly experimental dark magic at this stage
I published the POC to get some feedback, let me know your thoughts!

https://hexdocs.pm/enumancer/Enumancer.html

Qqwy · September 4, 2021, 4:09pm

@sabiwara Very cool to see that this idea has been picked up! Enumancer looks great!
If there’s any part w.r.t. for instance writing the macros that you struggle with, let me know.

sabiwara · September 4, 2021, 5:59pm

@Qqwy Thanks, this means a lot!

There hasn’t been any major blocker at the minute, I simply have been focusing on other projects lately. But I might definitely ask for some help in the future!

bmitc · December 2, 2022, 7:51pm

I came across this thread because I have written a lot of stuff like consecutive Enum.map because it is usually cleaner. For example, I just did this in Advent of Code 2022:

round_strategies()
|> Enum.map(&convert_strategy_to_round/1)
|> Enum.map(&score_round/1)

A question I have is why does this need to be handled via a library, macros, or even syntax? Isn’t this really something the compiler could or should be doing as an optimization/fusion pass?

Why can’t the compiler see the above and fuse it to a single Enum.map by sequencing the mapper functions? (Disclaimer: I do not know how the Elixir compiler works, but it was my understanding that this is how this stuff can be done.)

tcoopman · December 2, 2022, 9:00pm

Well, one reason would be that the code is not the same. If you have side effects in your mapping functions, the code wouldn’t behave the same.

I think it’s a bad idea to write side effects in mapping functions but still it’s possible in elixir and as far as I know it’s not possible for the compiler to detect these.

cmo · December 2, 2022, 10:33pm

You can always optimise for performance over aesthetics.

round_strategies()
|> Enum.map(fn strategy -> 
  strategy
  |> convert_strategy_to_round()
  |> score_round()
end)

for strategy <- round_strategies do
  strategy
  |> convert_strategy_to_round()
  |> score_round()
end