Why should we iterate a list twice to pipe 2 aggregate functions?

ahmadferdous · November 24, 2020, 4:37pm

I have structs Company, Employee and a list of struct values employees. Struct Employee contains 2 fields - hours and bill.

I need to know total hours and total bill. Should I iterate once (for loop?!) to calculate both together or should we do the following? One of my colleagues say the following is Elixir-y. Why is iterating multiple times encouraged when it is inefficient?

I’m sure there is a good reason and I want to know. Please help me.

def calculate(company) do
    company
    |> calculate_total_hours()
    |> calculate_total_bill()
end

def calculate_total_hours(company) do
  total = company.employees
  |> Enum.map(& &1.hours)
  |> Enum.sum()

  %{company | total_hours: total}
end

def calculate_total_bill(company) do
  total = company.employees
  |> Enum.map(& &1.bill)
  |> Enum.sum()

  %{company | total_bill: total}
end

EDIT: Updated code

al2o3cr · November 24, 2020, 4:47pm

Because it’s easier to read operations that are cleanly decomposed into smaller parts, and because “inefficient” is relative. Traversing a list is a very cheap operation on the BEAM (it’s almost a Lisp under the hood).

It’s hard to say more about the specific example since it doesn’t work: calculate_total_hours returns a number but is piped into a function that expects a list, and it’s unclear where the total hours and total bill should be stored in company.

aenglisc · November 24, 2020, 5:28pm

“Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!”

gregvaughn · November 24, 2020, 6:29pm

I would likely write this as a single reduce, and I don’t think it suffers elixir-y-ness

{total_hours, total_bill} =
  Enum.reduce(company.employees, {0, 0}, fn %{hours: h, bill: b}, {h_acc, b_acc} ->
    {h_acc + h, b_acc + b}
  end)

ahmadferdous · November 24, 2020, 7:22pm

Thanks. I made up the example and the code was not correct. Sorry about that. Updated.

ahmadferdous · November 24, 2020, 7:24pm

I made up the example to capture the problem. Sorry that it was not a working code. I updated code.
The question remains the same - why iterate twice when I can achieve the same goal by iterating just once? In future, what should I answer when somebody coming from OOP background asks me?

jswny · November 24, 2020, 7:54pm

Why not just use streams? They are lazy so as long as you only call Stream functions, you can chain many operations together and only evaluate them at the end for a single iteration. You can keep the multiple functions calls so you don’t have to mash them into one, and you can still iterate only once. Take a look at the paragraphs at the top of the Stream docs for a very similar example: https://hexdocs.pm/elixir/Stream.html

ahmadferdous · November 24, 2020, 8:01pm

@jswny , each operation (calculate_total_hours and calculate_total_bill) needs to be evaluated. So, these 2 operations do not seem to be streamable.

kokolegorille · November 24, 2020, 8:51pm

I am curious how You would make it in one go…

If You have those 2 ways to solve this problem, use Benchee to check the speed of both. You might be surprised by the result

UPDATE: I fixed the repo path, I was not on the right tab

tomkonidas · November 24, 2020, 8:56pm

You linked the wrong repo

ahmadferdous · November 24, 2020, 8:59pm

@kokolegorille , see @gregvaughn’s answer above that uses a single reduce function to calculate both in one go.

kokolegorille · November 24, 2020, 9:01pm

I saw once an Enum.reduce going slower than Enum.map, followed by Enum.into…

lud · November 25, 2020, 9:49am

If you want only calculate_total_hours you can call that single function. It may not bee a good idea to bind two independent pieces of logic together, unless you are 100% sure that you will always need to call the two computations. And in that case I would just follow above advice: make it beautiful then move on.

al2o3cr · November 25, 2020, 3:16pm

It would take very careful benchmarking to clearly show whether the single reduce is faster or slower than the two passes over the list. My back-of-the-envelope calculation:

Split loops, per element in the list:

two “get rest of list”
two map accesses (to get bill or hours)
two additions

Fused loops, per element in the list:

two map accesses (maybe? Not 100% sure what that pattern-match compiles to)
one “get rest of list”
two additions
one “create new tuple”

Both have two map accesses and two additions per element in the list, as well as a single “get rest of list”; so the budget tradeoff comes down to an additional “get rest of list” versus a “create new tuple” operation.

The BEAM’s optimizer is going to complicate things even further; it might be able to elide most of the cost of “create new tuple” by reusing the tuple passed as an argument - but if the split-loop functions are called more often they might get JITed more aggressively.