Enum.take/2 and Enum.drop/2 weird performance

seanmor5 · August 7, 2019, 9:37pm

I came across something weird today as I was doing some optimization.

I have a function that splits a list at some point. I was exploring the differences in using Enum.take/2 and Enum.drop/2. I ran some benchmarks on a list of 1,000,000 elements with 4 different scenarios:

Getting the first 500,000 elements using Enum.take(list, 500_000)
Getting the first 500,000 elements using Enum.drop(list, -500_000)
Getting the last 500,000 elements using Enum.take(list, -500_000)
Getting the last 500,000 elements using Enum.drop(list, 500_000)

Name                                          ips        average  deviation         median         99th %
Enum.drop/2 second 500,000 elements        142.99        6.99 ms     ±3.27%        6.97 ms        7.22 ms
Enum.take/2 first 500,000 elements          50.22       19.91 ms    ±33.95%       19.52 ms       26.40 ms
Enum.drop/2 first 500,000 elements          30.62       32.66 ms    ±24.34%       32.35 ms       77.36 ms
Enum.take/2 second 500,000 elements         23.11       43.28 ms    ±19.79%       42.81 ms      112.65 ms

Comparison: 
Enum.drop/2 get second 500,000 elements        142.99
Enum.take/2 get first 500,000 elements          50.22 - 2.85x slower +12.92 ms
Enum.drop/2 get first 500,000 elements          30.62 - 4.67x slower +25.66 ms
Enum.take/2 get second 500,000 elements         23.11 - 6.19x slower +36.28 ms

The results are KIND of as expected. Take is better at retrieving the first half of the list and drop is better at receiving the back half of the list. My real misunderstanding is WHY is drop/2 so much faster than take/2 at doing what it’s intended to do vs. what take is intended to do.

Also, shouldn’t the benchmarks be pretty similar no matter whether you’re dropping or taking forward or backward? Wouldn’t it make sense to just call drop/2 when the second input for take/2 is a negative integer and vice versa with drop/2?

kokolegorille · August 7, 2019, 10:04pm

Given that lists are linked lists, it makes sense that taking from start is faster than taking from middle. But if You want to optimize this, I would suggest Stream instam of Enum, because it implements lazy loading, while Enum is loading all the list in memory.

I would always add from start, never from the end, then I would reverse the list if needed.

It might not be useful if You just want to split the list…

seanmor5 · August 7, 2019, 10:13pm

For my purposes I’ve settled on using Enum.split/2 rather than either take/2 or drop/2. I was just confused as to why there’s such a discrepancy in the performance of these two functions.

What I mean is, why wouldn’t an implementation of drop and take when n is less than 0 look like:

def take(list, n) when n < 0 do
   drop(list, n)
end

def drop(list, n) when n < 0 do
   take(list, n)
end

Shouldn’t this theoretically close the performance gap?

kokolegorille · August 7, 2019, 10:15pm

I try to avoid this with lists (taking from the middle).

Ankhers · August 7, 2019, 10:30pm

This exact code would never return if you were to pass a negative n because it would continually match the n < 0 guard on both functions. If you meant

def take(list, n) when n < 0 do
  drop(list, n * -1)
end

def drop(list, n) when n < 0 do
  take(list, n * -1)
end

then I’m not sure that would give you what you are looking for. If I had some list of size 10 and wanted the last 2 elements, I would do Enum.take(list, -2) which would end up becoming drop(list, 2) which would return the first 8 elements of the list.

Unless I am misinterpreting what you meant?

seanmor5 · August 7, 2019, 10:34pm

Sorry I wasn’t thinking too hard about the actual implementation.

I think this:

def take(list, n) when n < 0 do
   drop(list, length(list)-n)
end

def drop(list, n) when n < 0 do
   take(list, length(list)-n)
end

Was what I meant (or something in the same spirit as this, I haven’t actually tested if this works).

OvermindDL1 · August 7, 2019, 10:47pm

To sum it up:

drop with a positive value: Just returns the tail at that position
drop with negative value: Has to rebuild the front list
take with a positive value: Has to rebuild the front list
take with a a negative value: Is currently rebuilding the back list. Now technically it shouldn’t necessarily need to if it were implemented differently (implement as Enum.drop in that case?), a PR could fix it. ^.^