Heya.
I am inviting you to copy-paste the module below and add your own functions and measure them, or simply give other ideas about how the desired result can be produced best.
I got interested in writing a function that takes two lists where the first is always bigger than the second and make pairs a la Enum.zip
but not stop when the smaller list depletes. I want it to continue, while cycling through the second list, until we have a list with the same size as the first one, containing 2-size tuples.
Example:
list0 = [1, 2, 3, 4, 5]
list1 = [:a, :b]
I want the output to be:
[{1, :a}, {2, :b}, {3, :a}, {4, :b}, {5, :a}]
I came up with these two functions (and benchmark code):
defmodule Xyz do
def pair_with_enum_reduce(list0, list1) do
list1_tuple = List.to_tuple(list1)
list1_last_position = tuple_size(list1_tuple) - 1
Enum.reduce(list0, {[], 0}, fn item, {final_list, position} ->
mapped_item = {item, elem(list1_tuple, position)}
position =
case position do
^list1_last_position ->
0
_ ->
position + 1
end
{[mapped_item | final_list], position}
end)
|> elem(0)
|> Enum.reverse()
end
def pair_with_stream_cycle(list0, list1) do
stream = Stream.cycle(list1)
Enum.zip(list0, Enum.take(stream, length(list0)))
end
def bench() do
list0 = Enum.to_list(1..2000)
list1 = [:worker_0, :worker_1, :worker_2, :worker_3, :worker_4]
Benchee.run(%{
"pair_with_enum_reduce" => fn -> pair_with_enum_reduce(list0, list1) end,
"pair_with_stream_cycle" => fn -> pair_with_stream_cycle(list0, list1) end
})
end
end
Benchmark results on my machine:
Name ips average deviation median 99th %
pair_with_enum_reduce 19.80 K 50.49 μs ±28.32% 50 μs 85 μs
pair_with_stream_cycle 7.64 K 130.89 μs ±40.15% 108 μs 327.12 μs
Comparison:
pair_with_enum_reduce 19.80 K
pair_with_stream_cycle 7.64 K - 2.59x slower +80.39 μs
I know Stream
incurs some performance penalty but was rather surprised by how much. I don’t like the size of the pair_with_enum_reduce
function, nor the fact that it has to call Enum.reverse
at the end but it’s still significantly faster. And I don’t like that the pair_with_stream_cycle
relies on calling length
on the first list. Both functions I am kind of unhappy with.
Any criticisms? And, do you think you could do better?
(Alternatively, another implementation might not care about which size list is bigger.)