Learning idiomatic Elixir - Q1

stefanluptak · September 6, 2023, 8:21am

Benchmarking is quite simple. You download the gist file I posted above and then run elixir benchmark.exs. That’s it.

Name                    ips        average  deviation         median         99th %
100phlecs            2.88 M      347.02 ns  ±8200.44%         250 ns         417 ns
wanton7              2.47 M      405.02 ns  ±4260.02%         334 ns         500 ns
nikiiv2              1.57 M      635.55 ns  ±4499.36%         458 ns         875 ns
tomekowal            1.31 M      762.29 ns  ±2640.36%         667 ns         917 ns
nikiiv               1.29 M      777.10 ns  ±3606.33%         625 ns         917 ns
eksperimental        0.22 M     4460.79 ns   ±200.29%        4083 ns    11791.46 ns
stefanluptak       0.0737 M    13571.83 ns    ±26.25%       12917 ns       26416 ns

Comparison:
100phlecs            2.88 M
wanton7              2.47 M - 1.17x slower +58.00 ns
nikiiv2              1.57 M - 1.83x slower +288.53 ns
tomekowal            1.31 M - 2.20x slower +415.26 ns
nikiiv               1.29 M - 2.24x slower +430.07 ns
eksperimental        0.22 M - 12.85x slower +4113.77 ns
stefanluptak       0.0737 M - 39.11x slower +13224.80 ns

Memory usage statistics:

Name             Memory usage
100phlecs             2.02 KB
wanton7               0.56 KB - 0.28x memory usage -1.46094 KB
nikiiv2               0.79 KB - 0.39x memory usage -1.23438 KB
tomekowal             6.64 KB - 3.28x memory usage +4.62 KB
nikiiv                6.66 KB - 3.29x memory usage +4.63 KB
eksperimental        28.74 KB - 14.20x memory usage +26.72 KB
stefanluptak         10.99 KB - 5.43x memory usage +8.97 KB

Qqwy · September 6, 2023, 9:14am

Thank you @eksperimental for tagging me. What an interesting topic!

Yes and no. The Erlang :array module indeed does not state a particular time complexity and it is allowed to change its internal implementation whenever a new version of Erlang would want.

But practically speaking, for the last 15+ years, the implementation has been the same: finger trees with node size 10 (in other words: nested 10-element-tuples), meaning that access is O(log₁₀(n)). Logarithmic time complexities (especially those with large bases) can be considered ‘practically constant-time’.

Besides my Arrays library there’s also Okasaki which is very similar but for queues. It probably will have a constant-time overhead over using :queue directly because of the extra layer of wrapping with Elixir structs and protocols, but using it should be much more idiomatic than using :queue or :array directly.

And then there are of course a hand full of libraries written using NIFs (natively implemented functions), such as Explorer and Nx which are faster for certain math or data science operations (when you want run the same calculation on a large block of data) but they are not general-purpose (you cannot store any Elixir structure in them; most operations only work on ints or floats).

I might try to write an implementation of this example problem myself if I find the time .

wanton7 · September 6, 2023, 12:18pm

Used bit more of my time to avoid more memory allocations and here is v2. Now uses 0KB

defmodule CodingTest do
  def solution([h | t] = arr, target_sum) do
    find_solution(1, arr, t, h, target_sum)
  end

  defp find_solution(_, _, _, target_sum, target_sum) when target_sum > 0, do: :OK
  defp find_solution(_, _, [0 | _], _, 0), do: :OK

  defp find_solution(size, arr, [h | t], current_sum, target_sum) when current_sum < target_sum do
    current_sum = current_sum + h
    find_solution(size + 1, arr, t, current_sum, target_sum)
  end

  defp find_solution(size, [h | t], right, current_sum, target_sum) when current_sum > target_sum do
    current_sum = current_sum - h
    find_solution(size - 1, t, right, current_sum, target_sum)
  end

  defp find_solution(0, _, [h | t] = right, _, target_sum), do: find_solution(1, right, t, h, target_sum)

  defp find_solution(_, _, [], current_sum, target_sum) when current_sum != target_sum, do: :KO
  defp find_solution(0, _, [], _, _), do: :KO
end

Qqwy · September 6, 2023, 12:57pm

I had some spare time, so here are three implementations.

I have created an updated gist for the benchmark which you can run just as before as elixir benchmark.exs.

I’ve also added a few larger inputs, so you can see the difference in how solutions behave if they get a list with 100 elements vs a list with 1_000_000 elements.

1. Using Okasaki

The following is a 1:1 conversion of @100phlecs ’ solution but using the afore-mentioned Okasaki library for the queue rather than using Erlang’s :queue module directly. I’ve added the three different backing queue implementations as separate functions to the benchmark so you can see what the difference in performance and memory usage is (for this particular problem).

The main takeaway (at least for me) is that in these kinds of situations you can see the overhead of an extra Elixir struct wrapping the thing you are manipulating, and the overhead of dispatching through a protocol (even though it is consolidated).
I expect that these overheads are especially visible because there are many inter-module calls which are not JIT-friendly. Nonetheless, in everything but the most performance-critical code I think that using a library that has Elixir-friendly features (e.g. pipe-able functions, data structures implementing Inspect, Enumerable, Collectable, Access) is preferable to using :queue or :array directly (or rolling your own).

defmodule QqwyOkasaki do
  @moduledoc """
  A straight-up 1:1 port of 100phlecs' solution but using queues from Okasaki library
  instead of Erlang's `:queue` module.

  Unfortunately, benchmarking shows that there is (a constant time) overhead,
  probably because of the extra Elixir struct wrapping the queue, and dispatching happening through protocols.
  """

  def solution([num | rst], target, queue_implementation \\ Okasaki.Implementations.ConstantQueue) do
    q = Okasaki.Queue.empty(implementation: queue_implementation)
    find(rst, Okasaki.Queue.insert(q, num), num, target)
  end

  defp find(lst, q, sum, target) when sum == target do
    case {lst, Okasaki.Queue.empty?(q)} do
      {[], true} ->
        nil

      {[num | rst], true} ->
        find(rst, Okasaki.Queue.insert(q, num), num, target)

      {_, false} ->
        :ok
    end
  end

  defp find([num | rest], q, sum, target) when sum < target,
    do: find(rest, Okasaki.Queue.insert(q, num), sum + num, target)

  defp find(lst, q, sum, target) when sum > target do
    {:ok, {prev, q}} = Okasaki.Queue.remove(q)
    find(lst, q, sum - prev, target)
  end

  defp find([], _, _, _), do: nil
end

2. Using Nx

This was mostly written for fun because I haven’t had the opportunity to use Nx much yet.

It is not actually fast. I think this is because of two reasons:

The overhead of JIT is visible if your input collections are small.
The problem is not SIMD-friendly nor easily paralellizable, so loop unrolling/branch predicting etc. will not help.

defmodule QqwyNx do
  import Nx.Defn

  @moduledoc """
  A fun implementation using Nx tensors.

  Not actually fast, since the problem is not SIMD-friendly and JIT-compilation
  has a significant overhead when used on small input collections.
  """

  def valid?(list, target) do
    found =
      do_valid?(target, Nx.tensor([0 | list], backend: EXLA.Backend))
      |> Nx.to_number

    found == 1
  end

  defn do_valid?(target, tensor) do
    sums = Nx.cumulative_sum(tensor)
    {_, _, _, _, found} =
      while {target, sums, lower_index = 0, higher_index = 1, found = 0},
        (not found) and higher_index < Nx.flat_size(sums)
        do
        h = sums[higher_index]
        l = sums[lower_index]
        cond do
          h - l == target and lower_index < higher_index ->
            # Answer found: short-circuit
            found = 1
            {target, sums, lower_index, higher_index, found}

          h - l <= target ->
            # Below target: add next element to window
            higher_index = higher_index + 1
            {target, sums, lower_index, higher_index, found}

          h - l >= target ->
            # Above target: remove oldest element from window
            lower_index = lower_index + 1
            {target, sums, lower_index, higher_index, found}
          true ->
            # Unreachable but Nx compiler requires it
            {target, sums, lower_index, higher_index, found}
        end
      end
    found
  end
end

3. Using lists properly

After tinkering a little, I realized that there is a much simpler solution! We only need to walk over the input list once, and can build up the window implicitly while doing this.

The trick is that we pass the same list twice to the recursive function. Inside the recursion, one of the two parameters will become the ‘tail’ of the list used by the other argument. Because of how immutable, persistent datastructures work, the same underlying memory is used without any copying so this is fast and memory-efficient.
And because we are only looking at the head of the list, access is O(1).

This is my preferred solution, and I think it is reasonably readable.

EDIT: The new solution by @wanton7 implements the same idea. I believe both of our solutions compile down to virtually identical bytecode.

defmodule Qqwy do
  def valid?(list, target) do
    do_valid?(target, list, list, 0, 0)
  end

  defp do_valid?(target, _, _, current, length) when current == target and length > 0, do: true
  defp do_valid?(_, [], _, _, _), do: false
  defp do_valid?(_, _, [], _, _), do: false
  defp do_valid?(target, [h | hs], [l | ls], current, length) do
    cond do
      current <= target ->
        do_valid?(target, hs, [l | ls], current + h, length + 1)
      current >= target ->
        do_valid?(target, [h | hs], ls, current - l, length - 1)
    end
  end
end

Benchmarking results:

➜   elixir benchmarks.exs
Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.15.0
Erlang 25.0

Benchmark suite executing with the following configuration:
warmup: 500 ms
time: 1 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: 100, 10_000, 1_000_000
Estimated total run time: 2.27 min

Benchmarking 100phlecs with input 100 ...
Benchmarking 100phlecs with input 10_000 ...
Benchmarking 100phlecs with input 1_000_000 ...
Benchmarking eksperimental with input 100 ...
Benchmarking eksperimental with input 10_000 ...
Benchmarking eksperimental with input 1_000_000 ...
Benchmarking nikiiv with input 100 ...
Benchmarking nikiiv with input 10_000 ...
Benchmarking nikiiv with input 1_000_000 ...
Benchmarking nikiiv2 with input 100 ...
Benchmarking nikiiv2 with input 10_000 ...
Benchmarking nikiiv2 with input 1_000_000 ...
Benchmarking qqwy with input 100 ...
Benchmarking qqwy with input 10_000 ...
Benchmarking qqwy with input 1_000_000 ...
Benchmarking qqwy_nx with input 100 ...

14:33:07.171 [info] TfrtCpuClient created.
Benchmarking qqwy_nx with input 10_000 ...
Benchmarking qqwy_nx with input 1_000_000 ...
Benchmarking qqwy_okasaki_amortized with input 100 ...
Benchmarking qqwy_okasaki_amortized with input 10_000 ...
Benchmarking qqwy_okasaki_amortized with input 1_000_000 ...
Benchmarking qqwy_okasaki_constant with input 100 ...
Benchmarking qqwy_okasaki_constant with input 10_000 ...
Benchmarking qqwy_okasaki_constant with input 1_000_000 ...
Benchmarking qqwy_okasaki_erlang with input 100 ...
Benchmarking qqwy_okasaki_erlang with input 10_000 ...
Benchmarking qqwy_okasaki_erlang with input 1_000_000 ...
Benchmarking stefanluptak with input 100 ...
Benchmarking stefanluptak with input 10_000 ...
Benchmarking stefanluptak with input 1_000_000 ...
Benchmarking tomekowal with input 100 ...
Benchmarking tomekowal with input 10_000 ...
Benchmarking tomekowal with input 1_000_000 ...
Benchmarking wanton7 with input 100 ...
Benchmarking wanton7 with input 10_000 ...
Benchmarking wanton7 with input 1_000_000 ...
Benchmarking wanton72 with input 100 ...
Benchmarking wanton72 with input 10_000 ...
Benchmarking wanton72 with input 1_000_000 ...

##### With input 100 #####
Name                             ips        average  deviation         median         99th %
qqwy                       4272.33 K        0.23 μs    ±52.28%        0.25 μs        0.38 μs
wanton72                   4209.73 K        0.24 μs    ±39.10%        0.25 μs        0.38 μs
100phlecs                  1413.34 K        0.71 μs  ±1352.57%        0.58 μs        0.92 μs
wanton7                     756.03 K        1.32 μs   ±897.87%        1.21 μs        1.46 μs
nikiiv                      750.64 K        1.33 μs   ±563.77%        1.25 μs        1.58 μs
tomekowal                   734.15 K        1.36 μs   ±497.97%        1.25 μs        1.54 μs
qqwy_okasaki_erlang         345.05 K        2.90 μs   ±352.25%        2.67 μs        4.33 μs
qqwy_okasaki_amortized      336.25 K        2.97 μs   ±250.51%        2.88 μs        3.21 μs
nikiiv2                     295.09 K        3.39 μs    ±67.50%        3.29 μs        3.83 μs
qqwy_okasaki_constant       270.46 K        3.70 μs   ±319.20%        3.42 μs        4.92 μs
eksperimental               113.38 K        8.82 μs   ±154.92%        8.17 μs          40 μs
stefanluptak                  1.48 K      677.72 μs    ±16.20%      667.33 μs      818.48 μs
qqwy_nx                    0.0477 K    20956.23 μs     ±5.44%    20558.48 μs    25253.79 μs

Comparison:
qqwy                       4272.33 K
wanton72                   4209.73 K - 1.01x slower +0.00348 μs
100phlecs                  1413.34 K - 3.02x slower +0.47 μs
wanton7                     756.03 K - 5.65x slower +1.09 μs
nikiiv                      750.64 K - 5.69x slower +1.10 μs
tomekowal                   734.15 K - 5.82x slower +1.13 μs
qqwy_okasaki_erlang         345.05 K - 12.38x slower +2.66 μs
qqwy_okasaki_amortized      336.25 K - 12.71x slower +2.74 μs
nikiiv2                     295.09 K - 14.48x slower +3.15 μs
qqwy_okasaki_constant       270.46 K - 15.80x slower +3.46 μs
eksperimental               113.38 K - 37.68x slower +8.59 μs
stefanluptak                  1.48 K - 2895.44x slower +677.49 μs
qqwy_nx                    0.0477 K - 89531.96x slower +20956.00 μs

Memory usage statistics:

Name                           average  deviation         median         99th %
qqwy                              0 KB     ±0.00%           0 KB           0 KB
wanton72                          0 KB     ±0.00%           0 KB           0 KB
100phlecs                      5.20 KB     ±0.00%        5.20 KB        5.20 KB
wanton7                        1.16 KB     ±0.00%        1.16 KB        1.16 KB
nikiiv                        13.52 KB     ±0.00%       13.52 KB       13.52 KB
tomekowal                     13.52 KB     ±0.00%       13.52 KB       13.52 KB
qqwy_okasaki_erlang            9.34 KB     ±0.00%        9.34 KB        9.34 KB
qqwy_okasaki_amortized         9.62 KB     ±0.00%        9.62 KB        9.62 KB
nikiiv2                           0 KB     ±0.00%           0 KB           0 KB
qqwy_okasaki_constant         11.13 KB     ±0.00%       11.13 KB       11.13 KB
eksperimental                 57.15 KB     ±0.00%       57.15 KB       57.15 KB
stefanluptak                 387.20 KB     ±0.00%      387.20 KB      387.20 KB
qqwy_nx                   12714.51 KB     ±0.02%    12715.55 KB    12719.06 KB

Comparison:
qqwy                              0 KB
wanton72                          0 KB - 1.00x memory usage +0 KB
100phlecs                      5.20 KB - ∞ x memory usage +5.20 KB
wanton7                        1.16 KB - ∞ x memory usage +1.16 KB
nikiiv                        13.52 KB - ∞ x memory usage +13.52 KB
tomekowal                     13.52 KB - ∞ x memory usage +13.52 KB
qqwy_okasaki_erlang            9.34 KB - ∞ x memory usage +9.34 KB
qqwy_okasaki_amortized         9.62 KB - ∞ x memory usage +9.62 KB
nikiiv2                           0 KB - 1.00x memory usage +0 KB
qqwy_okasaki_constant         11.13 KB - ∞ x memory usage +11.13 KB
eksperimental                 57.15 KB - ∞ x memory usage +57.15 KB
stefanluptak                 387.20 KB - ∞ x memory usage +387.20 KB
qqwy_nx                   12714.51 KB - ∞ x memory usage +12714.51 KB

Reduction count statistics:

Name                           average  deviation         median         99th %
qqwy                                77     ±0.00%             77             77
wanton72                            76     ±0.00%             76             76
100phlecs                          475     ±0.00%            475            475
wanton7                            896     ±0.00%            896            896
nikiiv                             324     ±0.00%            324            324
tomekowal                          325     ±0.00%            325            325
qqwy_okasaki_erlang               1414     ±0.00%           1414           1414
qqwy_okasaki_amortized            1173     ±0.00%           1173           1173
nikiiv2                            439     ±0.00%            439            439
qqwy_okasaki_constant             1139     ±0.00%           1139           1139
eksperimental                     5109     ±0.00%           5109           5109
stefanluptak                    121249     ±0.00%         121249         121249
qqwy_nx                     811508.05     ±0.13%         811185         815392

Comparison:
qqwy                                77
wanton72                            76 - 0.99x reduction count -1
100phlecs                          475 - 6.17x reduction count +398
wanton7                            896 - 11.64x reduction count +819
nikiiv                             324 - 4.21x reduction count +247
tomekowal                          325 - 4.22x reduction count +248
qqwy_okasaki_erlang               1414 - 18.36x reduction count +1337
qqwy_okasaki_amortized            1173 - 15.23x reduction count +1096
nikiiv2                            439 - 5.70x reduction count +362
qqwy_okasaki_constant             1139 - 14.79x reduction count +1062
eksperimental                     5109 - 66.35x reduction count +5032
stefanluptak                    121249 - 1574.66x reduction count +121172
qqwy_nx                     811508.05 - 10539.07x reduction count +811431.05

##### With input 10_000 #####
Name                             ips        average  deviation         median         99th %
wanton72                   5403.52 K       0.185 μs    ±38.37%       0.167 μs        0.29 μs
qqwy                       5305.83 K       0.188 μs    ±84.23%       0.167 μs        0.29 μs
100phlecs                  2157.44 K        0.46 μs  ±1217.80%        0.42 μs        0.58 μs
wanton7                     958.76 K        1.04 μs  ±1150.24%        0.92 μs        1.13 μs
nikiiv                      929.89 K        1.08 μs   ±668.00%           1 μs        1.29 μs
tomekowal                   678.60 K        1.47 μs   ±599.21%        1.04 μs        5.21 μs
qqwy_okasaki_erlang         471.32 K        2.12 μs   ±384.63%           2 μs        2.33 μs
qqwy_okasaki_amortized      421.06 K        2.37 μs   ±442.91%        2.29 μs        2.63 μs
qqwy_okasaki_constant       330.69 K        3.02 μs   ±502.10%        2.75 μs        3.83 μs
eksperimental               134.32 K        7.44 μs   ±234.34%        6.67 μs       38.67 μs
nikiiv2                      26.69 K       37.46 μs    ±61.26%          39 μs       51.71 μs
stefanluptak                  0.26 K     3853.09 μs     ±6.45%     3823.07 μs     4040.31 μs
qqwy_nx                    0.0564 K    17733.98 μs     ±5.05%    17346.58 μs    21416.13 μs

Comparison:
wanton72                   5403.52 K
qqwy                       5305.83 K - 1.02x slower +0.00341 μs
100phlecs                  2157.44 K - 2.50x slower +0.28 μs
wanton7                     958.76 K - 5.64x slower +0.86 μs
nikiiv                      929.89 K - 5.81x slower +0.89 μs
tomekowal                   678.60 K - 7.96x slower +1.29 μs
qqwy_okasaki_erlang         471.32 K - 11.46x slower +1.94 μs
qqwy_okasaki_amortized      421.06 K - 12.83x slower +2.19 μs
qqwy_okasaki_constant       330.69 K - 16.34x slower +2.84 μs
eksperimental               134.32 K - 40.23x slower +7.26 μs
nikiiv2                      26.69 K - 202.42x slower +37.28 μs
stefanluptak                  0.26 K - 20820.23x slower +3852.90 μs
qqwy_nx                    0.0564 K - 95825.89x slower +17733.80 μs

Memory usage statistics:

Name                           average  deviation         median         99th %
wanton72                          0 KB     ±0.00%           0 KB           0 KB
qqwy                              0 KB     ±0.00%           0 KB           0 KB
100phlecs                      3.96 KB     ±0.00%        3.96 KB        3.96 KB
wanton7                        0.92 KB     ±0.00%        0.92 KB        0.92 KB
nikiiv                        11.02 KB     ±0.00%       11.02 KB       11.02 KB
tomekowal                     11.02 KB     ±0.00%       11.02 KB       11.02 KB
qqwy_okasaki_erlang            7.24 KB     ±0.00%        7.24 KB        7.24 KB
qqwy_okasaki_amortized         7.48 KB     ±0.00%        7.48 KB        7.48 KB
qqwy_okasaki_constant          9.13 KB     ±0.00%        9.13 KB        9.13 KB
eksperimental                 46.35 KB     ±0.00%       46.35 KB       46.35 KB
nikiiv2                           0 KB     ±0.00%           0 KB           0 KB
stefanluptak                2967.06 KB     ±0.00%     2967.06 KB     2967.06 KB
qqwy_nx                   11293.81 KB     ±0.01%    11293.92 KB    11296.37 KB

Comparison:
wanton72                          0 KB
qqwy                              0 KB - 1.00x memory usage +0 KB
100phlecs                      3.96 KB - ∞ x memory usage +3.96 KB
wanton7                        0.92 KB - ∞ x memory usage +0.92 KB
nikiiv                        11.02 KB - ∞ x memory usage +11.02 KB
tomekowal                     11.02 KB - ∞ x memory usage +11.02 KB
qqwy_okasaki_erlang            7.24 KB - ∞ x memory usage +7.24 KB
qqwy_okasaki_amortized         7.48 KB - ∞ x memory usage +7.48 KB
qqwy_okasaki_constant          9.13 KB - ∞ x memory usage +9.13 KB
eksperimental                 46.35 KB - ∞ x memory usage +46.35 KB
nikiiv2                           0 KB - 1.00x memory usage +0 KB
stefanluptak                2967.06 KB - ∞ x memory usage +2967.06 KB
qqwy_nx                   11293.81 KB - ∞ x memory usage +11293.81 KB

Reduction count statistics:

Name                           average  deviation         median         99th %
wanton72                            61     ±0.00%             61             61
qqwy                                62     ±0.00%             62             62
100phlecs                          219     ±0.00%            219            219
wanton7                            706     ±0.00%            706            706
nikiiv                             101     ±0.00%            101            101
tomekowal                          102     ±0.00%            102            102
qqwy_okasaki_erlang                814     ±0.00%            814            814
qqwy_okasaki_amortized             620     ±0.00%            620            620
qqwy_okasaki_constant              772     ±0.00%            772            772
eksperimental                     3814     ±0.00%           3814           3814
nikiiv2                           3687     ±0.00%           3687           3687
stefanluptak                    923069     ±0.00%         923069         923069
qqwy_nx                     825682.49     ±0.19%         825310         830201

Comparison:
wanton72                            61
qqwy                                62 - 1.02x reduction count +1
100phlecs                          219 - 3.59x reduction count +158
wanton7                            706 - 11.57x reduction count +645
nikiiv                             101 - 1.66x reduction count +40
tomekowal                          102 - 1.67x reduction count +41
qqwy_okasaki_erlang                814 - 13.34x reduction count +753
qqwy_okasaki_amortized             620 - 10.16x reduction count +559
qqwy_okasaki_constant              772 - 12.66x reduction count +711
eksperimental                     3814 - 62.52x reduction count +3753
nikiiv2                           3687 - 60.44x reduction count +3626
stefanluptak                    923069 - 15132.28x reduction count +923008
qqwy_nx                     825682.49 - 13535.78x reduction count +825621.49

##### With input 1_000_000 #####
Name                             ips        average  deviation         median         99th %
wanton72                     13.49 K       74.14 μs     ±2.81%       74.13 μs       79.95 μs
qqwy                         12.67 K       78.90 μs     ±4.57%       79.33 μs       83.04 μs
tomekowal                     6.51 K      153.59 μs    ±37.01%      149.58 μs      224.16 μs
nikiiv                        6.51 K      153.62 μs    ±36.85%      152.21 μs      177.45 μs
100phlecs                     5.59 K      178.82 μs    ±33.56%      177.46 μs      198.25 μs
wanton7                       4.35 K      229.73 μs    ±30.08%      224.58 μs      310.48 μs
eksperimental                 1.73 K      577.66 μs    ±35.37%         497 μs     1121.22 μs
qqwy_okasaki_amortized        1.25 K      801.76 μs    ±17.34%      792.71 μs      907.51 μs
qqwy_okasaki_constant         1.11 K      900.52 μs    ±16.05%      889.00 μs     1001.37 μs
qqwy_okasaki_erlang           0.89 K     1120.28 μs    ±23.66%     1078.62 μs     1518.26 μs
nikiiv2                       0.21 K     4661.71 μs    ±16.51%     4206.83 μs     7317.24 μs
qqwy_nx                   0.00015 K  6882592.79 μs     ±0.00%  6882592.79 μs  6882592.79 μs
stefanluptak               0.00000 K336956857.20 μs     ±0.00%336956857.20 μs336956857.20 μs

Comparison:
wanton72                     13.49 K
qqwy                         12.67 K - 1.06x slower +4.76 μs
tomekowal                     6.51 K - 2.07x slower +79.44 μs
nikiiv                        6.51 K - 2.07x slower +79.48 μs
100phlecs                     5.59 K - 2.41x slower +104.68 μs
wanton7                       4.35 K - 3.10x slower +155.59 μs
eksperimental                 1.73 K - 7.79x slower +503.52 μs
qqwy_okasaki_amortized        1.25 K - 10.81x slower +727.62 μs
qqwy_okasaki_constant         1.11 K - 12.15x slower +826.38 μs
qqwy_okasaki_erlang           0.89 K - 15.11x slower +1046.14 μs
nikiiv2                       0.21 K - 62.88x slower +4587.57 μs
qqwy_nx                   0.00015 K - 92831.87x slower +6882518.65 μs
stefanluptak               0.00000 K - 4544847.56x slower +336956783.06 μs

Memory usage statistics:

Name                      Memory usage
wanton72                          0 MB
qqwy                              0 MB - 1.00x memory usage +0 MB
tomekowal                      0.40 MB - ∞ x memory usage +0.40 MB
nikiiv                         0.40 MB - ∞ x memory usage +0.40 MB
100phlecs                      1.18 MB - ∞ x memory usage +1.18 MB
wanton7                        0.30 MB - ∞ x memory usage +0.30 MB
eksperimental                  2.21 MB - ∞ x memory usage +2.21 MB
qqwy_okasaki_amortized         2.54 MB - ∞ x memory usage +2.54 MB
qqwy_okasaki_constant          2.88 MB - ∞ x memory usage +2.88 MB
qqwy_okasaki_erlang            2.37 MB - ∞ x memory usage +2.37 MB
nikiiv2                           0 MB - 1.00x memory usage +0 MB
qqwy_nx                    3235.34 MB - ∞ x memory usage +3235.34 MB
stefanluptak              148456.41 MB - ∞ x memory usage +148456.41 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           average  deviation         median         99th %
wanton72                       19.56 K     ±0.00%        19.56 K        19.56 K
qqwy                           19.56 K     ±0.00%        19.56 K        19.56 K
tomekowal                      29.34 K     ±0.00%        29.34 K        29.34 K
nikiiv                         29.34 K     ±0.00%        29.34 K        29.34 K
100phlecs                      79.94 K     ±0.32%        79.97 K        80.73 K
wanton7                       163.27 K     ±0.00%       163.27 K       163.27 K
eksperimental                 239.54 K     ±0.21%       239.65 K       240.88 K
qqwy_okasaki_amortized        218.31 K     ±0.10%       218.28 K       218.95 K
qqwy_okasaki_constant         272.71 K     ±0.16%       272.56 K       274.43 K
qqwy_okasaki_erlang           265.57 K     ±0.08%       265.45 K       266.45 K
nikiiv2                        99.64 K     ±0.88%        99.53 K       106.20 K
qqwy_nx                   213890.61 K     ±0.00%    213890.61 K    213890.61 K
stefanluptak             49206910.82 K     ±0.00%  49206910.82 K  49206910.82 K

Comparison:
wanton72                       19.56 K
qqwy                           19.56 K - 1.00x reduction count +0.00100 K
tomekowal                      29.34 K - 1.50x reduction count +9.78 K
nikiiv                         29.34 K - 1.50x reduction count +9.78 K
100phlecs                      79.94 K - 4.09x reduction count +60.38 K
wanton7                       163.27 K - 8.35x reduction count +143.72 K
eksperimental                 239.54 K - 12.25x reduction count +219.98 K
qqwy_okasaki_amortized        218.31 K - 11.16x reduction count +198.75 K
qqwy_okasaki_constant         272.71 K - 13.94x reduction count +253.15 K
qqwy_okasaki_erlang           265.57 K - 13.58x reduction count +246.01 K
nikiiv2                        99.64 K - 5.09x reduction count +80.08 K
qqwy_nx                   213890.61 K - 10936.78x reduction count +213871.05 K
stefanluptak             49206910.82 K - 2516076.64x reduction count +49206891.26 K

100phlecs · September 6, 2023, 5:09pm

This solution reminds me of how imperative languages may approach the problem–by maintaining “pointers” or indices within the array. It seems you found the most elegant solution to the problem. It is clever to use pattern matching as the “pointer” effectively, thanks for sharing your solution

One could entertain converting the list into a map list |> Enum.with_index |> Map.new to mimic pointers with two indices, one for the start of the window and one for the end.

Though looking at the Big O for map access the solution would be O(n * log n) in that case, so it won’t work. I also think the code would be more verbose going down this path.