Improving a slow solution for Longest Collatz Sequence

Hi all,

I tried to code a solution to the following problem in Elixir: https://projecteuler.net/problem=14. Given an input n, compute the Collatz sequence for that number: for an even number, the next number is n / 2; for an odd number, it is n * 3 + 1. The sequence eventually halts when it reaches 1. I’m looping from 1 to 1,000,000 and figuring out which number generates the longest sequence.

The problem is that the Elixir solution is quite slow compared to the Ruby one (4 times slower).

Since both use the same algorithm, I wonder what optimizations I could apply to make my Elixir solution faster. I suspect Elixir is not the right tool for this job, but should it really be 4 times slower than Ruby? :thinking:

Elixir solution:

$ elixir collatz.ex
Number with longest Collatz sequence: 837799
Longest sequence length: 525
elixir collatz.ex  2.89s user 0.15s system 99% cpu 3.071 total

Code:

defmodule NextCollatz do
  def call(n, cache), do: call(n, 0, cache)
  def call(1, length, _), do: length + 1
  def call(n, length, cache), do: length_for(n, length, cache, cache[n])

  defp length_for(n, length, cache, nil), do: call(next_for(n), length, cache) + 1
  defp length_for(_, _, _, cache_value), do: cache_value

  defp next_for(n) when rem(n, 2) == 0, do: div(n, 2)
  defp next_for(n) when rem(n, 2) == 1, do: n * 3 + 1
end

defmodule LongestCollatzSequence do
  def call(min, limit), do: call(min, %{}, limit, {0, 0})
  def call(n, _, limit, result) when n > limit, do: result
  def call(n, cache, limit, result) do
    gather_iteration_result(n, cache, limit, result, NextCollatz.call(n, cache))
  end

  def gather_iteration_result(n, cache, limit, result, length) do
    call(n + 1, Map.put_new(cache, n, length), limit, max(n, length, result))
  end

  defp max(n, length, {_, max_length}) when length > max_length, do: {n, length}
  defp max(_, _, result), do: result
end

{max_n, max_length} = LongestCollatzSequence.call(1, 1_000_000)

IO.puts "Number with longest Collatz sequence: #{max_n}"
IO.puts "Longest sequence length: #{max_length}"

Ruby solution:

$ ruby collatz.rb
Number with longest Collatz sequence: 837799
Longest sequence length: 525
ruby collatz.rb  0.78s user 0.01s system 95% cpu 0.827 total

URL: https://gist.github.com/thiagoa/6118393c49242d517a4a64024f74c1f7

Well, the cross-module call will have a definite overhead.

Put it all in one module and make as much of it defp as you can; that is the first thing to do.
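For instance, something like this (a rough sketch of that suggestion, untested: it is the OP’s code merged into a single module, with everything except the entry point made private; the helper names are mine):

defmodule LongestCollatzSequence do
  def call(min, limit), do: do_call(min, %{}, limit, {0, 0})

  # Same loop as before, just private and local to one module.
  defp do_call(n, _, limit, result) when n > limit, do: result
  defp do_call(n, cache, limit, result) do
    length = sequence_length(n, 0, cache)
    do_call(n + 1, Map.put_new(cache, n, length), limit, best(n, length, result))
  end

  # Former NextCollatz.call/2, now a local helper.
  defp sequence_length(1, length, _), do: length + 1
  defp sequence_length(n, length, cache) do
    case cache[n] do
      nil -> sequence_length(next_for(n), length, cache) + 1
      cached -> cached
    end
  end

  defp next_for(n) when rem(n, 2) == 0, do: div(n, 2)
  defp next_for(n), do: n * 3 + 1

  defp best(n, length, {_, max_length}) when length > max_length, do: {n, length}
  defp best(_, _, result), do: result
end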

Sometimes equivalent Ruby code will be faster than its Elixir counterpart. Some reasons I’ve seen are:

  1. Some Ruby functions are powered by C, whereas Elixir functions are not.
  2. There is some cost to pay for immutability.
  3. Integer math is not really fast in Erlang.

In my experience these things rarely matter in the grand scheme of things. In 7 years of working with Erlang in production I didn’t have much need to move outside of Erlang, the most frequent (but still very rare) example being encoding/decoding of large JSONs.

In any case, in these synthetic examples Ruby (and many other langs) might win for some naive implementation, but before discarding Elixir, there are two options to be considered:

  1. Algorithmic optimization
  2. Technical optimization

If you can do something with 1, then you might get huge improvements to the point where the diff between Elixir and X is insignificant. I’m not familiar with Collatz seq, so I don’t know if there’s some improvement that can be done in the algorithm. So I tried a couple of technical tricks. They are quite ugly, but they did help.

The first thing to do is to not use the time OS command, because that also measures the boot time of the VM, as well as compilation of your code. Instead use :timer.tc to measure the running time of your code. That immediately shaved off half a second on my machine.
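For example, a minimal sketch that wraps the OP’s entry point (same idea as the script at the end of this post):

{micros, result} = :timer.tc(fn -> LongestCollatzSequence.call(1, 1_000_000) end)
IO.puts "Time: #{div(micros, 1000)}ms"
IO.inspect result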

The next thing I did is very ugly. I reached for the universally hated process dictionary. I really wouldn’t recommend that in real life, but it did help tremendously here, cutting the running time in half. Finally, for a bit of additional gain, I used Bitwise instead of rem and div. This gained an additional ~100 msec on my machine.
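For reference, here is what the Bitwise swap amounts to (a tiny standalone sketch; same arithmetic, just expressed with bit operations):

import Bitwise

# rem(n, 2) == 0 is the same test as band(n, 1) == 0 (check the lowest bit),
# and div(n, 2) is the same as n >>> 1 (shift right by one).
next = fn n -> if band(n, 1) == 0, do: n >>> 1, else: 3 * n + 1 end
IO.inspect next.(6) # => 3
IO.inspect next.(7) # => 22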

The final version takes about 900 msec (originally it was about 3 sec). The Ruby version takes about 1200 msec on my machine (it’s Ruby 2.0). Perhaps there are better (and less ugly) approaches, but this is the best I came up with after a quick hack session.

Before showing the code I want to touch on this remark from the original post: “I suspect Elixir is not the right tool for this job”.

This is not a problem, it is a property :slight_smile: If you want to crunch a lot of numbers in a one-off computation as fast as possible, Elixir/Erlang are bad choices to begin with, as is Ruby for that matter. I wouldn’t look far beyond C/C++ for those kinds of jobs.

The real problem would be to extrapolate from this naive example that Elixir is always going to be slower, or generally worse, than Ruby (or any other X). Raw CPU speed is not the only property of software, and in some types of software, such as backend systems, it’s often not the most important one.

In any case, here’s my very hacky take on this. It’s not really tested, and I wouldn’t recommend using these techniques in production :slight_smile:

defmodule Collatz do
  import Bitwise

  def max_len(range) do
    {max, max_len} =
      Enum.reduce(range, {0, 0},
        fn n, {_, max_len} = acc ->
          len = len(n)
          Process.put(n, len)
          if len > max_len, do: {n, len}, else: acc
        end
      )

    %{max: max, max_len: max_len}
  end

  defp len(value) do
    case Process.get(value) do
      nil ->
        case next(value) do
          1 -> 2
          next -> 1 + len(next)
        end
      len -> len
    end
  end

  defp next(x), do:
    if band(x, 1) == 0, do: x >>> 1, else: 3 * x + 1
end

{time, result} = :timer.tc(fn -> Collatz.max_len(1..1_000_000) end)
IO.puts "Time: #{div(time, 1000)}ms"
IO.inspect result

$ elixir test.ex
Time: 864ms
%{max: 837799, max_len: 525}

And calling C/C++/Rust via NIFs or Ports on the BEAM is super-easy, unlike Python/Ruby addons. :slight_smile:
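For the Port route, the Elixir side can be this small (a hypothetical sketch: it assumes some external executable ./collatz that reads a limit from stdin and prints its answer to stdout):

# Start the external program and talk to it over stdin/stdout.
port = Port.open({:spawn_executable, "./collatz"}, [:binary, :exit_status])
Port.command(port, "1000000\n")

receive do
  {^port, {:data, line}} -> IO.puts "External result: #{line}"
after
  5_000 -> IO.puts "No reply from the external program"
end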

@sasajuric Thanks so much for the answer, the detailed explanation and the tricks. I suspected Elixir would be slower, but not by that much. That was my main concern: the Elixir version was as naive as Ruby’s (same algorithm), yet the results were so different :slight_smile: Your explanations about that make a lot of sense.

By “problem”, I did not mean it in the general sense, just the comparison between Collatz implementations, which, as you mentioned, are not representative of anything important. It was just out of raw curiosity. My co-worker did this problem in Fortran and got a result in less than 10 milliseconds!

The process dictionary and the Bitwise module are really great tricks, and I’ll take your word and not use them in production. And it seems that I did not consider the VM’s warm-up, ha!

That slow, eh? Sounds like a challenge. :wink:

/me is certain that problem could be done much faster, considering how easily parallelizable it looks via CPU/GPU intrinsics with simple C++…

/me coughs

I wonder how fast an implementation of the inverse algorithm might be: performing a depth-first search on the Collatz graph and keeping track of the maximum tree height encountered.

$ elixir collatzSJ01.ex
Time: 949ms
{837799, 525}

A tiny bit of improvement by caching most values as early as possible to avoid most recalculations.

$ elixir collatzSJ03.ex
Time: 788ms
{837799, 525}
$

defmodule Collatz do
  use Bitwise

  # file: collatzSJ03.ex
  defp next(x) when (band x, 1) == 0, do:  x >>> 1
  defp next(x), do: 3 * x + 1

  defp calc_len(value, limit) do
    length =
      case (next value) do
        1 -> 2
        next_val -> 1 + (val_len next_val, limit)
      end

    cond do
      value > limit ->
        length # don't spend time storing this value
      true ->
        Process.put value, length # store what we didn't have before
        length
    end
  end

  defp val_len(value, limit) do
    case (Process.get value) do
      nil    -> calc_len value, limit
      length -> length
    end
  end

  defp keep_max(value, {_, max_len, limit} = acc) do
    length = val_len value, limit
    cond do
      length > max_len ->
        {value, length, limit}
      true ->
        acc
    end
  end

  def max_len(limit) do
    {value, length, _} = Enum.reduce limit..1, {0, 0, limit}, &keep_max/2
    {value, length}
  end
end

{time, result} = :timer.tc(fn -> Collatz.max_len(1_000_000) end)
IO.puts "Time: #{div(time, 1000)}ms"
IO.inspect result

By comparison, the version with a Map:

$ elixir collatzSJ04.ex
Time: 1897ms
{837799, 525}
$

defmodule Collatz do
  use Bitwise

  # file: collatzSJ04.ex
  defp next(x) when (band x, 1) == 0, do:  x >>> 1
  defp next(x), do: 3 * x + 1

  defp calc_len(value, limit, kv) do
    length_kv =
      case (next value) do
        1 ->
          {2, kv}
        next_val ->
          {next_len, new_kv} = val_len next_val, limit, kv
          {1 + next_len, new_kv}
      end

    case length_kv do
      _ when value > limit ->
        length_kv # don't spend time storing this value
      {length, kv} ->
        {length, (Map.put kv, value, length)} # store what we didn't have before
    end
  end

  defp val_len(value, limit, kv) do
    case kv[value] do
      nil    -> calc_len value, limit, kv
      length -> {length, kv}
    end
  end

  defp keep_max(value, {{_, max_len, limit} = last, kv}) do
    {length, new_kv} = val_len value, limit, kv
    cond do
      length > max_len ->
        {{value, length, limit}, new_kv}
      true ->
        {last, new_kv}
    end
  end

  def max_len(limit) do
    {{value, length, _}, _} = Enum.reduce limit..1, {{0, 0, limit}, %{}}, &keep_max/2
    {value, length}
  end
end

{time, result} = :timer.tc(fn -> Collatz.max_len(1_000_000) end)
IO.puts "Time: #{div(time, 1000)}ms"
IO.inspect result

Personally, I can only reason about speed in a real-world context. For a piece of information which needs to be computed only once in the entire history of humanity, I’d say that 4 seconds is pretty acceptable running time. Even if the program took the whole day to compute it, I’d still be ok with that :slight_smile:
