Optimizing nested recursion

mru22 · December 4, 2016, 12:25am

I am new to elixir and am trying to replicate some nested looping from F# and C++. In those languages I used mutable arrays and with Elixir I am trying to replicate a 5 level deep nested recursive call.

I have some code that gives me the same count but its considerably slower. Any suggestions on how I can write this with improved performance and even fewer lines of code would be appreciated.

The test function at the end is what i use to execute the recursion. On my Mac it takes about 900 milliseconds.

defmodule Demo.Recursion do
  use Bitwise

  def get_new_number(index, current_number, number_list), do: (Enum.at(number_list, index) ||| current_number)

  def fifth_loop(number, start, counter, number_list) do
    items = for e <- start ..counter, do: get_new_number(e, number, number_list)
    items
  end

  def fourth_loop(number, start, fourth_count, fifth_count, number_list) do
    items =
        for d <- start ..fourth_count do
          new_number = get_new_number(d, number, number_list)
          fifth_loop(new_number, d + 1, fifth_count, number_list)
        end
    items
  end

  def third_loop(number, start, third_count, fourth_count, fifth_count, number_list) do
    items =
        for c <- start ..third_count do
          new_number = get_new_number(c, number, number_list)
          fourth_loop(new_number, c + 1, fourth_count, fifth_count, number_list )
        end
    items
  end

  def second_loop(number, start, second_count, third_count, fourth_count, fifth_count, number_list ) do
    items =
        for b <- start ..second_count do
          new_number = get_new_number(b, number, number_list)
          third_loop(new_number, b + 1, third_count, fourth_count, fifth_count, number_list)
        end
    items
  end

  def first_loop(first_count, second_count, third_count, fourth_count, fifth_count, number_list) do
    items =
      for a <- 0 .. first_count do
        number = Enum.at(number_list, a)
        second_loop(number, a + 1, second_count, third_count, fourth_count, fifth_count, number_list)
      end
      |> List.flatten
    IO.inspect length(items)
  end

  def test_recursion() do
    new_list =  for i <- 1..48 do i end |> List.flatten
    {time, res} = :timer.tc(fn -> first_loop(43, 44, 45, 46, 47, new_list) end)
    time / 1000
  end
end

Any help is appreciated.

Thanks,

gon782 · December 4, 2016, 12:40am

Could you explain what it is that you’re trying to do, exactly?

gregvaughn · December 4, 2016, 1:29am

You’re not doing any recursion. You’ve got nested looping logic, sure, but it’s not the same thing. That can be vasty simplified with multiple generators in the for comprehension. I believe this is equivalent:

defmodule Demo.Recursion do
  use Bitwise

  def compound_loop(first_count, second_count, third_count, fourth_count, fifth_count, number_tuple) do
    for a <- 0..first_count, b <- (a+1)..second_count, c <- (b+1)..third_count, d <- (c+1)..fourth_count, e <- (d+1)..fifth_count do
      elem(number_tuple, a) ||| elem(number_tuple, b) ||| elem(number_tuple, c) ||| elem(number_tuple, d) ||| elem(number_tuple, e)
    end
  end

  def test_recursion() do
    new_list = 1..48 |> Enum.to_list |> List.to_tuple
    {time, res} = :timer.tc(fn -> compound_loop(43, 44, 45, 46, 47, new_list) end)
    time / 1000
  end
end

To get better performance, you’re probably going to want to do binary comprehensions instead of list comprehensions. I don’t have experience with those at the tips of my fingers, so this is as far as I can take it for now.

Edited to add: Oh! Using a tuple instead of a new_list cuts execution to about 1/3 on my machine

mru22 · December 4, 2016, 5:20am

Thank you for the code example this is definitely helpful. Not only way more concise and cleaner but it does take my time down from 900 millseconds to about 400 milliseconds.

I know nothing of what binary comprehension is but I will go look into it and see how close I can get this to C++ speed.

As gonz was asking what I was trying to do. This looping logic was originally used for old odds calculator logic but I may use it for other upcoming things as well. And I’ve had other projects where I used nested loops and now I know better how to approach it with this language.

Again this was very helpful

mru22 · December 4, 2016, 2:45pm

Looking at some examples regarding binary comprehension. I attempted the following:

items = for x <- 1…48, into: <<>>, do: <<x::size(8)>>

And then I access them by using :binary.at(items, x) .

Is that what you were referring to buy putting the numbers in binary ? Or were you thinking of a different approach with regard to binary comprehension?

When i use the above code I basically get the same time as your example.

rvirding · December 4, 2016, 10:42pm

In your code you do really have nested recursion but you don’t explicitly see it. It is how for with multiple generators is implemented

gregvaughn · December 5, 2016, 3:27am

Does that also mean the code uses mutable values because somewhere down in the technology stack (at least at the hardware level, perhaps earlier) that’s how it’s implemented?

gregvaughn · December 5, 2016, 3:33am

My idea about a binary comprehension changed once I thought to use a tuple for your new_list. I don’t see a cheap and easy way to squeeze more performance out of this code. It’s so abstract. I think to go any further you’ll have to look to the algorithms you’re using, which I do not understand. Perhaps you can special case some particular values of some subset of your offsets or precompute some things with a lookup table?

mru22 · December 5, 2016, 3:54am

In the original version of this code mutable arrays were used. I am not sure if elixir does something with that under the hood.

I tried :array, tuple, binary list and list. The tuple option you provided is the fastest.

What’s funny is the code I posted is a about 40 milliseconds or so faster than yours when i switch to using tuples. Your’s is obviously way more terse, easier to read and fewer lines.

My guess is you are correct regarding not being able to squeeze any more performance out of this looping. I think ill finish the rest of the code and see what the ultimate side by side comparisons are.

Thanks for the feedback.

gregvaughn · December 5, 2016, 4:08am

+/- 40 ms was about what I was seeing when I was making sure my refactoring was equivalent. If your original implementation is consistently faster, I have no explanation.

Yes, when you first said C++ I expected the original version to use mutable arrays. Due to the nature of the BEAM memory model, these types of algorithms are not what it is optimized for. If you get good enough performance for your needs, then great! But, as much as I enjoy Elixir, it may not be the best choice of language for your project. That said, you could still use Elixir as a sort of backplane that organizes calls to a C++ program via ports or nifs to do the heavy number crunching.

rvirding · December 5, 2016, 11:07am

Well, down in the hardware there is mutable data of course which the BEAM uses. But when handling erlang data terms it does NOT mutate things, it is immutable all the way down deep into the VM. The BEAM actually uses the fact that erlang data is immutable to optimise its handling of it, for example the GC knows that things are immutable and doesn’t need to take care of forward pointers.

mru22 · December 5, 2016, 2:08pm

Yes I think the next thing I am going to look at is calling that C/C++ library from Elixir. I have read a bit on NIF so maybe that would work as a good alternate option.

jwarlander · December 5, 2016, 2:22pm

This implementation seems to bring execution time down to about 1/5th of the original, and is about 1.7x faster than the compound_loop version:

defmodule Demo.Recursion2 do
  use Bitwise

  def get_new_number(current_number, next_number), do: next_number ||| current_number

  # at each level of descent, when reaching the limit, return results
  defp do_loop(_number, [count|_count_list], _number_list, acc) when count < 0, do: acc
  # at the deepest level of descent, just calculate new numbers and accumulate
  defp do_loop(number, [count], [next_number|number_list], acc) do
    new_number = get_new_number(number, next_number)
    do_loop(number, [count - 1], number_list, [new_number|acc])
  end
  # at the top level, just pick numbers from the list and descend deeper
  defp do_loop(nil, [count|count_list], [next_number|number_list], acc) do
    new_count_list = Enum.map(count_list, fn c -> c - 1 end)
    result = do_loop(next_number, new_count_list, number_list, [])
    do_loop(nil, [count - 1|new_count_list], number_list, [result|acc])
  end
  # at intermediate levels, calculate a new number based on current + next,
  # then descend deeper
  defp do_loop(number, [count|count_list], [next_number|number_list], acc) do
    new_number = get_new_number(number, next_number)
    new_count_list = Enum.map(count_list, fn c -> c - 1 end)
    result = do_loop(new_number, new_count_list, number_list, [])
    do_loop(number, [count - 1|new_count_list], number_list, [result|acc])
  end

  def first_loop(count_list, number_list) do
    items =
      do_loop(nil, count_list, number_list, [])
      |> List.flatten
    IO.inspect length(items)
  end

  def test_recursion() do
    number_list = for i <- 1..48, do: i
    count_list = for i <- 43..47, do: i
    {time, res} = :timer.tc(fn -> first_loop(count_list, number_list) end)
    time / 1000
  end
end

One of the issues in the original code was picking out elements from the full list all of the time; this becomes expensive when you do it a lot… Thus the code above will pick off the top element and recurse over the tail of the list instead, which is a much faster way.

I’d have liked to express it using Enum.reduce or similar, but the requirement to do sub-recursion on the tail of the original list seems to make that a bit complicated…

OvermindDL1 · December 5, 2016, 4:28pm

Make sure a NIF call never takes more than 1ms! Otherwise use a Port instead.

mru22 · December 6, 2016, 12:55am

This is great.

Using the “head” on on the current and having a subset for the second list is something i would never have thought of. Its even slightly faster than the Tuple version which was already pretty good.

Thanks for posting this. All of these examples will help me going forward.

jwarlander · December 6, 2016, 7:16am

Head / tail destructuring of lists, and keeping the result in an accumulator, are two things that are very common when you need to “roll your own” recursion over lists, as opposed to using the Enum functions It’s good to play around with it, as inevitably you do run into situations where you need it.

That being said, trying the NIF experiment would also be a great learning experience I’m sure Both that, and the more generally useful Port option, are good to have in your toolbox.

Good luck with your Elixir adventures!

hazardfn · December 7, 2016, 2:55pm

Or use dirty schedulers ;).

OvermindDL1 · December 7, 2016, 4:05pm

Which are not enabled by default, for good reason.

It is better to just don’t. ^.^

michalmuskala · December 7, 2016, 4:42pm

They will be on OTP 20

OvermindDL1 · December 7, 2016, 5:54pm

Is it actually stable now?! I remember there being a host of issues on initial implementation years ago (when I kept up with it… >.>), especially on Windows I think (but then again HIPE does not even work on Windows…).

Quite useful though if the issues are solved.