# Optimizing nested recursion

I am new to elixir and am trying to replicate some nested looping from F# and C++. In those languages I used mutable arrays and with Elixir I am trying to replicate a 5 level deep nested recursive call.

I have some code that gives me the same count but its considerably slower. Any suggestions on how I can write this with improved performance and even fewer lines of code would be appreciated.

The test function at the end is what i use to execute the recursion. On my Mac it takes about 900 milliseconds.

``````defmodule Demo.Recursion do
use Bitwise

def get_new_number(index, current_number, number_list), do: (Enum.at(number_list, index) ||| current_number)

def fifth_loop(number, start, counter, number_list) do
items = for e <- start ..counter, do: get_new_number(e, number, number_list)
items
end

def fourth_loop(number, start, fourth_count, fifth_count, number_list) do
items =
for d <- start ..fourth_count do
new_number = get_new_number(d, number, number_list)
fifth_loop(new_number, d + 1, fifth_count, number_list)
end
items
end

def third_loop(number, start, third_count, fourth_count, fifth_count, number_list) do
items =
for c <- start ..third_count do
new_number = get_new_number(c, number, number_list)
fourth_loop(new_number, c + 1, fourth_count, fifth_count, number_list )
end
items
end

def second_loop(number, start, second_count, third_count, fourth_count, fifth_count, number_list ) do
items =
for b <- start ..second_count do
new_number = get_new_number(b, number, number_list)
third_loop(new_number, b + 1, third_count, fourth_count, fifth_count, number_list)
end
items
end

def first_loop(first_count, second_count, third_count, fourth_count, fifth_count, number_list) do
items =
for a <- 0 .. first_count do
number = Enum.at(number_list, a)
second_loop(number, a + 1, second_count, third_count, fourth_count, fifth_count, number_list)
end
|> List.flatten
IO.inspect length(items)
end

def test_recursion() do
new_list =  for i <- 1..48 do i end |> List.flatten
{time, res} = :timer.tc(fn -> first_loop(43, 44, 45, 46, 47, new_list) end)
time / 1000
end
end
``````

Any help is appreciated.

Thanks,

Could you explain what it is that you’re trying to do, exactly?

1 Like

You’re not doing any recursion. You’ve got nested looping logic, sure, but it’s not the same thing. That can be vasty simplified with multiple generators in the for comprehension. I believe this is equivalent:

``````defmodule Demo.Recursion do
use Bitwise

def compound_loop(first_count, second_count, third_count, fourth_count, fifth_count, number_tuple) do
for a <- 0..first_count, b <- (a+1)..second_count, c <- (b+1)..third_count, d <- (c+1)..fourth_count, e <- (d+1)..fifth_count do
elem(number_tuple, a) ||| elem(number_tuple, b) ||| elem(number_tuple, c) ||| elem(number_tuple, d) ||| elem(number_tuple, e)
end
end

def test_recursion() do
new_list = 1..48 |> Enum.to_list |> List.to_tuple
{time, res} = :timer.tc(fn -> compound_loop(43, 44, 45, 46, 47, new_list) end)
time / 1000
end
end
``````

To get better performance, you’re probably going to want to do binary comprehensions instead of list comprehensions. I don’t have experience with those at the tips of my fingers, so this is as far as I can take it for now.

Edited to add: Oh! Using a tuple instead of a `new_list` cuts execution to about 1/3 on my machine

Thank you for the code example this is definitely helpful. Not only way more concise and cleaner but it does take my time down from 900 millseconds to about 400 milliseconds.

I know nothing of what binary comprehension is but I will go look into it and see how close I can get this to C++ speed.

As gonz was asking what I was trying to do. This looping logic was originally used for old odds calculator logic but I may use it for other upcoming things as well. And I’ve had other projects where I used nested loops and now I know better how to approach it with this language.

Again this was very helpful Looking at some examples regarding binary comprehension. I attempted the following:

items = for x <- 1…48, into: <<>>, do: <<x::size(8)>>

And then I access them by using :binary.at(items, x) .

Is that what you were referring to buy putting the numbers in binary ? Or were you thinking of a different approach with regard to binary comprehension?

When i use the above code I basically get the same time as your example.

In your code you do really have nested recursion but you don’t explicitly see it. It is how `for` with multiple generators is implemented 1 Like

Does that also mean the code uses mutable values because somewhere down in the technology stack (at least at the hardware level, perhaps earlier) that’s how it’s implemented? My idea about a binary comprehension changed once I thought to use a tuple for your new_list. I don’t see a cheap and easy way to squeeze more performance out of this code. It’s so abstract. I think to go any further you’ll have to look to the algorithms you’re using, which I do not understand. Perhaps you can special case some particular values of some subset of your offsets or precompute some things with a lookup table?

1 Like

In the original version of this code mutable arrays were used. I am not sure if elixir does something with that under the hood.

I tried :array, tuple, binary list and list. The tuple option you provided is the fastest.

What’s funny is the code I posted is a about 40 milliseconds or so faster than yours when i switch to using tuples. Your’s is obviously way more terse, easier to read and fewer lines.

My guess is you are correct regarding not being able to squeeze any more performance out of this looping. I think ill finish the rest of the code and see what the ultimate side by side comparisons are.

Thanks for the feedback.

+/- 40 ms was about what I was seeing when I was making sure my refactoring was equivalent. If your original implementation is consistently faster, I have no explanation.

Yes, when you first said C++ I expected the original version to use mutable arrays. Due to the nature of the BEAM memory model, these types of algorithms are not what it is optimized for. If you get good enough performance for your needs, then great! But, as much as I enjoy Elixir, it may not be the best choice of language for your project. That said, you could still use Elixir as a sort of backplane that organizes calls to a C++ program via ports or nifs to do the heavy number crunching.

1 Like

Well, down in the hardware there is mutable data of course which the BEAM uses. But when handling erlang data terms it does NOT mutate things, it is immutable all the way down deep into the VM. The BEAM actually uses the fact that erlang data is immutable to optimise its handling of it, for example the GC knows that things are immutable and doesn’t need to take care of forward pointers.

3 Likes

Yes I think the next thing I am going to look at is calling that C/C++ library from Elixir. I have read a bit on NIF so maybe that would work as a good alternate option.

This implementation seems to bring execution time down to about 1/5th of the original, and is about 1.7x faster than the `compound_loop` version:

``````defmodule Demo.Recursion2 do
use Bitwise

def get_new_number(current_number, next_number), do: next_number ||| current_number

# at each level of descent, when reaching the limit, return results
defp do_loop(_number, [count|_count_list], _number_list, acc) when count < 0, do: acc
# at the deepest level of descent, just calculate new numbers and accumulate
defp do_loop(number, [count], [next_number|number_list], acc) do
new_number = get_new_number(number, next_number)
do_loop(number, [count - 1], number_list, [new_number|acc])
end
# at the top level, just pick numbers from the list and descend deeper
defp do_loop(nil, [count|count_list], [next_number|number_list], acc) do
new_count_list = Enum.map(count_list, fn c -> c - 1 end)
result = do_loop(next_number, new_count_list, number_list, [])
do_loop(nil, [count - 1|new_count_list], number_list, [result|acc])
end
# at intermediate levels, calculate a new number based on current + next,
# then descend deeper
defp do_loop(number, [count|count_list], [next_number|number_list], acc) do
new_number = get_new_number(number, next_number)
new_count_list = Enum.map(count_list, fn c -> c - 1 end)
result = do_loop(new_number, new_count_list, number_list, [])
do_loop(number, [count - 1|new_count_list], number_list, [result|acc])
end

def first_loop(count_list, number_list) do
items =
do_loop(nil, count_list, number_list, [])
|> List.flatten
IO.inspect length(items)
end

def test_recursion() do
number_list = for i <- 1..48, do: i
count_list = for i <- 43..47, do: i
{time, res} = :timer.tc(fn -> first_loop(count_list, number_list) end)
time / 1000
end
end
``````

One of the issues in the original code was picking out elements from the full list all of the time; this becomes expensive when you do it a lot… Thus the code above will pick off the top element and recurse over the tail of the list instead, which is a much faster way.

I’d have liked to express it using `Enum.reduce` or similar, but the requirement to do sub-recursion on the tail of the original list seems to make that a bit complicated… 1 Like

Make sure a NIF call never takes more than 1ms! Otherwise use a Port instead. This is great.

Using the “head” on on the current and having a subset for the second list is something i would never have thought of. Its even slightly faster than the Tuple version which was already pretty good.

Thanks for posting this. All of these examples will help me going forward. Head / tail destructuring of lists, and keeping the result in an accumulator, are two things that are very common when you need to “roll your own” recursion over lists, as opposed to using the Enum functions It’s good to play around with it, as inevitably you do run into situations where you need it.

That being said, trying the NIF experiment would also be a great learning experience I’m sure Both that, and the more generally useful Port option, are good to have in your toolbox.

1 Like

Or use dirty schedulers ;).

Which are not enabled by default, for good reason. It is better to just don’t. ^.^

1 Like

They will be on OTP 20 Is it actually stable now?! I remember there being a host of issues on initial implementation years ago (when I kept up with it… >.>), especially on Windows I think (but then again HIPE does not even work on Windows…).

Quite useful though if the issues are solved. 