Is it possible to build up data structures tail-recursively?

For context, check out this blog post.

To summarize, I tried building a BST using tail recursion, only to realize that when a child is modified, its parent node still references the old version.

While body recursion allows for very simple generators, it comes at the cost of greater call-stack usage, so a tail-call-optimized solution would reduce auxiliary space complexity. But given the immutability of data in Elixir, I'm wondering whether such an optimization is even possible. If it is, some examples would be much appreciated!

Thanks!

1 Like

Is the optimization necessary, and is it really an optimization? Recursing 100 times in a function with a reasonable stack size is no problem in Elixir, which means you can operate on a balanced BST with 2^100 elements without much cost (the recursion depth is the tree's depth, not its size).

5 Likes

Just came upon this article again.

Sounds like TCO really is best for primitive types, and doesn’t make much difference for complex ones because the cost of copying all of a larger data structure across stack frames (as opposed to increasingly smaller pieces of it with body recursion) offsets any optimizations.
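To make that concrete, here's a minimal sketch of the distinction (my own illustration, not from the article; the module and function names are made up): accumulating a primitive like an integer is the ideal tail-recursion case, while building up a structure tail-recursively means carrying the whole accumulator through every call and often reversing it at the end.

defmodule RecursionStyles do
  # Primitive accumulator: a tail call with constant stack usage.
  def sum([], acc), do: acc
  def sum([h | t], acc), do: sum(t, acc + h)

  # Complex accumulator, tail recursive: the built-up list rides along in
  # every call and must be reversed once at the end.
  def double_tail(list), do: double_tail(list, [])
  defp double_tail([], acc), do: :lists.reverse(acc)
  defp double_tail([h | t], acc), do: double_tail(t, [h * 2 | acc])

  # Body recursive: one stack frame per element, but the result is built
  # in order with no reverse step.
  def double_body([]), do: []
  def double_body([h | t]), do: [h * 2 | double_body(t)]
end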

When converting from body recursion to tail recursion, you are essentially taking responsibility for managing "your own data stack". In most functional programming languages the head of a list can be added or removed cheaply, making lists excellent stacks.

Furthermore, the BEAM performs "last call optimization". The term "tail call optimization" may suggest only recursive calls within the same function, but the BEAM doesn't allocate a new stack frame even when one function's tail call invokes an entirely different function, so mutual recursion can be optimized as well.
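For example (a quick sketch of my own, separate from the tree code below), two mutually recursive functions whose tail calls jump between each other still run in constant stack space:

defmodule Parity do
  # Each clause ends in a tail call to the *other* function; the BEAM reuses
  # the current stack frame, so even huge inputs don't grow the stack.
  def even?(0), do: true
  def even?(n) when n > 0, do: odd?(n - 1)

  def odd?(0), do: false
  def odd?(n) when n > 0, do: even?(n - 1)
end

Parity.even?(10_000_000)  # => true, without deepening the stack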

With a Tree, last call optimization might look like this:

defmodule Tree do
  defstruct value: nil, left: :leaf, right: :leaf

  def new,
    do: :leaf

  def new(value),
    do: %Tree{value: value}

  # `ops` acts as an explicit stack of {side, parent} pairs recording the
  # path taken down the tree.
  def insert(tree, value),
    do: insert(tree, value, [])

  # Reached an empty position: create the new node, then replay the recorded
  # path to rebuild each ancestor with its updated child.
  defp insert(:leaf, value, ops) do
    insert_ops(new(value), ops)
  end
  defp insert(%Tree{value: value, left: left, right: right} = tree, new_value, ops) do
    cond do
      value < new_value ->
        insert(right, new_value, [{:right, tree} | ops])
      true ->
        insert(left, new_value, [{:left, tree} | ops])
    end
  end

  # Rebuild the ancestors from the bottom up: graft the updated subtree back
  # onto each recorded parent on the recorded side.
  defp insert_ops(tree, []),
    do: tree
  defp insert_ops(subtree, [{side, tree} | rest]),
    do: insert_ops(Map.put(tree, side, subtree), rest)

  def traverse(tree, acc, fun),
    do: traverse(tree, acc, fun, [])

  # In-order traversal with an explicit stack: descend left while pushing
  # {right_subtree, value} pairs; at a leaf, visit the saved value and
  # continue with its right subtree.
  defp traverse(:leaf, acc, _fun, []),
    do: acc
  defp traverse(:leaf, acc, fun, [{right, value} | rest]),
    do: traverse(right, fun.(value, acc), fun, rest)
  defp traverse(%Tree{value: value, left: left, right: right}, acc, fun, ops),
    do: traverse(left, acc, fun, [{right, value} | ops])

  def to_list(tree),
    do: :lists.reverse(Tree.traverse(tree, [], &([&1|&2])))

  def from_enum(enum),
    do: Enum.reduce(enum, Tree.new(), &(Tree.insert(&2,&1)))

end

show = fn tree ->
  IO.inspect(tree)
  IO.inspect(Tree.to_list(tree))
end
show_from_enum = fn enum ->
  show.(Tree.from_enum(enum))
end

show.(Tree.new())
show.(Tree.new(1))
show_from_enum.([])
show_from_enum.([1])
show_from_enum.(1..9)
show_from_enum.(9..1)
show_from_enum.([8,4,12,2,6,10,14,1,3,5,7,9,11,13,15])

Allocating a stack frame is likely a highly optimized BEAM-level operation, and the memory cost is related to how much data is captured in the frame. With tail calls you can be very exact about what data needs to be captured, but there is some overhead (in complexity and reductions) in explicitly managing that captured data. So the tradeoffs will depend a lot on the details of the particular use case.

4 Likes

Alright, I’ll just analyze the auxiliary space complexity of each working implementation here:

Given the tree below,

#   4
#  / \
# 2   8

insert a new value, 1, using both body recursion and tail recursion, and determine the time and auxiliary space complexity of the operation in each case.

body recursion
BodyTree.insert(tree, 1)

# STACK FRAME 0
# Aux Space: 0 + other stack frames aux space
# Time: O(1) + other stack frames time
def insert(tree = tree, data = 1) do
  if 1 < 4, do: %{tree | left: insert(tree.left, 1)} # ...
end
# STACK FRAME 1
# Aux Space: theta(n/2) or O(n - 1) [tree branch] + other stack frames aux space
# Time: O(1) + other stack frames time
def insert(tree = tree.left, data = 1) do
  if 1 < 2, do: %{tree | left: insert(tree.left, 1)} # ...
end
# STACK FRAME 2
# Aux Space: O(1)
# Time: O(1)
def insert(nil = tree.left.left, 1), do: new(1)

In all, inserting into a body-recursive BST could take at worst O(n(n-1)/2) auxiliary space if all nodes lie along one branch, and at worst O(n) time. On average, for a balanced tree, auxiliary space should be Θ(n/2 * (n-1)/2) and time should be Θ(log₂ n).
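For reference, the body-recursive insert being traced above isn't reproduced anywhere in this thread, so here's a minimal sketch of what it might look like (the BodyTree name and the nil-as-empty-subtree convention are just assumptions chosen to match the trace):

defmodule BodyTree do
  defstruct value: nil, left: nil, right: nil

  def new(value), do: %BodyTree{value: value}

  # Body recursion: the recursive call is not the last thing a clause does;
  # the parent node is rebuilt only after the call returns, so every pending
  # rebuild occupies a stack frame until the new leaf is created.
  def insert(nil, value), do: new(value)
  def insert(%BodyTree{value: v} = tree, value) when v < value,
    do: %{tree | right: insert(tree.right, value)}
  def insert(%BodyTree{} = tree, value),
    do: %{tree | left: insert(tree.left, value)}
end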

tail recursion
TailTree.insert(tree, 1)

# STACK FRAME 0.0
# Aux Space: O(1)
# Time: O(1)
def insert(tree = tree, value = 1), do: insert(tree, 1, [])
# STACK FRAME 0.1
# Aux Space: O(n) -- adding tree to options list
# Time: O(1)
defp insert(%Tree{value: 4, left: tree.left, right: tree.right} = tree, new_value = 1, ops = []) do
  cond do
    4 < 1 -> # ...
    true -> insert(tree.left, 1, [{:left, tree} | []])
  end
end
# STACK FRAME 0.2
# Aux Space: theta(n + n/2), O(2n - 1) -- on last call, ops will be theta(n/2 * (n-1)/2) to O(n(n-1)/2)
# Time: O(1)
defp insert(%Tree{value: 2, left: :leaf, right: :leaf} = tree.left, new_value = 1, ops = [left: tree]) do
  cond do
    2 < 1 -> # ...
    true -> insert(:leaf, 1, [{:left, tree.left} | [left: tree]])
  end
end
# STACK FRAME 0.3
# Aux Space: O(1)
# Time: O(1)
defp insert(:leaf, 1, ops = [left: tree.left, left: tree]), do: insert_ops(new(1), ops)
# STACK FRAME 0.4
# Aux Space: from bottom of tree O(1) to top O(n)
# Time: O(1)
defp insert_ops(subtree = %Tree{value: 1, left: :leaf, right: :leaf}, [{side = :left, tree = tree.left} | rest = [left: tree]]), 
  do: insert_ops(Map.put(tree.left, :left, subtree), [left: tree])
# STACK FRAME 0.5
# Aux Space: from bottom of tree O(1) to top O(n)
# Time: O(1)
defp insert_ops(subtree = %Tree{ tree.left | left: new_node}, [{side = :left, tree = tree} | rest = []]), 
  do: insert_ops(Map.put(tree, :left, subtree), [])
# STACK FRAME 0.6
# Aux Space: O(1)
# Time: O(1)
defp insert_ops(tree = tree_with_insertion, []), do: tree_with_insertion

Inserting into a tail-recursive BST could likewise take at worst O(n(n-1)/2) auxiliary space if all nodes lie along one branch, and at worst O(n) time. On average, for a balanced tree, auxiliary space should be Θ(n/2 * (n-1)/2) and time should be Θ(log₂ n).

So both have the same asymptotic performance, although the work involved in making the tail-recursive calls may cumulatively take longer: carrying the accumulated data through every tail call doesn't count as extra auxiliary space, but it costs more time than body recursion, where each call only handles a progressively smaller piece of the structure. This can account for body recursion being about 10% faster (as mentioned in the benchmarking blog post). And what keeps tail recursion from being more space efficient is the need for the ops list, which is effectively us managing the call stack explicitly ourselves.
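If anyone wants to check the ~10% figure on their own data, here's a rough sketch of how I'd benchmark it (assuming the Benchee library is added as a dependency, and using the hypothetical BodyTree sketch above alongside peerreynders' Tree module):

values = Enum.shuffle(1..10_000)

Benchee.run(%{
  "body-recursive insert" => fn ->
    Enum.reduce(values, nil, &BodyTree.insert(&2, &1))
  end,
  "tail-recursive insert" => fn ->
    Enum.reduce(values, Tree.new(), &Tree.insert(&2, &1))
  end
})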

Personally, I find tail recursion easier to follow because it makes each recursive step explicit (whereas with body recursion, one must reason in one's head about what happens after each call returns). However, it does introduce significantly more complex code when applied to non-primitive, non-collection data structures, and it comes at a slight performance cost.

That said, if performance were really paramount, the code would be written in C (or nowadays maybe Rust or Haskell).

Personally, my takeaway is to first see whether there's a straightforward body-recursive implementation. If it looks like the body recursion will be particularly complex, I will now use the strategy @peerreynders introduced (explicitly managing the call stack myself) to reason through each step. That will initially produce a tail-recursive implementation, but hopefully working through it will help me identify the comparatively simpler body-recursive one.

I.e., I'm balancing explicitness with simplicity.