Performance when recombining sublists into a larger list recursively

How can I improve the performance in, for example, a mergesort. In the merge step of the mergesort 2 smaller lists of size n/2 are combined into an list of size n. Because this is happening recursively the next phase when climbing up now has 2 lists of size n being combined into an list of size 2n. This requires multiple copies of lists to be created. Any tips?

If I was using C I could modify the array in memory without creating 2 new arrays each time I recurse down. Would you use ets for this sort of thing in Elixir? Or are there any other mutable data structures?

Because all data structures in immutable functional languages are, per definition, immutable, Elixir does not actually create multiple copies of a given list. Instead,whenever you pass a ‘copy’ somewhere, you’re actually passing a pointer to exactly the same data structure around. And when you modify something, this usually happens in a way that ensures that you can still re-use most of the original data structure.

Lists in Elixir (and most other functional programming languages) are linked lists. This means that they are built up of elements containing a head and a pointer to another list, the tail.

These two statements mean together that when you want to create a new list by prepending an element to an already-existing list, you only need extra memory for this single element, as the tail is 1:1 the same as the old list, and therefore we’re actually just copying the pointer there.

The advantage of this is that it takes less memory, because we re-use parts of the list. Many functional algorithms exploit this nature that linked lists have. Of course, it does mean that element access for an element somewhere in the tail of the list is slower, as to reach the k-th element, we need to iterate over all k-1 elements before it.

In functional programming languages, Merge Sort usually is a lot faster than e.g. QuickSort (which is often the algorithm of choice in imperative languages) because while Quick Sort is about list splitting (which is not fast for linked lists because of the linear element access time), Merge Sort is about list combining, which takes advantage of the facts that the same immutable lists are re-used.

2 Likes

Yes. I understand all of this. The problem is you can’t see where the copies are or are not being made. For example in the combine step mergesort takes 2 lists and combines them into a new list. Is the VM smart enough to reuse the memory in the 2 subarrays instead of allocating memory for an entire new list? Is it smart enough to know under what conditions this can be done? I’ve written an inversion counting routine but it seems inordinately slow and takes 30 seconds to process 100000 elements. Inversion counting uses a similar algorithm to mergesort.

This is the inversion counting routine I wrote:

defmodule Inversion do

  def summarize(a, b, acc, list_acc) do
    case {a, b} do
      {[], []} -> {list_acc, acc}
      {a, []} -> {list_acc ++ a, acc}
      {[], b} -> {list_acc ++ b, acc}
      {a,b} when hd(a) > hd(b) -> summarize(a, tl(b), length(a) + acc, list_acc ++ [hd(b)])
      {a,b} when hd(a) <= hd(b) -> summarize(tl(a), b, acc, list_acc ++ [hd(a)])
    end
  end

  def count_inversion(a) do
    case a do
      [] -> {[], 0}
      [x] -> {[x], 0}
      [x,y] when x > y -> {[y,x], 1}
      [x,y] when x <= y -> {[x,y], 0}
      a -> 
        n = length(a)
        half = div(n, 2)
        {sorted_b, x} = count_inversion(Enum.slice(a, (0 .. half-1)))
        {sorted_c, y} = count_inversion(Enum.slice(a, half..n-1))
        {sorted_a, z} = summarize(sorted_b, sorted_c, 0, [])
        {sorted_a, x + y + z}
    end
  end

  def read(filename) do
    fstream = File.stream!(filename)
    l = Enum.map(fstream, fn(x) -> String.to_integer(String.trim(x)) end)
    {sorted, ans} = count_inversion(l)
    ans
  end
end

How can I improve it’s performance?

The first thing that’s important to note here is that obviously not all algorithms that are appropriate for an imperative context are appropriate in a functional context. I can make a few specific pointers about how what you’re doing here could be improved, but the broader point is that we often need to make changes when trying to port an algorithm into a functional context, particularly when the original algorithm was written to operate on arrays and we’re interested in making it work on linked lists.

The biggest cause for performance issues is that you’re appending to a list repeatedly. This is very bad because it needs to construct a new list every time you append, creating an O(n^2) situation. In general you have/ two choices: use body recursion IE [item | foo(rest)] or accumulate backwards and reverse. This lets you create a constant number of lists instead of N at least for your accumulators.

http://stackoverflow.com/questions/23980893/haskell-performance-inversion-count-algorithm is a post about doing this same algorithm in Haskell, which may serve as a good reference for how to implement it in erlang.

2 Likes

In Elixir, every process has its own dedicated memory area. Only that process has access to that part of memory. Inside this process, all and every data structure that is created, be it integers, floats, atoms, lists, tuples, binaries, maps or structs are referenced to with pointers. Inside a process, two variables that ‘hold’ the same value, are actually pointing to exactly the same location in memory.

This allows the BEAM (the Virtual Machine that Elixir runs on) to do garbage collection by easy reference-counting, which is nice and fast. It also means that tests for equality are just pointer-comparisons, and that pattern-matching can be fast.

So: A copy is never made during normal operations. The only time data is copied, is when a process sends a message to another process.

Because lists are linked lists (and not arrays), it is impossible to re-use them when merging two of them. Some implementations of merge sort such as the one built in in Haskell take advantage of the fact that when you encounter an already-sorted sequence in your list, you can treat this sequence as a terminal element that doesn’t need to be split before merging.

But during the merge step, there indeed is new memory being allocated for each merge. Do note that the elements inside the list are again pointers to some other data location storing the contents, so it does not matter for the memory overhead how large the elements are. And, as soon as these intermediate lists are not used in the functions anymore, they will be garbage collected.

n = length(a) you just walked the whole thing
Enum.slice(a, (0 … half-1) Enum.slice(a, half…n-1) you just walked the whole thing again

linked list is not an array random access is expensive

1 Like

Instead of splitting the list in half in the middle(at div(n,2) elements), the approach most functional implementations of Merge Sort and similar algorithms such as your inversion count take, is to split the list by separating the odd and even elements. With this approach, you don’t need to know the length of the list beforehand, nor do you need to travel through half of the list another time to split it in the middle.

Thanks! A lot of great answers and tips.

1 Like