Unstable sort

In Programming Elixir 1.3, under the topic of Enum, the author mentioned that “It’s important to use <= and not just < if you want the sort to be stable.”

Does anyone know what this means? Does it mean if i do

Enum.sort [“there”, “was”, “a”, “crooked”, “man”], &(String.length(&1)<String.length(&2))

I might get different results on different runs? If so, why?

You won’t notice the difference when sorting strings.

Also stability on sorting doesn’t mean that you sort a different way watch time, but it means that equal elements will have their original order.

When using elixirs native comparing functions/macros you will never see any difference. The difference will only come into play if you use a custom compare function which only compares on a subset of keys in a compound data type.

There is some info about it in the docs https://hexdocs.pm/elixir/Enum.html#sort/2

It plays a role if you sort data based on one or some of its properties (and not all), which means that there are different data structures that can be considered ‘equal’ but not ‘the same’. An example would be when we sort a deck of cards based on numerical value. Clearly, 5 of hearts and 5 of spades are ‘equal’ during sorting, but not ‘the same’.

Exactly this also happens when we e.g. sort posts by author, any post by ‘John’ would be ‘equal’, even though they have different contents.

So now we come to stability: If a sort is not stable, then if some elements were already in a desired (relative) order before, then this is now lost. This mostly plays a role if we want to sort by multiple fields after another.

Say we have this example data:

posts = [
  %{author_id: 1, created_at: 1, content: "Test 1"},
  %{author_id: 2, created_at: 15, content: "Test 2" },
  %{author_id: 2, created_at: 12, content: "Test 3" },
  %{author_id: 2, created_at: 10, content: "Test 4" },
  %{author_id: 1, created_at: 5, content: "Test 5"},
]

Now, lets sort them:

iex> posts \
iex> |> Enum.sort(fn a, b -> a.created_at <= b.created_at end) \
iex> |> Enum.sort(fn a, b -> a.author_id <= b.author_id end)
# The posts are now sorted on author, and where the author is the same, sorted on create_time.
[%{author_id: 1, content: "Test 1", created_at: 1},
 %{author_id: 1, content: "Test 5", created_at: 5},
 %{author_id: 2, content: "Test 4", created_at: 10},
 %{author_id: 2, content: "Test 3", created_at: 12},
 %{author_id: 2, content: "Test 2", created_at: 15}]

Note that if we were to change the second sorting operation to use < instead of <=, it would undo the work we had done during the first sorting operation:

iex> posts \
iex> |> Enum.sort(fn a, b -> a.created_at <= b.created_at end) \
iex> |> Enum.sort(fn a, b -> a.author_id < b.author_id end) # Unstable!
[%{author_id: 1, content: "Test 5", created_at: 5},
 %{author_id: 1, content: "Test 1", created_at: 1},
 %{author_id: 2, content: "Test 2", created_at: 15},
 %{author_id: 2, content: "Test 3", created_at: 12},
 %{author_id: 2, content: "Test 4", created_at: 10}]

Note that the output now is wrong; the created_at information is not in order.

This learns us that when doing multiple sorts after one another, the first one does not need to be stable (because it cannot ‘destroy’ relative ordering from earlier sorting stages) but all later ones should be.

In practice, there are some sorting algorithms that are easier (more efficient) to implement without caring about stability. This means that stability is something you might need to keep in mind. Enum.sort, however, uses merge sort, which is a stable algorithm (as long as the function used for comparisons is <= or => and not < or >, that is, it notes that equal elements are already in the correct position from one another).

Wikipedia also has a very clear explanation of sorting stability.

2 Likes

You guys are awesome, thanks for taking time to help me out with this

1 Like

If you ever want to look at the maths behind it, it is the ideas of partial order, strict partial order or total order.