Why are tuples ordered differently than lists in elixir/erlang?

Enum.sort([[3, -999], [3], [2, 999]])
# => [[2, 999], [3], [3, -999]]
Enum.sort([{3, -999}, {3}, {2, 999}])
# => [{3}, {2, 999}, {3, -999}]

I ask because it came up when I was storing directed graph edges in :gb_sets as {node1, node2}, and I couldn’t just collect all the edges from a node starting with

:gb_sets.iterator_from({node1}, set)

I had to write

:gb_sets.iterator_from({node1, @min_value}, set)

instead.

1 Like

I can not tell you why the decission was made like this, but:

Lists are compared element by element. Tuples are ordered by size, two tuples with the same size are compared element by element.

Emphasizis mine.

7 Likes

Tuples and maps can get their size computed in constant time, so it is cheap to use that for sorting, that’s not the case for lists! And the sooner you can tell two elements are different, for example by checking their size, the faster sorting will be.

15 Likes

Aha, the performance explanation makes sense, thanks @josevalim.
I think for lists, even in a crazy world where the implementation was modified to get length in constant time, the existing ordering would be needed, at least as an option, to preserve lexicographic ordering for charlists, among other applications.

1 Like

Lexicographic sort behavior for tuples is very useful for building database indexes with nested terms.

On the other hand, I have yet to think of a good use for tuples sorted by size. The performance benefit doesn’t really matter if the behavior is useless. It’s actually worse, because in order to get the useful behavior you have to use lists, which will thrash the cache lines more than (contiguous) tuples.

IMO the tuple sort behavior is a design mistake. If for some reason you actually wanted to sort by size it would be trivial to prefix the tuples ({3, a, b, c}), but it’s much harder to go the other way.

1 Like

If you are not going to sort tuples by size how would you order the 2 tuples {x,39,d,z} and {x,39,d}? Yes, you can compare the elements but what do you do when one tuple “runs out” of elements? Does the smaller tuple then come first, that is no element is always smaller than any element? Or what?

The current method is simple and consistent.

4 Likes

Much of the usefulness of the term order is derived from the fact that it exists, not necessarily the internal semantics of the ordering (e.g. :lists.usort/1).

I appreciate the existing design because as a user I expect operations on tuples to be O(1) where possible.

If you say (a) a term ordering must exist, (b) tuple operations should be fast, and (c) list operations may be comparatively slower, then comparing the size of the tuples as a short circuit is the correct design. I believe this leads to fewer surprises to the user overall.

2 Likes

The smaller tuple comes first.

1 Like

But if the tuples are the same size, the comparison is O(n) anyway. Is this not the common case? How often are you sorting tuples of different sizes? If anything short-circuiting is less predictable.

Like, with respect, because the list of people I’m disagreeing with here is pretty ridiculous, I find this argument kinda specious.

By this logic why not compare binaries by size first? You know why: it would be confusing and annoying. And with tuples it is also confusing and annoying, for the same reasons.

3 Likes

Yikes, if sorting tuple is O(n) then you are indeed right, I have yet to see anyplace where it is common to sort different size tuples, in fact it would not be a good idea to use tuples in generel if they are different sizes, as tuples are expensive to create in the first place.

The only place I could imagine the use would be in mnesia, ets and dets. But that feels like a red herring as you tend to use records for a table where the size is fixed and known at compile time, so in the end this only leaves the compiler, but I’ve even bigger doubts that a list of tuples are sorted there.

There could be an argument for persistent term, but that an extremely weak argument, an properly so rarely used that it you give you any measurable differences.

1 Like

If you are ok with a contrived example, I could define:

-type vec2() :: {float(), float()}.
-type vec3() :: {float(), float(), float()}.

And have the useful property that vec2() < vec3(). What kind of program needs this? Not sure, but it’s a useful property nonetheless. I don’t think that lexicographic ordering is always superior.

2 Likes

Lexicographic order is more expressive. If you wanted to model your vec2 < vec3 then you would do this:

@type vec2 :: {2, float(), float()}
@type vec3 :: {3, float(), float(), float()}

And they would sort properly (with the same performance characteristics!).

Going the other way would necessitate padding all tuples to a particular size, but that implies you actually know the size of the largest tuple in advance, which is unreasonable.

1 Like

That was my point!

Yes, that is what Erlang used to do a long time ago. Someone in the HiPE team at Uppsala university pointed out that it would be much more useful to compare binaries in the same way as lists, so we silently changed that in next major release.

Yes, I agree. Tuples is usually used for holding a collection of values of a fixed size. For many use cases, the exact order in which tuples are ordered is not important. Therefore, I think the design decision to sort tuples by size first is the correct one.

1 Like