String compare depends on upper/lower case?

I am wondering why I get this behavior, which seems counterintuitive to me:

iex(1)> "x" > "a"
true
iex(2)> "X" > "a"
false

When I sort a list of strings, I don’t expect this result:

iex(1)> Enum.sort(["b", "a", "X"])
["X", "a", "b"]

I couldn’t find where the documentation explains this behavior. If somebody knows, I’m interested.

Bitstrings are compared byte by byte; incomplete bytes are compared bit by bit.

In ASCII, uppercase letters (65 to 90) have smaller byte values than lowercase ones (97 to 122), so "X" sorts before "a".
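
To make the byte comparison concrete, here is a minimal sketch you can paste into iex. The variable names are only for illustration; `:binary.first/1` returns the first byte of a binary as an integer.

```elixir
# Each ASCII letter is stored as a single byte.
upper = :binary.first("X")   # 88
lower = :binary.first("a")   # 97

# Binaries are compared byte by byte, so "X" sorts before "a".
IO.inspect(upper < lower)    # true
IO.inspect("X" < "a")        # true

# The first differing byte decides, regardless of what follows.
IO.inspect("Xz" < "a")       # true
```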



How do you easily sort by alphabetical order then?

Just off the cuff… maybe something like this.

iex(4)> Enum.sort(["b", "X", "a"], &(String.downcase(&1) <= String.downcase(&2)))
["a", "b", "X"]

There could be some string handling caveats that I’m not thinking about, but it’s the general idea.
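
A possible alternative to the comparator above, as a sketch: `Enum.sort_by/2` sorts on a derived key, so each string is lowercased once for comparison and the original strings come back unchanged. The `words` and `sorted` names are only for illustration.

```elixir
words = ["b", "X", "a"]

# Sort on a derived key: the lowercased form of each string.
# The returned list still contains the original strings.
sorted = Enum.sort_by(words, &String.downcase/1)

IO.inspect(sorted)
# ["a", "b", "X"]
```

Strings that differ only by case produce the same key, so they keep their input order relative to each other.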


JavaScript behaves the same way:

["a", "b", "X"].sort();          // ["X", "a", "b"]

Internally, the byte values (the ASCII codes, for these characters) are compared, which you can check in iex with the ? operator:

iex(15)> ?X
88
iex(16)> ?a
97
iex(17)> ?x
120

So, convert to a common case (for example lowercase) before comparing if you want a case-insensitive order.

For instance, see how uppercase wins:

iex(22)> ["derpycoder", "Derpycoder", "DerpyCoder"] |> Enum.sort()
["DerpyCoder", "Derpycoder", "derpycoder"]

See the answer by @sbuttgereit.

Also be aware that if you are sorting non-ASCII strings, you should normalise them first, for example with String.downcase(string) |> String.normalize(:nfkd).
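
As a rough sketch of why normalisation matters: the same visible character can be encoded in more than one way, and the two forms have different bytes. The `sort_key` helper and the sample words below are made up for illustration, and this is still a byte-wise sort on a normalised key, not a locale-aware collation.

```elixir
composed = "\u00E9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two forms look identical but are different binaries.
IO.inspect(composed == decomposed)
# false

# After normalising both to the same form they compare equal.
IO.inspect(String.normalize(composed, :nfd) == String.normalize(decomposed, :nfd))
# true

# Hypothetical helper: lowercase, then decompose, and sort on that key.
sort_key = fn s -> s |> String.downcase() |> String.normalize(:nfkd) end

sorted = Enum.sort_by(["Zebra", "\u00E9clair", "apple"], sort_key)
IO.inspect(sorted)
# ["apple", "éclair", "Zebra"]
```

Without the normalisation step, the precomposed "é" starts with a byte larger than any ASCII letter, so "éclair" would sort after "Zebra" even with lowercasing.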

Lastly, collation rules are language- and culture-dependent even for the same strings, so depending on what you’re trying to do, this is a much more complex topic than it seems on the surface.


Very interesting, thanks for all those details. A much more complex topic than I expected indeed!