Announcing ex_cldr_collation version 1.0
ex_cldr_collation is an Elixir implementation of the Unicode Collation Algorithm (UCA) as extended by CLDR. It provides language-aware string sorting and comparison. An opt-in NIF is available for high-performance collation.
ex_cldr_collation does not depend on ex_cldr, but if ex_cldr is configured in your application, a Cldr.LanguageTag.t can be passed as the :locale option.
Features
- Full Unicode Collation Algorithm implementation in pure Elixir.
- CLDR root collation based on the Unicode DUCET table.
- Locale-specific tailoring for 10+ languages (Danish, German phonebook, Spanish, Swedish, Finnish, etc.).
- All BCP 47 -u- extension collation keys supported.
- Optional high-performance NIF backend using ICU4C.
- Sort key generation for efficient repeated comparisons.
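A tailoring changes how particular characters sort for a locale. For example, German phonebook collation (de-u-co-phonebk) sorts Ä, Ö and Ü as though they were written Ae, Oe and Ue. The toy sketch below shows only that idea in plain Elixir. `PhonebookSketch` is a hypothetical module and is not part of ex_cldr_collation. It compares the expanded strings by code point rather than with the collation algorithm, so it is only suitable for simple word lists like this one, and it assumes NFC (precomposed) input.

```elixir
defmodule PhonebookSketch do
  # Hypothetical module for illustration only; not part of ex_cldr_collation.
  # Each umlaut is expanded the way German phonebook collation treats it.
  @expansions [
    {"Ä", "Ae"}, {"Ö", "Oe"}, {"Ü", "Ue"},
    {"ä", "ae"}, {"ö", "oe"}, {"ü", "ue"}
  ]

  # Builds a comparison key by applying every expansion in turn.
  # Assumes NFC input, where each umlaut is a single code point.
  def key(word) when is_binary(word) do
    Enum.reduce(@expansions, word, fn {from, to}, acc ->
      String.replace(acc, from, to)
    end)
  end

  # Sorts by the expanded keys. Keys are compared by code point, not with
  # the Unicode Collation Algorithm, so case and accents are not handled
  # the way a real collator would handle them.
  def sort(words), do: Enum.sort_by(words, &key/1)
end

PhonebookSketch.sort(["Ärger", "Alter", "Ofen", "Öl", "Über", "Ulm"])
# => ["Ärger", "Alter", "Öl", "Ofen", "Über", "Ulm"]
```

For this word list the result agrees with the de-u-co-phonebk output in the Examples section. The library applies the tailoring inside the full collation algorithm instead of rewriting strings.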
Examples
There are many options for adapting sorting to locale-specific and user-specific requirements. Here are some basic examples:
iex> Cldr.Collation.sort(["café", "cafe", "Cafe"])
["cafe", "Cafe", "café"]
# Cased comparisons
iex> Cldr.Collation.sort(["café", "cafe", "Cafe"], case_first: :upper)
["Cafe", "cafe", "café"]
iex> Cldr.Collation.compare("café", "cafe")
:gt
iex> Cldr.Collation.compare("a", "A", casing: :insensitive)
:eq
# Numeric ordering. By default digits are compared one at a time,
# so the 1 sorts before the 2 and "Level 10" comes first
iex> Cldr.Collation.sort(["Level 10", "Level 2"], numeric: false)
["Level 10", "Level 2"]
# Numeric sorting compares runs of consecutive digits by value,
# and not just Indo-Arabic digits: digits in any script are recognised
iex> Cldr.Collation.sort(["Level 10", "Level 2"], numeric: true)
["Level 2", "Level 10"]
# German phonebook ordering sorts Ä, Ö and Ü as if written Ae, Oe and Ue
iex> words = ["Ärger", "Alter", "Ofen", "Öl", "Über", "Ulm"]
iex> Cldr.Collation.sort(words)
["Alter", "Ärger", "Ofen", "Öl", "Über", "Ulm"]
iex> Cldr.Collation.sort(words, locale: "de-u-co-phonebk")
["Ärger", "Alter", "Öl", "Ofen", "Über", "Ulm"]
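# Collation options can also be set in the locale's -u- extension:
# ks-level2 sets the strength to secondary, so case is ignored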
iex> Cldr.Collation.compare("a", "A", locale: "en-u-ks-level2")
:eq
# Sort key generation
iex> Cldr.Collation.sort_key("hello")
<<36, 196, 36, 83, 37, 40, 37, 40, 37, 152, 0, 0, 0, 32, 0, 32, 0, 32, 0, 32, 0,
32, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2>>
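The key above appears to follow the usual UCA layout: the primary weights for h, e, l, l, o, then a zero separator, the secondary weights, another separator and finally the tertiary weights. In DUCET, 0x0020 (32) is the common secondary weight and 0x0002 (2) the tertiary weight for lowercase letters. The sketch below decodes the key in plain Elixir. `SortKeyLevels` is a hypothetical helper, not part of ex_cldr_collation. It assumes every weight is a 16-bit big-endian integer and that 0 appears only as a level separator. That holds for this key, but the library may encode other keys differently.

```elixir
defmodule SortKeyLevels do
  # Hypothetical helper, not part of ex_cldr_collation. Assumes every weight
  # is a 16-bit big-endian unsigned integer and that a weight of 0 only
  # ever appears as a separator between levels.
  def split(key) when is_binary(key) do
    # <<w::16>> defaults to an unsigned, big-endian integer.
    weights = for <<w::16 <- key>>, do: w

    weights
    # Group consecutive weights by whether they are separators (0)...
    |> Enum.chunk_by(&(&1 == 0))
    # ...then drop the separator groups, leaving one list per level.
    |> Enum.reject(&match?([0 | _], &1))
  end
end

key =
  <<36, 196, 36, 83, 37, 40, 37, 40, 37, 152, 0, 0, 0, 32, 0, 32, 0, 32, 0, 32, 0,
    32, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2>>

SortKeyLevels.split(key)
# => [[9412, 9299, 9512, 9512, 9624], [32, 32, 32, 32, 32], [2, 2, 2, 2, 2]]
```

Because UCA sort keys are designed so that a byte-by-byte comparison of two keys gives the same result as comparing the original strings, keys can be computed once and reused. For example, `Enum.sort_by(words, &Cldr.Collation.sort_key/1)` computes each key only once. This assumes that `sort_key/1` with its default options matches the defaults of `sort/2`.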
Implementation notes
As with unicode_transform, Claude was a very valuable co-developer for this release. The same factors make this a very powerful combination for development and testing:
- The specification is clear and complete, so it is easy for an LLM to ingest.
- There is a reference implementation (ICU) against which tests can be validated automatically. The NIF-based interface to ICU makes this almost trivial.