unicode_transform has reached the 1.0 milestone after a complete rewrite. The library transliterates text between scripts, applies normalization and case mappings, and executes arbitrary CLDR transform rule sets at runtime.
An opt-in NIF is included for high-performance transformations (see the performance section of the README for details; the NIF is not always the fastest transformer). A fast-path Latin-to-ASCII module is also included. It is faster than the NIF and is used automatically.
unicode_transform ships with all 394 CLDR transforms covering script conversions (Greek, Cyrillic, Arabic, Devanagari, Thai, Hangul, and many more), Indic cross-script transliterations, BGN/PCGN romanizations, and specialized transforms like Any-Publishing and Fullwidth-Halfwidth.
Examples
Here are some examples of what unicode_transform can do. The primary public API is Unicode.Transform.transform/2.
Script-to-Latin transliteration
Convert text from non-Latin scripts to Latin characters:
# Greek to Latin
iex> Unicode.Transform.transform("Ελληνικά", from: :greek, to: :latin)
{:ok, "Ellēniká"}
# Cyrillic to Latin
iex> Unicode.Transform.transform("Москва", from: :cyrillic, to: :latin)
{:ok, "Moskva"}
# Korean to Latin
iex> Unicode.Transform.transform("한글", from: :hangul, to: :latin)
{:ok, "hangeul"}
# Thai to Latin
iex> Unicode.Transform.transform("กรุงเทพ", from: :thai, to: :latin)
{:ok, "krungtheph"}
# Arabic to Latin
iex> Unicode.Transform.transform("عربي", from: :arabic, to: :latin)
{:ok, "ʿrby"}
Latin-ASCII (accent stripping)
Remove diacritics and convert to plain ASCII:
iex> Unicode.Transform.transform("Ä Ö Ü ß", from: :latin, to: :ascii)
{:ok, "A O U ss"}
iex> Unicode.Transform.transform("café résumé", from: :latin, to: :ascii)
{:ok, "cafe resume"}
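A common use for accent stripping is building URL slugs. The following is a minimal sketch, assuming unicode_transform is already a dependency of your project. The `SlugSketch` module and its `slugify/1` function are hypothetical names for illustration and are not part of the library. Only the `{:ok, result}` shape shown above is relied on; any other return value is passed through unchanged, because the shape of a failed result is not shown in this post.

```elixir
defmodule SlugSketch do
  # Hypothetical helper, not part of unicode_transform.
  # Assumes the unicode_transform package is listed in your mix.exs deps.
  def slugify(text) when is_binary(text) do
    case Unicode.Transform.transform(text, from: :latin, to: :ascii) do
      {:ok, ascii} ->
        slug =
          ascii
          |> String.downcase()
          # Collapse every run of non-alphanumeric characters into one hyphen.
          |> String.replace(~r/[^a-z0-9]+/, "-")
          # Drop any hyphens left at the start or end.
          |> String.trim("-")

        {:ok, slug}

      other ->
        # Anything that is not {:ok, _} is returned as-is for the caller to handle.
        other
    end
  end
end
```

Given the "cafe resume" result shown above, `SlugSketch.slugify("café résumé")` should return `{:ok, "cafe-resume"}`.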
German-specific ASCII transliteration
Uses context-sensitive rules (e.g., uppercase Ä becomes AE, lowercase ä becomes ae):
iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
{:ok, "AE oe ue"}
iex> Unicode.Transform.transform("Ä ö ü", from: :de, to: :ASCII)
{:ok, "AE oe ue"}
iex> Unicode.Transform.transform("Ä ö ü", from: "de", to: "ASCII")
{:ok, "AE oe ue"}
Cross-script Indic transliteration
Convert between Indic scripts without going through Latin:
iex> Unicode.Transform.transform("हिन्दी", from: :devanagari, to: :bengali)
{:ok, "হিন্দী"}
iex> Unicode.Transform.transform("বাংলা", from: :bengali, to: :gujarati)
{:ok, "બাંলা"}
Japanese script conversion
iex> Unicode.Transform.transform("あいうえお", from: :hiragana, to: :katakana)
{:ok, "アイウエオ"}
# Options accept strings too (case-insensitive)
iex> Unicode.Transform.transform("あいうえお", from: "Hiragana", to: "Katakana")
{:ok, "アイウエオ"}
iex> Unicode.Transform.transform("tokyo", from: :latin, to: :katakana)
{:ok, "トキョ"}
Normalization and case transforms
Built-in transforms for Unicode normalization forms and case mapping:
iex> Unicode.Transform.transform("hello world", to: :upper)
{:ok, "HELLO WORLD"}
iex> Unicode.Transform.transform("hello world", to: :title)
{:ok, "Hello World"}
iex> Unicode.Transform.transform("A\u0308", to: :nfc)
{:ok, "Ä"}
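Because every call returns an `{:ok, result}` tuple, several transforms can be chained by feeding each result into the next call. Here is a rough sketch of that idea. `TransformChain` and `run/2` are hypothetical names, not library functions, and the chain simply stops at the first result that is not `{:ok, _}`.

```elixir
defmodule TransformChain do
  # Hypothetical helper, not part of unicode_transform.
  # `steps` is a list of option lists, each one passed to
  # Unicode.Transform.transform/2 in order.
  def run(text, steps) when is_binary(text) and is_list(steps) do
    Enum.reduce_while(steps, {:ok, text}, fn opts, {:ok, acc} ->
      case Unicode.Transform.transform(acc, opts) do
        {:ok, next} -> {:cont, {:ok, next}}
        # Stop at the first non-ok result and hand it back unchanged.
        other -> {:halt, other}
      end
    end)
  end
end
```

For example, `TransformChain.run("Ελληνικά", [[from: :greek, to: :latin], [from: :latin, to: :ascii], [to: :upper]])` would romanize the Greek text, strip the diacritics, and then uppercase the result, using only option combinations that appear in the examples above.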
Migration
If you’re using a unicode_transform version earlier than 1.0.0, the API has changed, though not dramatically. You will need to make some modifications to use the updated Unicode.Transform.transform/2 function.
Implementation notes
Claude strongly supported the implementation. I think this kind of project is a really good fit for using an LLM to support development:
- The specification is well-written and complete, so the LLM can readily derive an implementation from it.
- There is a reference implementation in ICU, so the implementation can be tested against it. Having the NIF interface to ICU definitely helps speed up development and testing.
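Testing against a reference implementation boils down to running the same inputs through both and collecting any disagreements. The sketch below shows the general shape of that check. It is not the project's actual test suite: `DifferentialCheck` is a hypothetical module, and both transformers are passed in as plain one-argument functions so that no NIF function names have to be assumed.

```elixir
defmodule DifferentialCheck do
  # Hypothetical helper illustrating differential testing.
  # `candidate` and `reference` are any functions that take a string.
  def mismatches(inputs, candidate, reference) do
    Enum.flat_map(inputs, fn input ->
      got = candidate.(input)
      expected = reference.(input)

      if got == expected do
        []
      else
        # Keep enough detail to reproduce the disagreement later.
        [%{input: input, got: got, expected: expected}]
      end
    end)
  end
end
```

In practice the candidate could be something like `&Unicode.Transform.transform(&1, from: :greek, to: :latin)`, with the reference being whatever function calls into ICU for the same transform. An empty list means the two agree on every input.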