Ex_cldr - Common Locale Data Repository (CLDR) functions for Elixir (Version 2 out now!)

#1

ex_cldr provides localisation and internationalisation support based upon the data from the Unicode CLDR project.

Unicode released CLDR version 34 this week, and ex_cldr has been updated to that data, which now covers 537 locales usable from Elixir.

The full list of updated packages (core and optional) is:

Also updated is ex_money since it uses ex_cldr and friends under the cover for localisation and formatting:

This is expected to be the last feature release of ex_cldr 1.x, with version 2.0 due by the end of this year. Bugs in 1.x will, of course, continue to be eradicated as quickly as possible.

#2

Cldr version 2.0 has just been released on hex. It's a major version bump with breaking changes, primarily restructuring the code in the manner of Ecto, Phoenix and Gettext: you now define a backend module, and most of the public API that exposes the CLDR content is generated into it. It also gets rid of the horrid :ex_cldr compiler.
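For readers new to the backend approach, here is a minimal sketch of what such a module looks like, based on the `use Cldr` pattern from the ex_cldr 2.x documentation as I understand it. The module name `MyApp.Cldr` and the locale list are illustrative; check the option names against the docs for the version you install.

```elixir
# A backend module: ex_cldr generates the CLDR-backed public API into it
# at compile time, much like an Ecto Repo or a Gettext backend.
defmodule MyApp.Cldr do
  use Cldr,
    # Illustrative choice of locales whose data is compiled into the backend
    locales: ["en", "fr", "ja"],
    default_locale: "en"
end
```

Functions from installed companion packages are then called through the backend, for example `MyApp.Cldr.Number.to_string(12345, locale: "fr")` when ex_cldr_numbers is also a dependency.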

ex_cldr provides the underlying data that powers number, date/time, list, unit and territory formatting in over 500 different locales. It's also used to underpin ex_money.

The changelog contains the changes. Several of the dependent packages are also updated:

Some packages will be updated over the next two weeks:

#3

Hey Kip - thanks for this library!

I’m trying to determine whether I should use CLDR or IANA, or both, to specify languages in my app.

Your library (and CLDR) follows RFC 5646. The W3C also recommends RFC 5646 (https://www.w3.org/International/articles/language-tags/), but it refers to https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry as the source of truth for RFC 5646.

However, some language codes in IANA’s registry are not available in CLDR:

iex(27)> Enum.member?(MyApp.Cldr.Language.all_languages(), "adq")
false

CLDR still parses it and fills in the language key, though:

iex(28)> Cldr.Locale.new("adq", MyApp.Cldr)
{:ok,
 %Cldr.LanguageTag{
   canonical_locale_name: "adq-Latn-US",
   cldr_locale_name: nil,
   extensions: %{},
   gettext_locale_name: nil,
   language: "adq",
   language_subtags: [],
   language_variant: nil,
   locale: %{},
   private_use: [],
   rbnf_locale_name: nil,
   requested_locale_name: "adq",
   script: "Latn",
   territory: "US",
   transform: %{}
 }}

So that’s great, but the locale isn’t found when formatting:

iex(29)> MyApp.Cldr.Number.to_string 12345, locale: "adq"
{:error, {Cldr.UnknownLocaleError, "The locale \"adq\" is not known."}}

Do you know why CLDR doesn’t have a language code that IANA has, given they’re following the same spec?

I think it’s because the spec is BCP 47, which they both follow, but I’m not sure why IANA has more language tags, while CLDR has language-REGION combinations that are not specified in IANA’s registry.

I’m ultimately trying to ensure I’m using languages properly, as my organization works with a very large number of languages.

I’m considering storing IANA’s codes separately from CLDR, and using your library to augment them where possible with the wealth of extra data it provides. I was just interested in your thoughts on IANA vs. CLDR, if you have any, and whether you think there’s room to use both.

Thanks!

#4

All but one of the companion libs is now updated. Cldr.DatesTimes is being actively updated to reflect the new Calendar functions in Elixir 1.8 and will be out by the end of January.

#5

@gdub01, thanks for your interest. The difference is primarily that CLDR is not a registry but a data repository (the Common Locale Data Repository). Whilst ex_cldr will parse adq as a valid language tag, CLDR doesn’t have any localisation data for it, which is why the cldr_locale_name field in the struct is nil.

There are 533 languages supported in the current CLDR version 34 repository.
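Given that distinction, one way to tell a valid tag apart from a locale CLDR can actually format is to look at `cldr_locale_name` in the parsed result. The sketch below is a hypothetical helper (`LocaleCheck` is not part of ex_cldr); it pattern-matches on plain map keys, so it works with the `%Cldr.LanguageTag{}` struct shown earlier as well as with plain test maps.

```elixir
defmodule LocaleCheck do
  @moduledoc """
  Hypothetical helper: classifies the result of `Cldr.Locale.new/2`
  by whether CLDR actually has localisation data for the parsed tag.
  """

  # Parsed as a valid tag, but CLDR has no data for it (e.g. "adq")
  def classify({:ok, %{cldr_locale_name: nil, language: language}}),
    do: {:tag_only, language}

  # Parsed and backed by CLDR data the backend can format with
  def classify({:ok, %{cldr_locale_name: name}}),
    do: {:localizable, name}

  # Not a valid language tag at all
  def classify({:error, reason}),
    do: {:invalid, reason}
end
```

For example, `LocaleCheck.classify(Cldr.Locale.new("adq", MyApp.Cldr))` should yield `{:tag_only, "adq"}` given the struct returned earlier in this thread.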

If your primary objective is to detect valid language tags then I think the two choices are:

  1. Use IANA data alone
  2. Use CLDR in conjunction with IANA data. CLDR will detect obsolete tags and update them to the modern version, apply default subtags where known and also apply known aliases. This may (or may not) help you if you want as lenient a parse as possible.
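To illustrate the kind of normalisation option 2 refers to, here is a toy sketch in plain Elixir. It is not how ex_cldr implements it: the module name `TagSketch` is hypothetical, and the two tables are a tiny hand-picked excerpt (deprecated codes such as `iw` for Hebrew, plus a few likely-subtag entries), whereas CLDR ships complete alias and likely-subtags data.

```elixir
defmodule TagSketch do
  @moduledoc """
  Toy illustration of two CLDR-style canonicalisation steps: replacing
  deprecated language codes, then filling in likely script and region
  subtags for a bare language. Tables are a tiny excerpt, not CLDR data.
  """

  # Deprecated ISO 639 codes and their modern replacements
  @aliases %{"iw" => "he", "in" => "id", "ji" => "yi"}

  # Likely script and territory for a bare language
  @likely %{
    "en" => {"Latn", "US"},
    "he" => {"Hebr", "IL"},
    "id" => {"Latn", "ID"},
    "ja" => {"Jpan", "JP"}
  }

  def canonicalize(tag) when is_binary(tag) do
    [language | rest] = String.split(tag, ["-", "_"])
    language = String.downcase(language)
    # Swap in the modern code if this one is deprecated
    language = Map.get(@aliases, language, language)

    case {rest, Map.fetch(@likely, language)} do
      # Bare language with known likely subtags: expand it
      {[], {:ok, {script, region}}} -> Enum.join([language, script, region], "-")
      # Anything else: keep the remaining subtags unchanged
      _ -> Enum.join([language | rest], "-")
    end
  end
end
```

A real implementation also handles scripts, variants, extensions and region aliases; this sketch only expands a bare language.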

If your primary objective is application localisation, then I think CLDR is the most comprehensive repository available, and it underpins most application localisation globally, often through the libraries icu4c and icu4j.

ex_cldr is an Elixir implementation that largely matches the functionality of icu4j for output but does not implement parsing.

#6

Thanks so much @kip! I think I’ll go with CLDR & IANA. Appreciate it.

#7

A few long flights have given an opportunity to make some updates to the ex_cldr set of libraries:

  • ex_cldr_print provides C-compatible printf/3 and sprintf/3 functions for formatting strings. Since it's built on CLDR data, it localises grouping characters, decimal points and exponent characters. It also means you can output in different number systems (like thai, arab and so on). It's only on GitHub so far; it needs further tuning and some development before a hex.pm release.

  • ex_cldr_collation implements locale-specific collation (sorting). It is NIF-based, currently supports only the default CLDR collation, and is based upon the Erlang library erlang-ucol. The next step is to support the full range of collations for configured locales.
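As a rough illustration of what localising numeric output involves, here is a stdlib-only sketch of two of the steps mentioned above: inserting a grouping separator and mapping ASCII digits onto another number system (Thai digits occupy U+0E50 to U+0E59). This is not ex_cldr_print's code; `DigitSketch` is a hypothetical module, and the separator is passed in by hand where a CLDR-based library would look it up for the locale.

```elixir
defmodule DigitSketch do
  @moduledoc """
  Hypothetical sketch: group digits with a caller-supplied separator and
  map ASCII digits onto Thai digits. Not the ex_cldr_print implementation.
  """

  # Groups a non-negative integer's digits in threes, e.g. 1234567 -> "1.234.567"
  def group(integer, separator) when is_integer(integer) and integer >= 0 do
    integer
    |> Integer.to_string()
    |> String.graphemes()
    |> Enum.reverse()
    # Chunk from the least significant digit so only the leading group is short
    |> Enum.chunk_every(3)
    |> Enum.map(fn chunk -> chunk |> Enum.reverse() |> Enum.join() end)
    |> Enum.reverse()
    |> Enum.join(separator)
  end

  # Replaces ASCII 0-9 with Thai digits (U+0E50..U+0E59); other characters pass through
  def to_thai_digits(string) when is_binary(string) do
    for <<char::utf8 <- string>>, into: "" do
      if char in ?0..?9, do: <<(char - ?0 + 0x0E50)::utf8>>, else: <<char::utf8>>
    end
  end
end
```

Fixed groups of three are a simplification: some locales, such as en-IN, use a different size for secondary groups, which is exactly the kind of detail the CLDR data captures.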

#8

CLDR version 35 was released on March 27th and is now incorporated into updates to the cldr_* family on hex. CLDR supports localisation of numbers, dates, times, lists and units for 540 locales. It also supports multiple calendars (coming soon in ex_cldr_calendars).

Summary of CLDR 35.0.0 update

  • Data: 70,000+ new data fields, 13,400+ revised data fields
  • Basic coverage: new languages Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo)
  • Modern coverage: Somali (so) and Javanese (jv) increased coverage from Moderate to Modern
  • Emoji 12.0: names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names and keywords
  • Collation: updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation
  • Measurement units: 23 additional units
  • Date formats: two additional flexible formats, and 20 new interval formats
  • Japanese calendar: in the Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”
  • Region names: many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ)
  • Segmentation: enhanced grapheme cluster boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva

Related Cldr releases on Hex

Migration from earlier versions of Cldr

No code changes are expected for client applications. However, since CLDR is a data repository, the underlying data may have changed.

#9

I have pushed a new member of the ex_cldr_* family of packages to hex: ex_cldr_calendars.

From the readme:

Cldr Calendars builds on Elixir’s standard Calendar module to provide additional calendars and calendar functionality intended to be of practical use. In particular, Cldr Calendars:

  • Provides support for configurable month-based and week-based calendars that are in common use as “Fiscal Year” calendars for countries and organizations around the world. See Cldr.Calendar.new/3

  • Supports localisation of common calendar terms such as “day of the week” and “month of the year” using the CLDR data that is available for over 500 locales. See Cldr.Calendar.localize/3

  • Supports locale-specific knowledge of what is a weekend or a workday. See Cldr.Calendar.weekend/1, Cldr.Calendar.weekend?/2, Cldr.Calendar.weekdays/1 and Cldr.Calendar.weekday?/2.

  • Provides convenient Date.Range calculators for years, quarters, months and weeks for calendars and provides the means to move to the next and previous period in a calendar where a period may be a year, quarter, month, week or day.

  • Supports adding periods to, and subtracting periods from, dates and date ranges. See Calendar.plus/3 and Calendar.minus/3

  • Includes pre-defined calendars for Gregorian (compatible with the built-in Calendar module), ISOWeek and National Retail Federation (NRF) calendars

  • Includes functions to find the first, last, nearest and nth days of the week from a date. For example, find the 2nd Tuesday in November.
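The last bullet can be approximated for the ISO (Gregorian) calendar with the standard library alone. This is a hypothetical sketch, not the ex_cldr_calendars API: `NthWeekday.nth/4` uses ISO weekday numbering (1 = Monday through 7 = Sunday) and returns an error instead of rolling into the next month. It needs Elixir 1.11+ for `Date.new!/3`.

```elixir
defmodule NthWeekday do
  @moduledoc """
  Hypothetical stdlib-only sketch: find the nth occurrence of a weekday
  in a month of the ISO calendar. Not the ex_cldr_calendars API.
  """

  # day_of_week uses ISO numbering: 1 = Monday ... 7 = Sunday
  def nth(year, month, day_of_week, n) when is_integer(n) and n >= 1 do
    first = Date.new!(year, month, 1)
    # Days from the 1st to the first matching weekday (0..6)
    offset = Integer.mod(day_of_week - Date.day_of_week(first), 7)
    date = Date.add(first, offset + 7 * (n - 1))

    # Fail rather than roll over into the following month
    if date.month == month, do: {:ok, date}, else: {:error, :no_such_day}
  end
end
```

`NthWeekday.nth(2019, 11, 2, 2)` finds the 2nd Tuesday of November 2019, `{:ok, ~D[2019-11-12]}`, while asking for a 5th Tuesday in that month returns `{:error, :no_such_day}`.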
