Ex_cldr - Common Locale Data Repository (CLDR) functions for Elixir

Good points as always. Gettext strategy it is.

Do you have the link to the article you mentioned?

Is ā€œrealā€ text required or will it also work if one supplies a ā€œcontextā€, e.g. Dashboard.Headline and the real text will be loaded from translation files or DB? I actually donā€™t like the gettext approach.

Can you give me an example of what you are suggesting? I think you’re suggesting using a “string key” rather than a string itself so that all strings, including the base language example, are stored externally. But I’d like to be sure I understand correctly.

For storage - I’m making that pluggable, with at least one backend that can pull (and be pushed) translations from the net.

Exactly this. I don’t like to have one language “hardcoded” in the source. I am also not sure how it will work if one changes the “hardcoded” text to e.g. fix a typo or add additional details. It seems the connection to the translated versions gets lost.

But I may also misunderstand this approach. I haven’t used it yet because of the reasons outlined above.

All good points.

The positive side to having the base string embedded in code is the readability and comprehension of the text. But I recognise your perspective as well.

I am definitely supporting the same approach as Gettext. In a similar way, I am basically converting each string into a “compiled” format during module compilation to speed up the runtime. At the same time, I am “uncompiling” it to derive a canonical form of the message that I will then hash as the key to translations. At compilation, like gettext, I will do a fuzzy comparison of strings in case there is a minor change like capitalisation, and below a configurable threshold treat the keys the same. This is still a work in progress.
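
To make the hashing and fuzzy-matching idea concrete, here is a minimal sketch using only the standard library. It is not the ex_cldr_messages implementation: the module name, the whitespace-only canonical form (standing in for the “uncompiled” message), the SHA-256 hash and the Jaro similarity threshold are all assumptions chosen for illustration.

```elixir
defmodule MessageKeySketch do
  # Hypothetical helper module; not part of ex_cldr_messages.

  # Stand-in canonical form: trim and collapse whitespace so that
  # layout-only edits do not change the key.
  def canonical(message) when is_binary(message) do
    message
    |> String.trim()
    |> String.replace(~r/\s+/, " ")
  end

  # Hash the canonical form to get a fixed-size translation key.
  def key(message) do
    :crypto.hash(:sha256, canonical(message))
    |> Base.encode16(case: :lower)
  end

  # Treat two messages as the same when their Jaro similarity
  # (0.0 to 1.0) reaches the configurable threshold.
  def same_message?(old, new, threshold \\ 0.9) do
    String.jaro_distance(canonical(old), canonical(new)) >= threshold
  end
end
```

With this sketch, "Hello  world" and " Hello world " produce the same key, while a capitalisation change such as "hello world" produces a different key but is still recognised by same_message?/3 as the same message.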

I don’t think it’s difficult at all to use an atom as the argument to the messaging functions, rather than a binary, and simply treat that as the key. And I don’t think it compromises the integrity of the overall strategy I currently have in my head.

Comments and suggestions welcome.

Even in gettext one could just use string keys instead of proper language and translate that. It’s just that gettext tools often treat the key as the “source language” instead of using the translation for a different language as base.

I didn’t mean Dashboard.Headline as an atom, tho. I would make it a string "Dashboard.Headline". Atoms still don’t grow on trees :smiley:

Ah, ok, got it. In which case, as @LostKobrakai says, that’s possible with Gettext today as well.
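
For readers following along, the string-key approach can be sketched roughly as below. This is neither Gettext nor ex_cldr_messages code: the module, the in-memory map and the sample texts are hypothetical and only show that every language, including the base one, lives outside the calling code.

```elixir
defmodule KeyLookupSketch do
  # Hypothetical translation table keyed by locale, then by string key.
  @translations %{
    "en" => %{"Dashboard.Headline" => "Welcome back"},
    "de" => %{"Dashboard.Headline" => "Willkommen"}
  }

  # Look up the key in the requested locale, then in the default locale,
  # and finally fall back to returning the key itself.
  def translate(locale, key, default_locale \\ "en") do
    get_in(@translations, [locale, key]) ||
      get_in(@translations, [default_locale, key]) ||
      key
  end
end
```

Fixing a typo in the English text then only touches the "en" entry; the key, and therefore the link to the other translations, stays unchanged.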

ex_cldr_messages 0.3.0 now has a macro to parse the format at compile-time. Now it’s on to translation scaffolding.

Unlike Gettext, which inlines the translations into the backend module, I plan to keep the translations separate so that they can be updated without stopping the application and without having to re-deploy. The storage module will be pluggable on a per-cldr-backend basis.

I am building two storage engines as part of the package:

  1. :persistent_term based, with maybe an :ets fallback for earlier OTP releases. I’ve seen some excellent performance with :persistent_term on an NLP-related project so it seems a good tool for this.

  2. A web-service based storage engine. Haven’t quite worked out how that’s going to work, but I think it’s the better way to go for production apps. This engine would build on top of the first engine. I will build the web-service too as open source. And I may even build a paid service as well.
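
A rough idea of what the first engine might look like, assuming translations are kept as one map per backend and locale. The module and function names are hypothetical; only :persistent_term.put/2 and :persistent_term.get/2 are real OTP calls.

```elixir
defmodule PersistentTermStoreSketch do
  # Hypothetical storage module; names and data shape are assumptions.

  # Store (or replace) all translations for one backend and locale.
  def put_translations(backend, locale, translations) when is_map(translations) do
    :persistent_term.put({__MODULE__, backend, locale}, translations)
  end

  # Read a translation without copying the whole map into the caller's heap.
  def translate(backend, locale, key, default \\ nil) do
    {__MODULE__, backend, locale}
    |> :persistent_term.get(%{})
    |> Map.get(key, default)
  end
end
```

Reads from :persistent_term are very cheap, but replacing an existing term triggers a global garbage collection pass, so a design like this suits translations that are updated occasionally rather than continuously.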

Comments and suggestions always welcome.

I’m quite happy with my Mezzofanti library as a frontend for your ICU message format library. I can continue working on that if your ICU library is ready.

The parser is complete and message formatting is also complete - both function and macro form.

Definitely interested in seeing if Mezzofanti and cldr_messages could work together. I’ll ping you a DM.

An update to ex_cldr_calendars 1.2.0 today to kick off the weekend. It adds durations. A Duration is the difference between two dates, times or datetimes expressed in calendar time units. Of course Calendar.ISO is supported, but so are any calendars defined by or defined with Cldr.Calendar.new/3.

Then these durations can be localised to humanise the duration of time. Some examples:

# How long until the Tokyo Olympic Games start?
iex> {:ok, d} = Cldr.Calendar.Duration.new(~D[2019-08-31], ~D[2020-07-24])
{:ok,
 %Cldr.Calendar.Duration{
   day: 24,
   hour: 0,
   microsecond: 0,
   minute: 0,
   month: 10,
   second: 0,
   year: 0
 }}
iex> Cldr.Calendar.Duration.to_string d       
{:ok, "10 months and 24 days"}
iex> Cldr.Calendar.Duration.to_string d, locale: "ar"                       
{:ok, "10 أشهر و24 يومًا"}
iex> Cldr.Calendar.Duration.to_string d, locale: "he"                       
{:ok, "10 חודשים ו24 ימים"}
iex> Cldr.Calendar.Duration.to_string d, style: :narrow                     
{:ok, "10m and 24d"}
iex> Cldr.Calendar.Duration.to_string d, style: :narrow, list_options: [style: :unit_narrow]
{:ok, "10m 24d"}

All the localisation is driven, of course, by CLDR data encapsulated in ex_cldr.

ex_cldr 2.11 release supporting CLDR 36.0.0

This week the Unicode Consortium released CLDR version 36 upon which the hex package ex_cldr is based.

The CLDR release notes explain in detail the changes underlying the most comprehensive source of internationalisation and localisation data in the world, used by every major vendor.

Data changes that may affect ex_cldr formatting

  • zh: The currency symbol for CNY changed from fullwidth ￥ (FFE5) to halfwidth ¥ (00A5)
  • fr_CA : Switched to full year (not 2-digit year) in short date formats. [CLDR-11666]
  • bg: Removed “ч.” from time formats. [CLDR-11545]
  • The translations for the new name ‘North Macedonia’ have been refined for many languages by contributors, and those languages with no contributors have been reverted to code ‘MK’. All Alt values also have been removed [CLDR-13099].

Release note highlights

  • Approximately 32K items added
    • Significant increase (approx 50% or more) in moderate and/or modern coverage for: ceb (Cebuano), ha (Hausa / Latin script), ig (Igbo), kok (Konkani), qu (Quechua), to (Tongan), yo (Yoruba). Additionally, the following locales had at least a 15% increase in basic coverage: az (Azerbaijani / Latin script), so (Somali / Latin script).
    • Seed data for new locales, including three native languages of N. America: cic (Chickasaw), mus (Muscogee), osa (Osage, Osage script); an (Aragonese), su (Sundanese, Latin script), szl (Silesian).
    • Additional data for new items listed below.
  • Emoji
    • Added names and keywords for Emoji 13.0 draft candidates; these are to be fleshed out further in v36.1.
    • Refined names and keywords for Emoji 12.0, including for English.
  • Measurement units:
    • Additional compoundUnitPattern ({0}⋅{1} in root) for expressing units like newton-meter (N⋅m)
    • Additional units: dot-per-centimeter, dot-per-inch, em, megapixel, pixel, pixel-per-centimeter, pixel-per-inch; decade; therm-us; bar, pascal
  • Locale identifiers and names
    • Extended Language Matching to have fallbacks for many encompassed languages. [CLDR-13244]
    • Added more languageAliases from the BCP47 language subtag registry, for deprecated languages.
    • New alt="menu" names for certain languages, intended to provide better sorting in menus. [CLDR-11834]
    • Updated validity and collection information for geographic subregions; updated names especially for subregions of UK and Sweden.
    • Names have been added for “pseudo-regions” XA (Pseudo-Accents) and XB (Pseudo-Bidi). These are only intended for testing purposes; you may need to add special handling to remove them for production purposes. [CLDR-13100]
  • Other

There are no changes to the public API of any of the ex_cldr_* packages. Updates are only to encapsulate the new data.

Upgrading

Executing mix deps.update ex_cldr should be enough to get you on the latest version which is 2.11.0.

Since the new astro library is now out I was able to finish up the Persian (Solar Hijri) calendar which is now available on hex.

This is a calendar that starts the year on the March equinox (or a day later if the equinox is equal to or later than true solar noon on the day of equinox in Iran). The fact that it is observation based required the development first of a library to do the astro calculations - which turned out to be good fun (albeit there is much more to do to support more complex lunisolar calendars like the Islamic and Hebrew calendars).
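
The new year rule described above can be sketched as follows. This is not the published calendar's code: it assumes the instant of the equinox and the instant of true solar noon on that day have already been computed (for example by an astronomy library) and are supplied as NaiveDateTime values in Iran local time. The module and function names are hypothetical.

```elixir
defmodule NowruzSketch do
  # Hypothetical helper; both arguments are assumed to be in Iran local
  # time and to fall on the same civil day.
  def new_year_date(%NaiveDateTime{} = equinox, %NaiveDateTime{} = solar_noon) do
    equinox_day = NaiveDateTime.to_date(equinox)

    case NaiveDateTime.compare(equinox, solar_noon) do
      # Equinox before true solar noon: the year starts that same day.
      :lt -> equinox_day
      # Equinox at or after true solar noon: the year starts a day later.
      _at_or_after_noon -> Date.add(equinox_day, 1)
    end
  end
end
```

Calling it with an equinox time earlier than the noon time returns the equinox date itself; an equinox time at or after noon returns the following date.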

Localisation is supported via ex_cldr_calendars which is a dependency. And full date/time formatting is available via ex_cldr_dates_times which is not a dependency.

Since I’m back temporarily on calendars I’ll finish up the Coptic Calendar and the Ethiopic Calendar since they are quite straightforward.

The Coptic and Ethiopic calendars are now also published on hex. End of calendars for a while … back to unicode sets and rules for word/sentence breaks and transformations …

I just want to thank you for all the time you are spending on this :slightly_smiling_face:

With the impending release of Elixir 1.10, several of the CLDR libs have been updated to be compatible without compiler warnings or to leverage new capabilities. Feedback on any issues would be most welcome. A simple mix deps.update <lib> will be enough to move to the latest version which is also backwards compatible for earlier Elixir versions (typically to Elixir 1.8).

The new releases are:

  • ex_cldr_calendars 1.7.0 which uses the new implementation of the Inspect protocol for calendar types to allow the inspecting and parsing of sigil_D, sigil_N and sigil_U for any valid calendar. For example:

iex> ~D[2020-W01-1 Cldr.Calendar.ISOWeek]
~D[2020-W01-1 Cldr.Calendar.ISOWeek]

iex> ~D[2020-01-02 Cldr.Calendar.Gregorian]
~D[2020-01-02 Cldr.Calendar.Gregorian]

  • ex_cldr 2.12.0 which removes calls to the deprecated Code.ensure_compiled?/1

  • ex_money 4.4.2 which also removes calls to Code.ensure_compiled?/1. It now has a dependency on ex_cldr of a minimum of 2.12.0.

The other CLDR packages do not require any updating to support Elixir 1.10.

With Elixir 1.10 now out it’s time to release ex_cldr_units 2.8.0 that supports the new module-based sorting in Elixir’s Enum.sort/2 by introducing Cldr.Unit.compare/2. Any two units that can be converted to each other can be compared.
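
As background on the mechanism: from Elixir 1.10, Enum.sort/2 accepts a module (or {:asc, module} / {:desc, module}) and calls that module's compare/2, which must return :lt, :eq or :gt. The sketch below shows the contract with a hypothetical LengthSketch struct; it is not how Cldr.Unit is implemented, and the conversion table is an assumption for illustration.

```elixir
defmodule LengthSketch do
  # Hypothetical stand-in for a unit type; not Cldr.Unit.
  defstruct [:unit, :value]

  # Conversion factors to a common base unit (millimeters).
  @to_millimeters %{millimeter: 1, centimeter: 10, meter: 1_000, kilometer: 1_000_000}

  def new(unit, value), do: %__MODULE__{unit: unit, value: value}

  # Enum.sort/2 calls compare/2 and expects :lt, :eq or :gt.
  def compare(%__MODULE__{} = a, %__MODULE__{} = b) do
    case {to_mm(a), to_mm(b)} do
      {x, y} when x < y -> :lt
      {x, y} when x > y -> :gt
      _ -> :eq
    end
  end

  # Convert to the base unit so different units can be compared.
  defp to_mm(%__MODULE__{unit: unit, value: value}) do
    value * Map.fetch!(@to_millimeters, unit)
  end
end
```

Sorting a list of these structs with Enum.sort(list, LengthSketch) orders them by converted magnitude rather than by struct field order, which is the same reason a dedicated compare/2 is useful for units.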

Examples

iex> alias Cldr.Unit                                                                             
Cldr.Unit

iex> unit_list = [Unit.new(:millimeter, 100), Unit.new(:centimeter, 100), Unit.new(:meter, 100), Unit.new(:kilometer, 100)]
[#Unit<:millimeter, 100>, #Unit<:centimeter, 100>, #Unit<:meter, 100>,
 #Unit<:kilometer, 100>]

iex> Enum.sort unit_list, Cldr.Unit
[#Unit<:millimeter, 100>, #Unit<:centimeter, 100>, #Unit<:meter, 100>,
 #Unit<:kilometer, 100>]

iex> Enum.sort unit_list, {:desc, Cldr.Unit}
[#Unit<:kilometer, 100>, #Unit<:meter, 100>, #Unit<:centimeter, 100>,
 #Unit<:millimeter, 100>]

iex> Enum.sort unit_list, {:asc, Cldr.Unit}
[#Unit<:millimeter, 100>, #Unit<:centimeter, 100>, #Unit<:meter, 100>,
 #Unit<:kilometer, 100>]

Updates released to hex for most of the elixir-cldr libraries based on CLDR.

Summary of updates

  1. Ensure certificate verification when downloading locales. This should automatically detect the certificate trust store on most platforms (except Windows). Therefore in most cases no configuration is required. If you think it should detect the trust store on your platform but it doesn’t, please raise an issue. It will also detect that either castore or certifi is installed and use the package’s trust store in preference to the platform trust store.

  2. Updates the data to CLDR version 37. Adds about 20 new locales for a total of 566. Can you say ff-Adlam-CM? It’s the Fula language with the Adlam script as spoken in Cameroon.

  3. Significant updates to the Unit of Measure data which is supported by ex_cldr_units version 3.0.0. The units engine now supports compound units and algebraic units. Square ampere per yoktolumen, anyone? The units do not have to be obviously meaningful - just algebraically resolvable. Built-in support for SI prefixes and square- and cubic- prefixes. The output formatting via Cldr.Unit.to_string/3 needs some more work for complex units but is otherwise sound.

  • Adds the Cldr.Chars protocol, which is implemented for numbers, dates, times, units and money, to continue the journey towards low-friction localisation of applications.

  • Adds much better support for the BCP47 U extension which continues the work towards having the language tag become a preferred way to express user intent in localised applications. For example, this allows a user to specify default currency, default number system, default calendar and so on.
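
To illustrate what the BCP47 U extension carries, here is a small stdlib-only sketch that pulls the key/type pairs out of a language tag such as en-AU-u-cu-usd-nu-latn (cu is the currency key, nu the number system key, ca the calendar key). It is not the ex_cldr parser: the module name is hypothetical, attributes are skipped, and no validation is attempted.

```elixir
defmodule UExtensionSketch do
  # Hypothetical helper; a simplified reading of the "-u-" extension.
  def parse(language_tag) when is_binary(language_tag) do
    language_tag
    |> String.downcase()
    |> String.split("-")
    # Skip everything up to and including the "u" singleton.
    |> Enum.drop_while(&(&1 != "u"))
    |> Enum.drop(1)
    # Stop at the next singleton, which starts another extension.
    |> Enum.take_while(&(String.length(&1) > 1))
    |> collect(nil, %{})
  end

  defp collect([], _current_key, acc), do: acc

  # A two-character subtag starts a new key.
  defp collect([<<_, _>> = key | rest], _current_key, acc),
    do: collect(rest, key, Map.put(acc, key, []))

  # Longer subtags before the first key are attributes; skipped here.
  defp collect([_attribute | rest], nil, acc), do: collect(rest, nil, acc)

  # Any other subtag is part of the type for the current key.
  defp collect([type | rest], key, acc),
    do: collect(rest, key, Map.update!(acc, key, &(&1 ++ [type])))
end
```

For example, UExtensionSketch.parse("en-AU-u-cu-usd-nu-latn") gives a map with "cu" mapped to ["usd"] and "nu" mapped to ["latn"]. Types are kept as lists because a type may span several subtags, as in ca-islamic-civil.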

Release versions and links
