Dealing with inflection in Gettext

hauleth · February 2, 2020, 12:05pm

How do you deal with inflections in translations when working with Gettext? I am working on a side project where I want to introduce proper translations, and while the “core” will be written in English I also want to add Polish (which is my mother tongue). I see some issues with inflections, as Gettext does not support them at all. This can be problematic because the Slavic languages rely on inflection a lot. For example, noun gender dictates how sentences are constructed, not to mention grammatical cases and other features that make it a lot harder.
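To make the problem concrete, here is a minimal plain-Elixir sketch of what a single Polish sentence needs. It does not use Gettext at all; the module `InflectionSketch` and its functions are hypothetical names made up for illustration, and the plural rule only covers non-negative integers. The verb agrees with the subject's gender and the noun form depends on the count:

```elixir
defmodule InflectionSketch do
  # Hypothetical helper for illustration only; not part of Gettext or any library.

  # Polish plural categories for non-negative integers:
  #   :one  -> exactly 1
  #   :few  -> last digit 2..4, except when the last two digits are 12..14
  #   :many -> everything else
  def plural_category(1), do: :one

  def plural_category(n) when is_integer(n) and n >= 0 do
    if rem(n, 10) in 2..4 and rem(n, 100) not in 12..14, do: :few, else: :many
  end

  # Past-tense verb "added" agrees with the gender of the subject.
  defp verb(:female), do: "dodała"
  defp verb(:male), do: "dodał"

  # Noun "file" changes form with the plural category.
  defp noun(:one), do: "plik"
  defp noun(:few), do: "pliki"
  defp noun(:many), do: "plików"

  # Builds "<name> added <count> file(s)." in Polish.
  def added_files(name, gender, count) do
    "#{name} #{verb(gender)} #{count} #{noun(plural_category(count))}."
  end
end

IO.puts(InflectionSketch.added_files("Anna", :female, 2))
IO.puts(InflectionSketch.added_files("Jan", :male, 5))
```

The count dimension is something plural-aware lookups such as ngettext can cover, but the gender dimension has to be selected by hand like this, and the two combine multiplicatively.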

How have you dealt with such cases? Due to the lack of such features I am currently quite keen on implementing Fluent in Elixir, but it seems like a tremendous amount of work, and I would prefer to focus on the application if there is already a framework that supports such features.

kip · February 2, 2020, 12:19pm

You can’t really deal with it in Gettext. It’s a definitional problem, not an implementation one.

The Unicode message format does a better job, but it’s certainly not complete. I have an implementation of it in ex_cldr_messages that is API complete but lacks the message store and translation part. If it appears to reasonably suit your needs you can start to develop your side project and I’ll hurry up the message store and translation part in parallel. It’s high on my to-do list anyway and I just need the push…

An example:

Cldr.Message.format(
    "{gender_of_host, select,
      female {
        {num_guests, plural, offset: 1
          =0 {{host} does not give a party.}
          =1 {{host} invites {guest} to her party.}
          =2 {{host} invites {guest} and one other person to her party.}
          other {{host} invites {guest} and # other people to her party.}}}
      male {
        {num_guests, plural, offset: 1
          =0 {{host} does not give a party.}
          =1 {{host} invites {guest} to his party.}
          =2 {{host} invites {guest} and one other person to his party.}
          other {{host} invites {guest} and # other people to his party.}}}
      other {
        {num_guests, plural, offset: 1
          =0 {{host} does not give a party.}
          =1 {{host} invites {guest} to their party.}
          =2 {{host} invites {guest} and one other person to their party.}
          other {{host} invites {guest} and # other people to their party.}}}}",
  gender_of_host: "male",
  host: "kip",
  guest: "jim",
  num_guests: 1
)

If you follow this approach then you get immediate integration with the rest of ex_cldr to format money, numbers, dates, calendars, units and lists in a locale-specific fashion as well.

There is a new Unicode working group on message formatting that recognises the limitations of all the current approaches, and I am participating in it. There is a wealth of great content and discussion, and you’d be welcome to observe or join. But it’s early days. Currently it looks like i18next and Fluent are providing the most inspiration.

kip · February 3, 2020, 3:35am

OK, I’ll take up the challenge and do a Fluent implementation for Elixir. I can reuse a lot of code from ex_cldr_messages. They are structurally quite similar and both work from CLDR data.

kip · March 5, 2020, 1:27am

Progress on ex_cldr_fluent is moving forward, but a bit slower than I’d like due to competing priorities (there’s some very cool stuff coming out with CLDR 37 related to units, unit conversion and unit localisation that I’m also finishing up for ex_cldr_units version 3.0, which will be released in April).

The parser for Fluent messages is largely complete; next I need to work on the backend machinery. I’ll push the current work to GitHub soon so anyone interested can comment or contribute.