Gettext msgids. Keys or Strings?

smeevil · April 8, 2017, 6:22pm

Hi all,

I’m heavily conflicted on gettext and how best to use the msgid.
The official way of using gettext is to use the text that needs to be displayed as the msgid.

For example “Hi, %{name}, good to see you again!”.

As far as I understand, the big reason for this is that when you change the text, you automatically invalidate all the derived translations, so it’s easy to pinpoint which ones need updating. The code also becomes clearer, because you directly see what kind of text and purpose a gettext call has.

Now on the flip side, you could also use the msgid as a key.

For example “login-welcome-message-%{name}”

This way you say that it’s scoped to the login, the purpose is a welcome message, and the placeholder that is available is %{name}. This makes it a bit harder to parse mentally, but now the origin-language text can change without invalidating the other translations, which of course is both good and bad.

What comes to mind is conversion optimisation, where a “product-view-add-line-item-button-text” might change from “Buy” to “Add to cart” because you have seen some indication that people prefer this, while in the already translated languages it still makes sense to use the translated version of “Buy”. In this case your key conveys intent and purpose rather than the actual text used.
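To make the trade-off concrete, here is a minimal Python sketch of what happens when the English copy changes from “Buy” to “Add to cart” under each style. The `DictTranslations` class is a hypothetical in-memory stand-in for a compiled catalog, and the Dutch strings are made-up sample data; only `gettext.NullTranslations` comes from the standard library.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog

    def gettext(self, message):
        # Like real gettext, an unknown msgid is returned unchanged.
        return self.catalog.get(message, message)


# Made-up Dutch catalogs, one per style.
nl_text_style = DictTranslations({"Buy": "Kopen"})
nl_key_style = DictTranslations(
    {"product-view-add-line-item-button-text": "Kopen"}
)

# The English copy changes from "Buy" to "Add to cart".
# Text style: the new msgid has no translation yet, so English leaks through
# until a translator handles the new string.
text_result = nl_text_style.gettext("Add to cart")

# Key style: the key did not change, so the old translation is still served.
key_result = nl_key_style.gettext("product-view-add-line-item-button-text")

# Key style with a missing entry shows the raw key to the user.
missing_result = nl_key_style.gettext("login-welcome-message")

print(text_result)     # Add to cart
print(key_result)      # Kopen
print(missing_result)  # login-welcome-message
```

Neither outcome is wrong by itself: the text style flags the change for translators, while the key style silently keeps the old translation.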

As for translating from the origin language to others, this would not really make a difference. When working with the raw PO files you will have a bit of a hard time, because the msgids are keys and you will need to open the origin-language file to see the original text. Most (online) translation tools, though, use the origin-language PO file to extract the key and the base translation, so when translating into another language you see both the key and the original text, which makes it quite easy.

I am wondering if anyone could provide some input on their experiences with using either form, and what the pros and cons are of the form they prefer.

Thanks for your time,
Gerard

thomasbrus · April 19, 2017, 11:27pm

I tried the key-based approach for a while because I felt like it was more… correct, but I ran into the following downsides:

In the template you don’t see the actual text anymore (you’re already keeping a mental model in your head of markup => design, so abstract keys don’t help there)
Extra choices to make regarding naming convention, and having to maintain it / keep it consistent
Duplication, for example: login-page-password-field-label & registration-page-password-field-label may have the same translation (“password”). (So… then you create a single translation, password-field-label, but then you deviate from the naming convention)

Hope that’s helpful:)

brightball · April 19, 2017, 11:56pm

In addition to that, there are usually scripts available that can extract all of your gettext snippets from the code base to create the translation files for you.

When you or somebody else sits down to write the snippets in another language, having the key already be the reference language is really helpful.

Here’s a short post from about a decade ago that deals with the concept in CakePHP.

http://www.brightball.com/articles/string-localization-with-dynamic-content-in-cakephp

whatyouhide · April 21, 2017, 8:21am

This is what Gettext for Elixir does as well with mix gettext.extract.

outlog · April 21, 2017, 8:39am

It really depends and you should test your workflow e2e before finally deciding…

I recommend using normal text as the key. That makes the developers happy, and they don’t have to go through the entire translation workflow every time they add a string of text.

gettext has a context metadata feature (msgctxt in PO files). Make sure you use it, and come up with some kind of naming convention for different things. That way you will have metadata for the string and you will avoid key clashes. For example, “OK” on its own would clash, but “OK” with context “user create form missing data modal” would not clash with other “OK”s.
Context also gives, erhmm, context to the translator, and a way to locate where the string is used.

Check that your gettext implementation supports context.
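As a rough illustration of how context keeps two identical strings apart, here is a small Python sketch. Python’s standard `gettext` module has offered `pgettext` since 3.8; the `DictTranslations` class, the second context name, and the German strings are invented for the example. The `"context\x04msgid"` key layout mirrors how compiled GNU catalogs store context entries.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog

    def pgettext(self, context, message):
        # Compiled GNU catalogs join context and msgid with an EOT byte.
        key = f"{context}\x04{message}"
        # Unknown context/msgid pairs fall back to the msgid itself.
        return self.catalog.get(key, message)


# Invented sample data: the same msgid "OK" under two different contexts.
german = DictTranslations({
    "user create form missing data modal\x04OK": "Verstanden",
    "payment confirmation dialog\x04OK": "Bezahlen",
})

modal_ok = german.pgettext("user create form missing data modal", "OK")
payment_ok = german.pgettext("payment confirmation dialog", "OK")
unknown_ok = german.pgettext("some untranslated screen", "OK")

print(modal_ok)    # Verstanden
print(payment_ok)  # Bezahlen
print(unknown_ok)  # OK
```

The developer still writes the readable “OK” in the source, and the context string carries the scoping information a key would otherwise have to encode.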

Then figure out your workflow. I’ve used poeditor.com in the past (not super happy about it, but it gets the job done). You can usually integrate it with your git flow: you run gettext extract and git push that file, the new strings show up on poeditor immediately, and then you can export through git from poeditor as well. Finally, you might need a gettext compile step, depending on your gettext implementation.

gflohr · August 29, 2017, 12:43pm

Keys have one single advantage: Somebody can correct the messages in the base language (because there is no base language). With gettext you either have to change the source code or you have to define your base language as “broken English” and provide a translation catalog for proper English.

Otherwise I know only disadvantages of keys. Apart from the ones mentioned above, these are:

The translation files tend to contain stale messages.
There are no standard tools that keep the translation files in sync with the sources and with each other.
Very hard to produce grammatically correct messages for plural forms (google for “ngettext”).
Without an extra translation cache, translations soon become unmaintainable or very expensive. Gettext has a translation cache built-in.
Translation errors are hard to detect in key-value files. Compare that to PO, where you see the original next to the translation.
Gettext allows you to easily mix a multitude of programming languages.
Gettext is a well-established standard with a mature and powerful toolchain and an ever-growing user base. Open-source projects are almost exclusively localized with GNU gettext.
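The plural-forms point in the list above can be sketched in Python. With `ngettext` the source passes a singular msgid, a plural msgid, and the count, and each language’s catalog supplies as many forms as that language needs. The `DictTranslations` class and `files_message` helper are hypothetical names for this example; the Polish plural rule is the one commonly used in PO file headers for Polish.

```python
import gettext


def polish_plural(n):
    """Index of the Polish plural form for n (three forms: 0, 1, 2)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog, plural):
        super().__init__()
        self.catalog = catalog
        self.plural = plural

    def ngettext(self, singular, plural, n):
        forms = self.catalog.get(singular)
        if forms is None:
            # Untranslated: fall back to the English singular/plural pair.
            return singular if n == 1 else plural
        return forms[self.plural(n)]


polish = DictTranslations(
    {"%(n)d file": ["%(n)d plik", "%(n)d pliki", "%(n)d plików"]},
    polish_plural,
)
untranslated = DictTranslations({}, polish_plural)


def files_message(translations, n):
    # Hypothetical helper: pick the plural form, then interpolate the count.
    template = translations.ngettext("%(n)d file", "%(n)d files", n)
    return template % {"n": n}


for count in (1, 3, 5, 12, 22):
    print(files_message(polish, count))
```

A flat key-value file has no built-in slot for the third Polish form, which is why key-based setups usually end up reinventing this mechanism.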