How to create an i18n-able link?

saveman71 · April 4, 2023, 1:47pm

Hello!

I want to generate HTML that would be similar to:

English:

Hello, Please click here to learn more !

French:

Bonjour, Veuillez cliquez ici pour en apprendre plus !

My best attempt:

<%= raw(
  gettext(
    "Hello! Please %{link} to learn more!",
    link: gettext("click here") |> link(to: path) |> safe_to_string()
  )
) %>

But the use of raw and safe_to_string, plus the nested gettext, really makes me wonder whether I’m going in the right direction. The translation is also unfortunately split in two, making the translator’s job harder.

I also thought about embedding the link inside the translation, like below:

<%= raw(
  gettext(
    "Hello! Please <a href=\"%{link}\">click here</a> to learn more!",
    link: path
  )
) %>

We still need to use raw, and now we have HTML in the translations, which makes the whole thing very brittle. We also have to escape the link ourselves, and we don’t use the native component…

The naive way of doing it (below) feels worse because the translation is now split in three (think of a paragraph with two or three links), making the translator’s job even harder.

<%= gettext("Hello! Please") %> <%= link(gettext("click here"), to: path) %> <%= gettext("to learn more!") %>

How do you guys do it? All three options work, but none of them feels quite right.

saveman71 · April 17, 2023, 8:20am

For posterity, here’s some insight from Stack Overflow (“How do you handle translation of text with markup?”), since this might not be so Elixir-specific.

Cochonours · April 17, 2023, 11:04am

I was curious as I haven’t done i18n with gettext in a looong time, but here is a better option IMHO, from “Gettext html in translation” (post #3 by danschultzer):

  <%= gettext("Already have an account? %{sign_in} to continue", sign_in: safe_to_string(link(gettext("Sign in"), to: Routes.session_path(@conn, :new)))) |> raw() %>

No html in the strings, and all in one place.

saveman71 · April 18, 2023, 7:42am

Thanks for finding that post! I’m a bit ashamed that I wasn’t able to find it myself

I thought of that option, and it works quite well if the source language is English (same language as the interpolation key).

In my case, the source language is French (and I am not in a position to change that). So the solution becomes:

<%= gettext("Vous avez déjà un compte ? %{sign_in} pour continuer", sign_in: safe_to_string(link(gettext("Connectez-vous"), to: Routes.session_path(@conn, :new)))) |> raw() %>

It’s less ideal, but works. I’m still not 100% convinced because the two translation strings are not so connected (hard for the translator to make the connection between the two, in the middle of 100s of other strings).

Maybe we can make an exception and use French only for these keys?

<%= gettext("Vous avez déjà un compte ? %{connectez_vous} pour continuer", connectez_vous: safe_to_string(link(gettext("Connectez-vous"), to: Routes.session_path(@conn, :new)))) |> raw() %>

So given all these compromises, I also consider the following solution:

<%= gettext("Vous avez déjà un compte ? %{a_start}Connectez-vous%{a_end} pour continuer", a_start: "<a href='/login'>", a_end: "</a>") |> raw() %>

Unfortunately:

it’s very brittle
we can’t use link/2 anymore
if we have multiple links, it starts to be even more brittle (a1_start, a2_start, etc. no good solution)

Compromises

LostKobrakai · April 18, 2023, 7:54am

That’s exactly the issue here. Translation tools aren’t well suited to translating segmented text unless the details of the segmentation are added to the translatable text. This becomes even trickier if the translation might reorder the links (when there are several), in which case you actually need something in the translated text that maps to the correct link.

Doing that with fewer compromises would likely require a new data format for translations, one that explicitly supports this use case.

benwilson512 · April 18, 2023, 8:46am

There are a lot of good answers here, so I’ll just throw my $0.02 in and note that perhaps a different design would make this easier? What you have is basically a “call to action”, and calls to action work better when they are pulled out from the actual text and are visually represented in a more emphatic way like a button. This happens to also make the translation easier since you aren’t doing an inline link.

saveman71 · April 18, 2023, 9:04am

TBH I crafted these examples because they were easy to understand. It’s a good point though: when possible, taking the link out is good practice!

But for example, there are two good examples on this page only:

(and that last one can have multiple links too!)

odd · April 18, 2023, 9:11am

I found this article very useful when dealing with HTML in translations
https://angelika.me/2021/11/23/7-gettext-lessons-after-2-years/

kip · April 18, 2023, 9:45am

@saveman71, as is often the case, Unicode has a slightly different take that is, I think, more suited to complete language expressions. It’s the Unicode Message Format, and I have an implementation of it in ex_cldr_messages.

After the great work done by @maennchen, the latest versions of gettext fully support merging those messages into a .po file just like any other message type (a little-known fact: .po files are independent of the message type, it’s just that gettext messages predominate).

It’s possible this is a better fit for your needs. If so, let me know and I’m happy to help.

LostKobrakai · April 18, 2023, 9:53am

I recently looked into ICU messages for exactly the reasons discussed, and it doesn’t really have an answer for this either, even though there’s a lot of powerful stuff in them. It’s still built to handle a string of text, albeit with a lot more grammatical and language-related options, and it’s not really better at handling formatted text or text interspersed with other types of markup.

kip · April 18, 2023, 9:54am

I smell an opportunity. Very open to thoughts on what an API might look like to make this more ergonomic from a developer point of view.

LostKobrakai · April 18, 2023, 10:07am

I think the missing concept is some form of tags, which at least allow for unique identification (e.g. link a vs link b). HTML in the message could be that, but usually that’s too low-level for what you want translators to deal with, at least without also postprocessing the translated message. HTML also often includes things that are supposed to be dynamic and are therefore not great for putting in a message string.

From a library standpoint, it would be great if the returned value could be more than a translated string, for example a list or map of sorts, so translations can be interspersed with markup. E.g. with heex you want as much markup as possible to be statically known, so diffs can be optimized, while only the pieces of text in between should be dynamic.

I also think there might be opportunities to integrate gettext at an even lower level with the heex engine.

kip · April 18, 2023, 10:16am

Could you give an example of what that might look like from a developer point of view?

Gettext messages are quite simple so I’m not sure it’s easy to get more low level.

Unicode message format, which has its own flaws being addressed by the MFWG of Unicode and which has nearly completed a new standard, is at least a formal grammar so there is room to work at the AST level if that helps this kind of situation.

LostKobrakai · April 18, 2023, 10:35am

Let me try that with some xml like format.

"There are <link_a>%{n} new</link_a> topics remaining."

At best the API would return something like ["There are ", {:link_a, [], "12 new"}, " topics remaining"], so I as a developer can then iterate the pieces like e.g.:

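Enum.map_join(translation, "", fn
  # tagged segment: wrap the inner text in a link and convert it to a string
  {:link_a, _, inner} -> inner |> link(to: "…") |> safe_to_string()
  # plain text segment: passed through unchanged
  text -> text
end)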

Or there could be some integration with heex, which is able to build up heex at compile time, which properly marks only dynamic pieces as dynamic, but any html, which might be static between all translations – in this example the link markup – is correctly detected as static for the template.

Essentially this is building an intermediate markup language to mediate between the developer concerns and translator concerns.
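The list-of-segments return value described above could be sketched in plain Elixir roughly as follows. This is not an existing Gettext API: the `TaggedMessage` module, its `parse/2` function, and the allow-list of tag names are all hypothetical, and the sketch assumes interpolation has already happened and that tags are not nested.

```elixir
defmodule TaggedMessage do
  # Hypothetical helper, not part of Gettext. Splits an already-translated
  # string into plain text and {tag, attrs, inner} tuples.

  # Matches <name>inner</name>; the \1 backreference requires the closing
  # tag to carry the same name as the opening one.
  defp tag_regex, do: ~r/<([a-z_][a-z0-9_]*)>(.*?)<\/\1>/s

  def parse(message, allowed_tags) when is_binary(message) do
    # Map "link_a" => :link_a so no atoms are created from translator input.
    names = Map.new(allowed_tags, &{Atom.to_string(&1), &1})

    tag_regex()
    |> Regex.split(message, include_captures: true)
    |> Enum.reject(&(&1 == ""))
    |> Enum.map(&to_segment(&1, names))
  end

  # A part that is exactly one known tag becomes a tuple; anything else
  # (plain text, or an unknown tag) is left as a string.
  defp to_segment(part, names) do
    with [^part, name, inner] <- Regex.run(tag_regex(), part),
         {:ok, tag} <- Map.fetch(names, name) do
      {tag, [], inner}
    else
      _ -> part
    end
  end
end

TaggedMessage.parse("There are <link_a>12 new</link_a> topics remaining.", [:link_a])
# => ["There are ", {:link_a, [], "12 new"}, " topics remaining."]
```

The allow-list is a design choice of this sketch: tags a developer did not declare stay in the output as plain text, so they would be escaped like any other text when rendered.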

As a developer I care for:

Finely split content requiring translation from content managed by the code.
- things like urls or classes don’t belong in translated content
Being able to incorporate translated content back into the places where they’re needed.
APIs to go beyond string concatenation. HTML in dynamic strings means something like heex will miss it at compile time.

As a translator I don’t want to deal with implementation details. I don’t care where a link goes, I just need to make sure the correct text will be linked. I also want some help in making sure my translations don’t violate the tags used when parsing, like </link_a>%{n} new<link_a> would be detected as incorrect. I also think any markup language used for something like that should be as simple as it can be.
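The kind of check a translator would want could look something like the sketch below, which only verifies that every tag in a message is opened before it is closed and closed in the right order. `TagCheck` and `balanced?/1` are made-up names for illustration, not something Gettext provides, and the tag syntax is the XML-like one from the example above.

```elixir
defmodule TagCheck do
  # Hypothetical validator, not part of Gettext. Returns true when every
  # <tag> has a matching </tag> in the right order.
  def balanced?(message) when is_binary(message) do
    ~r/<(\/?)([a-z_][a-z0-9_]*)>/
    |> Regex.scan(message, capture: :all_but_first)
    |> Enum.reduce_while([], fn
      # opening tag: push its name on the stack
      ["", name], stack -> {:cont, [name | stack]}
      # closing tag that matches the most recently opened tag: pop it
      ["/", name], [name | rest] -> {:cont, rest}
      # closing tag with nothing open, or the wrong tag open: stop
      ["/", _name], _stack -> {:halt, :error}
    end)
    # an empty stack at the end means everything was closed
    |> Kernel.==([])
  end
end

TagCheck.balanced?("There are <link_a>%{n} new</link_a> topics remaining.")
# => true
TagCheck.balanced?("</link_a>%{n} new<link_a>")
# => false
```

A fuller check would also compare the set of tags in the translation against the tags in the source message, so that a translator cannot drop or rename a link by accident.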

Cochonours · April 18, 2023, 11:50am

I’d go with the %{connectez_vous} too. As long as that key is available, the translators will be able to put it all together.

The second option looks ugly but only because of the gettext syntax, so it might not be that bad. I do use something like this with react-i18next, but as the Trans component has an idea about xml it looks better:

    <Trans i18nKey="aze:agree_terms"
      components={{
        a: <Button theme={xxx} link onClick={yyy} />
      }}
    />

"agree_terms": "I agree to <a>Terms & Conditions</a>",

Note: I called it “a” because it’s a link but it can be anything, and the lib will match between that “a” component and “<a>”,“</a>”,“<a/>” tags in the string.

saveman71 · November 21, 2023, 10:01am

Conclusion of our implementation choice:

We’ll go with the “simple” approach that describes open/close tags in the translation. An implementation is in the GitHub gist “A custom `gettext_with_link` macro for easily putting inline links into gettext strings” (scroll down for our version; all credit goes to the original author).

For now, we’ll sanitize the result at runtime, but ideally we’d like to sanitize our PO files at compile time. I’ve opened an issue to that effect on the gettext repo: “Allow to transform messages at compile time” (Issue #380, elixir-gettext/gettext).