Gettext_sigils - a sigil for using gettext with less boilerplate and better readability

Hi!

I’m happy to share my first Elixir library. But first some …

Background

As an Elixir company from a country with 4 official languages (:switzerland:), trust me when I say: we are using Gettext a lot in our Phoenix projects.

I really love the features of Gettext, but it always bothered me that it adds a lot of noise in the code, especially when using domains/contexts, interpolations or pluralization (besides, I still can’t remember which Gettext macro to use when using domains and/or contexts :sweat_smile:).

That’s why, until now, we added a ~t sigil to all our (Phoenix) projects that simply delegates to gettext. We also had an m modifier that uses the current module name (e.g. a LiveView or component) as the context.
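
To illustrate the idea of a sigil that only delegates, here is a minimal sketch. It is not the library's code: TSigilSketch, TSigilDemo and translate/1 are made-up names, and the lookup table is a stand-in for a real Gettext backend. It also handles only plain strings without interpolation.

```elixir
defmodule TSigilSketch do
  # Hypothetical stand-in for a Gettext lookup (assumption, not a real backend).
  @translations %{"Hello" => "Hallo"}

  def translate(msgid), do: Map.get(@translations, msgid, msgid)

  # A lowercase sigil receives the string as {:<<>>, meta, parts}.
  # This clause only accepts a single literal part (no interpolation).
  defmacro sigil_t({:<<>>, _meta, [msgid]}, _modifiers) when is_binary(msgid) do
    quote do
      TSigilSketch.translate(unquote(msgid))
    end
  end
end

defmodule TSigilDemo do
  import TSigilSketch

  # Known msgid: the stand-in table has an entry for it.
  def greeting, do: ~t"Hello"

  # Unknown msgid: falls back to the msgid itself.
  def fallback, do: ~t"Goodbye"
end
```

In a real project the macro body would presumably expand to a gettext call instead of the stand-in function.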

Over the last few days, I extracted this (and more!) as a library called …

gettext_sigils

Github / hex.pm

It provides a new sigil ~t (which felt oddly familiar) for using Gettext translations with less boilerplate and better readability:

# before
gettext("Hello, %{name}", name: user.name)

# after
~t"Hello, #{user.name}"

When using GettextSigils (e.g. in your MyAppWeb.html_helpers/0 for Phoenix projects), you can also specify how modifiers are mapped to domains and/or contexts:

# replace this
use Gettext, backend: MyApp.Gettext

# with this
use GettextSigils,
  backend: MyApp.Gettext,
  sigils: [
    modifiers: [
      m: [context: inspect(__MODULE__)],
      e: [domain: "errors"]
    ]
  ]

# then use it instead of gettext

~t"This is a global message"
~t"This is scoped to the current module/view/component"m
~t"This is a scoped error message"em
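
How combined modifiers such as em could resolve to options can be sketched in plain Elixir. This is only an illustration under assumptions: ModifierSketch and resolve/2 are hypothetical names, and the library's real resolution logic may differ (for example in how it treats unknown modifiers).

```elixir
defmodule ModifierSketch do
  # Merges the options of every modifier character into one keyword list.
  # `modifiers` is the charlist a sigil macro receives, e.g. ~c"em".
  def resolve(modifiers, mapping) when is_list(modifiers) do
    Enum.reduce(modifiers, [], fn char, acc ->
      key = String.to_atom(<<char>>)

      case Keyword.fetch(mapping, key) do
        {:ok, opts} -> Keyword.merge(acc, opts)
        :error -> raise ArgumentError, "unknown modifier: #{<<char>>}"
      end
    end)
  end
end

# Hypothetical mapping, mirroring the configuration shown above.
mapping = [m: [context: "MyAppWeb.PostLive"], e: [domain: "errors"]]

opts = ModifierSketch.resolve(~c"em", mapping)
IO.inspect(Keyword.get(opts, :domain), label: "domain")
IO.inspect(Keyword.get(opts, :context), label: "context")
```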

If this sounds interesting, there are a few other features, all described in the README.

As this is a very new project, contributions, bug reports and feedback in general are very welcome. I’m currently working on adding pluralization (which is a bit tricky when all you have is a sigil) where I would love some feedback (PR).

Thanks! Danke! Merci! Grazie! Grazia! :heart:


Thanks a bunch for this! I’ve also always been bothered by the verbosity of having gettext() calls all over my .heex templates, but I’ve never done anything about it.

I just migrated one of my projects to using GettextSigils, and one quirky regex replace and some small amount of manual editing later I was all done and it was working like a charm :slight_smile:

<.socialproof name={~t"Papa John"} title={~t"City Planner"}>
# ...is just so much nicer than....
<.socialproof name={gettext("Papa John")} title={gettext("City Planner")}>

Great “quality of life” improvement, so thanks again! :slight_smile:

P.S.
A serendipitous side effect is that my syntax highlighting scheme highlights sigil strings differently than normal binary strings, so it became a lot easier to spot “language strings” vs other strings (like CSS classes etc) in the templates as well.


Love the idea!


Thank you so much for the feedback! :smiley: I’m glad to know I wasn’t the only one bothered by the additional noise.

Here are some ideas I’d like to add before 1.x:

  • Pluralization - As I said, this is a bit tricky because I think the only way is to use a character or string to split the sigil. The existing PR uses a Unicode double vertical bar character, but it’s not typeable on a keyboard. I think something like // would be better.
  • Dynamic @doc for sigil_t that includes a list of modifiers (and what they are mapped to) that is displayed by the editor when using the sigil.
  • Include usage_rules for using the library, and a skill to replace fixed strings with ~t in an app (in case gettext was not used from the start).

Nice. I already implemented something like this for internationalizing our project, but with plain macros, e.g.:

t("Translate me")
t("%{count} character(s) remaining", count: @count)
..

What I’ve found extremely useful (and so implemented) is the ability to define the domain at the module level, so that all the t/1, t/2, tn/3 and tn/4 macros can use it without having to specify it on a case-by-case basis, e.g.:

use MyApp.GettextHelpers, domain: "edit_post"

Thanks for the feedback!

This is already possible by specifying the sigils: [domain: "edit_post"] option when using the module.

Using plain macros definitely has some benefits (e.g. when the macro takes more than one string, as with pluralization).


Lovely. I had a POC that extracted all text from HEEx (no sigil needed) but had issues with Gettext, HTML and variables. Maybe you could give it a try, as you clearly have more experience with Gettext :slight_smile:

Apart from splitting sentences at every variable and HEEx tag, it worked miracles. Every text in the app became a translatable Gettext message :slight_smile:

A helper tool to check the tokenizer steps: BartOtten/eex_visualizer on GitHub (visualizes the compilation steps of (h)EEx)


I just released v0.2.1 of the library with the following features:

igniter install task

Installing the library now automatically replaces use Gettext with use GettextSigils in the project. This allows installing and configuring the library with:

mix igniter.install gettext_sigils

usage rules & skill

I added usage rules to teach LLMs to use ~t instead of fixed strings in newly generated code. The library also comes with a skill that will:

  • replace fixed strings with ~t in any user-facing part of the application (HEEx templates)

  • suggest using ex_cldr when showing dates, time, numbers, etc.

  • at the end of a task, ask to translate the new Gettext messages for all languages used in the project (optional)

  • this works for newly generated code and existing parts of the project!

While this might be a bit controversial (my wife is actually working as a translator :sweat_smile:), this has been a real time-saver and localizing an existing app is now easier than ever! At least it should mark the generated translations as “fuzzy”. See skill source and LLM guide.


This sounds interesting and scary at the same time! :sweat_smile: I assume there are a lot of edge cases like translated DOM attributes, args for nested components, embedded JS?

I love the idea! I run into this problem in every single Phoenix project.

Only the pluralization needs to be solved in a clean way since that is a showstopper.


Thanks for the feedback! Yeah, finding a clean solution for pluralization is a bit difficult (and that’s why I haven’t merged this PR yet).

This is my favorite solution so far:

~t"One post|#{count} posts"N

~t"One post|#{count :: length(@posts)} posts"Nm

It uses

  • a separator that is typeable on a keyboard (and rarely used)
  • a reserved (uppercase) modifier N to mark it as a pluralization (the separator can still be used in regular translations; omitting it in a pluralization raises an error)

But I’m open to other ideas!

I’ve always supported a sigil_M in ex_cldr_messages for the same reason.

Your implementation is much, much better and cleaner.

With today’s introduction of support for Unicode’s Message Format 2 in ex_cldr_messages I can’t wait to make sure both libraries play well together (I don’t think there should be any issues).


Thanks @kip! We are big fans of ex_cldr and we add it to every project. :heart:

That’s why the included translation skill uses (or proposes adding) it when it sees dates, times, numbers, … in the user-facing part of the application.


… or maybe use || as separator?

~t"One Post||#{count} posts"N

It would

  • align nicely with the or operator in Elixir
  • be even less likely to clash with the text to translate

It seems to me that these challenges are more to do with the default message format used by gettext than gettext itself.

Selfishly I’d like to leverage your sigil architecture with Message Format 2 messages supported in ex_cldr_messages.

Therefore I’m hoping there is a way for any syntax design changes you make in the sigil string to be tied only to your own interpolation module?

It feels like pluralization should remain a proper call, because encapsulating all of that in a sigil will likely be less clear than a function call.

The reason I liked the original sigil is because it balanced between conciseness and clarity, but the plural one seems to only be shorter.


Not sure I understand completely because I haven’t used ex_cldr_messages, but I’m definitely interested in making it work with ex_cldr!

Am I correct that you would like to be able to do something like this:

~t"The current time is #{time}!"

And because Gettext is configured to use the Cldr.Gettext.Interpolation (or V2?) backend, it would automatically localize the interpolated time?

If yes, that would be awesome! :smiley:

Currently the sigil string is parsed and represented internally as a list like this:

segments = [
  "The current time is ",
  {:time, time}, # time = AST
  "!"
]

and then transformed into this to be passed to Gettext:

{msgid, bindings} =
  {"The current time is %{time}!", time: time}
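
The transformation step described here can be sketched as follows. This is an illustration only: SegmentSketch and to_gettext/1 are hypothetical names, and plain values stand in for the AST the real sigil would carry.

```elixir
defmodule SegmentSketch do
  # Turns a list of segments into {msgid, bindings} using the default
  # Gettext placeholder syntax ("%{name}").
  def to_gettext(segments) do
    {parts, bindings} =
      Enum.map_reduce(segments, [], fn
        text, acc when is_binary(text) ->
          {text, acc}

        {name, value}, acc when is_atom(name) ->
          {"%{#{name}}", [{name, value} | acc]}
      end)

    # Bindings were collected in reverse, so restore the original order.
    {IO.iodata_to_binary(parts), Enum.reverse(bindings)}
  end
end

segments = ["The current time is ", {:time, "12:00"}, "!"]
IO.inspect(SegmentSketch.to_gettext(segments))
```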

For pluralization in the future, the msgid is then optionally split into two parts (probably by ||) and the count extracted from the bindings:

{msgid, msgid_plural, count, bindings} =
  pluralize({msgid, bindings})
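
A possible shape for such a pluralize step, as a sketch under assumptions: PluralizeSketch is a hypothetical module, the separator is taken to be ||, and the count is assumed to live in a binding named :count (the thread does not say how the count is identified).

```elixir
defmodule PluralizeSketch do
  # Assumption: "||" separates the singular from the plural form.
  @separator "||"

  # Splits the msgid into singular and plural parts and pulls the count
  # out of the bindings. Raises if the separator is missing or repeated.
  def pluralize({msgid, bindings}) do
    case String.split(msgid, @separator) do
      [singular, plural] ->
        # Assumption: the count is bound under the :count key.
        count = Keyword.fetch!(bindings, :count)
        {singular, plural, count, bindings}

      _other ->
        raise ArgumentError,
              "expected exactly one #{@separator} separator in #{inspect(msgid)}"
    end
  end
end

IO.inspect(PluralizeSketch.pluralize({"One post||%{count} posts", [count: 3]}))
```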

For it to work with the ex_cldr_messages backend, it should be transformed into this instead, right?

# V1
{"The current time is {time}!", time: time}
# V2 / MF2
{"{{The current time is {$time}!}}", time: time}

If yes, I could make the transformer a behaviour that could be switched out that implements

def transform(segments) 
# -> {msgid, bindings}

It could maybe even detect the Gettext interpolation backend, and choose the right transformer automatically (gettext_sigils could probably include adapters for the default interpolation and ex_cldr)
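
A rough sketch of what such a swappable transformer could look like. Everything here is hypothetical (module names, the build/2 helper, the segment type); it only covers the default %{name} syntax and the single-brace {name} syntax shown above, not MF2.

```elixir
defmodule TransformerSketch do
  @moduledoc "Hypothetical behaviour; none of these names exist in gettext_sigils."

  @type segment :: String.t() | {atom(), term()}
  @callback transform([segment]) :: {String.t(), keyword()}

  # Shared helper: renders each binding with the given placeholder function.
  def build(segments, placeholder) when is_function(placeholder, 1) do
    msgid =
      Enum.map_join(segments, fn
        text when is_binary(text) -> text
        {name, _value} -> placeholder.(name)
      end)

    # Plain strings do not match the pattern and are skipped.
    bindings = for {name, value} <- segments, do: {name, value}
    {msgid, bindings}
  end
end

defmodule TransformerSketch.Default do
  @behaviour TransformerSketch

  # Default Gettext interpolation syntax: %{name}
  @impl true
  def transform(segments) do
    TransformerSketch.build(segments, fn name -> "%{#{name}}" end)
  end
end

defmodule TransformerSketch.Braces do
  @behaviour TransformerSketch

  # Single-brace syntax as in the V1 example above: {name}
  @impl true
  def transform(segments) do
    TransformerSketch.build(segments, fn name -> "{#{name}}" end)
  end
end

segments = ["The current time is ", {:time, "12:00"}, "!"]
IO.inspect(TransformerSketch.Default.transform(segments))
IO.inspect(TransformerSketch.Braces.transform(segments))
```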

Would this be enough to support V2/MF2 as well? Because the syntax looks a bit more complicated.

Yes, I agree it’s not very elegant, unfortunately. But making it explicit with the N modifier would at least let users opt in to it (or use gettext directly).


I’m hoping that the sigil doesn’t do any transformation of the message text - and that it only translates to the correct gettext call with the bindings. The gettext backend configuration includes an :interpolator key so that’s MF2 messages already taken care of.

If you’re planning to adjust the syntax of the messages to have additional capability, then I’m suggesting that would be better done in an interpolation module - not the sigil itself. That way if a user wants to opt in to your enhanced syntax, they configure your interpolator in their gettext backend. Your interpolator can delegate to the standard interpolator for most things except parsing.

This has the added benefit of keeping the sigil code clean, simple and focused (which is what José liked) but still gives you all the flexibility to build any syntax extensions you want.


Thanks for the explanation! While I’m not planning to add anything that goes beyond what gettext offers (domain/context, interpolation, pluralization), I have to transform the msgid, because the sigil receives the AST of the Elixir string interpolation, which is then translated into the (default) Gettext interpolation syntax ("%{foo}") and bindings. The resulting msgid is then always passed as is to gettext along with the bindings (and gettext uses the default interpolator to do the interpolation).
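
To make the AST point concrete, here is a small sketch of how an interpolated string's AST could be split into segments. AstSketch is a hypothetical module, not the library's parser; it only handles plain variables and field access such as user.name, and naming the binding after the last field is an assumption based on the first example in this thread.

```elixir
defmodule AstSketch do
  # An interpolated string is quoted as {:<<>>, meta, parts}, where each
  # interpolation is wrapped in a Kernel.to_string/1 call typed as binary.
  def segments({:<<>>, _meta, parts}) do
    Enum.map(parts, fn
      text when is_binary(text) ->
        text

      {:"::", _, [{{:., _, [Kernel, :to_string]}, _, [expr]}, {:binary, _, _}]} ->
        {binding_name(expr), expr}
    end)
  end

  # A plain variable: #{name} -> :name
  defp binding_name({name, _meta, context}) when is_atom(name) and is_atom(context), do: name

  # Field access: #{user.name} -> :name
  # Any other expression is not handled and raises FunctionClauseError.
  defp binding_name({{:., _, [_receiver, name]}, _meta, []}) when is_atom(name), do: name
end

ast = quote do: "Hello, #{user.name}"
IO.inspect(AstSketch.segments(ast))
```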

To support MF2, I’d have to generate a string with MF2 interpolation syntax ("{{{$foo}}}" I think?) and pass this to gettext (because that’s what the MF2 interpolator expects, right?), eg:

foo = "bar"
~t"{{#{foo}}}" # -> gettext("{{{$foo}}}", foo: foo)

This would probably work for simple MF2 message strings, but get complicated quickly if you use .input, .match etc, no?

I’m sorry if I did not understand what you meant. Could you maybe give an example of what the ~t call would look like?