Trans - Embedded translations for Elixir

belaustegui · June 11, 2016, 4:37pm

Hi all.
A few days ago I published my first package on Hex.pm. It is called Trans and aims to provide an easy way to leverage database support for JSON datatypes to store translations. Trans is heavily inspired by the incredible gem hstore_translate.

The traditional approach of having adjacent tables for storing the translation information quickly increases the number of JOINs required for retrieving data, especially when a single query contains multiple models. The approach provided by Trans stores translations in a single column of each model, so when a model is retrieved so are its translations. Modern RDBMSs support this kind of schemaless data and allow conditions to be applied to it in queries.
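
As a rough illustration of the idea (not the Trans API itself), a record fetched with its translations embedded could look like the map below. The field names, the translations key and the sample text are assumptions chosen for the example.

```elixir
# Hypothetical article as it might come back from a single query:
# the translations live in the same row, keyed by locale.
article = %{
  title: "How to write a spelling corrector",
  body: "A wonderful article by Peter Norvig",
  translations: %{
    "es" => %{
      "title" => "Cómo escribir un corrector ortográfico",
      "body" => "Un artículo maravilloso de Peter Norvig"
    }
  }
}

# No JOIN is needed to reach a translation; it is a nested lookup.
spanish_title = get_in(article, [:translations, "es", "title"])
IO.puts(spanish_title)
```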

If you find it interesting, take a more detailed look at:

Trans page on Hex.pm: https://hex.pm/packages/trans
Trans documentation: https://hexdocs.pm/trans/api-reference.html
Trans code: https://github.com/belaustegui/trans

Any suggestions, issues, ideas and contributions are more than welcome.

belaustegui · July 30, 2016, 5:55pm

Trans is now on version 1.0.0 !!!
You can see the release notes on GitHub. The main changes of this version are the improved support for Elixir 1.3.x and the new requirement of Ecto 2.0.

You can update the version of Trans in your project by setting the dependency to {:trans, "~> 1.0"} in your mix.exs and then running mix deps.update trans.
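
For reference, the dependency tuple goes in the deps list of mix.exs. A minimal sketch, assuming a hypothetical application named :my_app (the module name and version number are placeholders too):

```elixir
# mix.exs of a hypothetical project; only the Trans entry comes from the post.
defmodule MyApp.Mixfile do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  defp deps do
    [
      # Accepts any 1.x release of Trans from 1.0 onwards.
      {:trans, "~> 1.0"}
    ]
  end
end
```

Running mix deps.get afterwards fetches the package.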

As usual, any comments, suggestions, issues or pull-requests are more than welcome!

belaustegui · October 22, 2016, 7:22pm

Hi again! We got a new version of Trans!!

The version 1.0.1 is a minor release that focuses mainly on making Trans more comprehensible by improving the documentation and adding a changelog that conforms to the Keep a Changelog format.

You can see the release notes on GitHub. There are also some nice improvements planned.

As usual, any comments, suggestions, issues or pull-requests are more than welcome!

belaustegui · February 19, 2017, 4:45pm

After a long time we have a new version of Trans.

The main changes in version 1.0.2 are:

Trans now compiles cleanly and is tested on Elixir 1.4.
The dependency earmark has been removed, since it is already required by ex_doc.
A CONTRIBUTING.md file has been added, detailing the contribution guidelines.

You can see the release notes and the planned improvements on GitHub.

The next Trans version will focus on making Ecto an optional dependency that is only required when using the QueryBuilder component.

As usual, any comments, suggestions, issues or pull-requests are more than welcome!

OvermindDL1 · February 19, 2017, 8:36pm

So this is not so much for translations of the application, but rather for easily allowing users to create their own translated content for a given set of data? Looks useful.

agustif · February 19, 2017, 10:26pm

This is awesome for user-generated sites aspiring to manage multi-lang effortlessly in PostgreSQL with leverage of JSONB types.

Thank you very much for posting about it. I already had it in my GitHub stars but had forgotten about the project, and now it might come in useful for a side project.

belaustegui · February 20, 2017, 5:26pm

Thank you very much for your words @OvermindDL1 and @schp

The mission of Trans is to provide an easy way to retrieve translations from structs or maps, and (optionally) provide an interface for generating Ecto queries by adding conditions on translated fields.

Trans has two main components:

The Translator's mission is to retrieve a translation in the desired language, or fall back to the default one if no translation exists. (I also plan to allow more flexibility in the fallback process.)
The QueryBuilder's mission is to allow creating or modifying queries based on translated values. This component does require Ecto and leverages the power of PostgreSQL's JSONB data type to look into the translations for the queries.
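
The fallback behaviour described for the Translator can be sketched in plain Elixir. This is only an illustration under assumptions: the module TranslatorSketch, its translate/3 function and the translations key are hypothetical and are not the Trans.Translator implementation.

```elixir
defmodule TranslatorSketch do
  # Returns the translation of `field` for `locale`, or the untranslated
  # value stored on the struct/map itself when no translation exists.
  def translate(data, locale, field) do
    translated =
      data
      |> Map.get(:translations, %{})
      |> Map.get(to_string(locale), %{})
      |> Map.get(to_string(field))

    case translated do
      nil -> Map.fetch!(data, field)
      value -> value
    end
  end
end

article = %{
  title: "Hello",
  body: "World",
  translations: %{"es" => %{"title" => "Hola"}}
}

# A Spanish title exists, so it is returned.
IO.puts(TranslatorSketch.translate(article, :es, :title))
# There is no Spanish body, so the default value is used instead.
IO.puts(TranslatorSketch.translate(article, :es, :body))
```

Nothing here touches a database, which is the property that allows this kind of lookup to work without Ecto.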

At the moment Trans has a hard dependency on Ecto, but I intend to make this dependency optional in the next version. Then, you will be able to use the Translator component without Ecto in any application.
The QueryBuilder will still require Ecto to work though, but it won’t even be compiled if Ecto does not exist in the application.

Edit: I actually plan to support MySQL as well, since newer versions also have a JSON type. But there is an open issue in the Mariaex adapter to add support for this type that must be addressed first. I could look into it myself, but I would need some guidance on where to look first.

belaustegui · March 1, 2017, 7:41am

I’ve released a new version of Trans

Trans 1.1.0 makes Ecto an optional dependency.

This update addresses one of the main concerns of Trans since its inception: to leverage a database, but also be usable without one. The Trans.QueryBuilder component requires Ecto to work, but the Trans.Translator component can be used with any struct or map and does not require a database.

As usual, any comments, suggestions, issues or pull-requests are more than welcome!

belaustegui · April 11, 2017, 10:11pm

Trans 2.0 is out!

This release of Trans is focused on improving the library interface and making it safer and more usable.
The Trans.QueryBuilder module has been completely rewritten. It now exposes the translated/3 macro that generates an SQL fragment that can be used when building Ecto queries.
The new translated/3 macro is compatible with all the functions and macros in Ecto.Query and Ecto.Query.API and provides safety checks against translating nonexistent or non-translatable fields.

Compare how you would create a query with Trans 2.0 and before:

# Now: Trans 2.0
iex> Repo.all(from a in Article,
...> where: ilike(translated(Article, a.body, :es), "%elixir%"))

# Before: Trans 1.0
iex> Article
...> |> Trans.QueryBuilder.with_translation(:es, :title, "%République%", type: :like)
...> |> Repo.all

More detailed release notes can be found on GitHub. I also plan to publish an article soon explaining the changes and the reasoning behind them.

EDIT: the promised article about the changes in Trans 2.0 is now published on Medium.

As usual, any comments, suggestions, issues or pull-requests are more than welcome!

Eiji · April 13, 2017, 1:46pm

@belaustegui: Can you explain why you decided to keep all translated columns in one big jsonb?
After reading your description here I thought that you did something like:

defmodule Article do
  # use Ecto.Schema
  use Trans.Schema

  schema "articles" do
    trans_field :body, :string
    trans_field :title, :string
  end
end

so that each trans_field is a separate map column.
What about performance when you have a big jsonb map (with lots of fields) and lots of records?

Do you validate language codes?

Regarding your issues #12 and #14: in some cases developers prefer to store locales in the database. For example, a table named locales could have the fields id, locale, fallback_locale_id, name and description.

By the way, you still have the 2.0.0 milestone open even though version 2.0.0 is already released.

belaustegui · April 13, 2017, 3:00pm

Hi @Eiji , I really appreciate your comment.
I completely forgot about the milestone. It is closed now, thanks!

Your idea of trans_field looks really good indeed. Could you open an issue in the project so we can discuss it further?

I went with this approach for Trans because I wanted to port the hstore_translate gem to Elixir.
I share your concerns about the big jsonb field containing all the translations, in particular when fetching lots of data in queries. I’ve not been able to test Trans in any project with high data volume. I’ve used the hstore_translate gem in a Ruby project, and this approach was faster than having the translations in their own separate tables (the globalize approach). I think that this performance gain will still apply in Elixir.

I may perform a test comparing Trans performance versus having the translations separated into different tables. It would be a very interesting comparison.

Thank you very much!
Cheers.

Eiji · April 13, 2017, 6:04pm

@belaustegui: I thought a lot about it today, but I still don’t have one good way. I read some articles on the topic, and there are lots of pros and cons in every case (including not using jsonb). I will think more about it another time, and I will probably have more proposals for your API.

brightball · April 13, 2017, 7:07pm

I like the approach but it might be worth creating a trans_LANGUAGE column. Committing to multiple languages on a site isn’t a small task, and this would separate each translation out into its own column.

You’d get a couple of benefits from that approach.

First, with updates. Variable sized columns that can store blobs have to worry about space reallocation on updates which can stress the database a good bit as it grows, especially with any significant update frequency. By having a column-per-language you’ll end up with multiple smaller fields that update less frequently on their own.

Second, if you know the translation that you want to get back, you can request it in the select rather than the entire JSONB for every language. If there are multiple fields for the variation, you’ll be able to easily query the whole set of translations on that row for that language with one field, rather than having to separate it out of the JSONB. It will also speed up parsing the JSONB by keeping the size consistent.

belaustegui · April 17, 2017, 1:39pm

I’ve published an article explaining the main changes, improvements and future plans for Trans.

If you want to have a taste of the improvements in Trans 2.0 you should read it!

Eiji · April 17, 2017, 2:02pm

@belaustegui: I don’t have much time, but I had a quick look at that article and one point is interesting:

Make translations an embedded schema and create a custom Ecto type.

I was thinking about it too, especially a configuration that allows using it either as an embedded schema or as a normal association. I have something like that in mind because, after looking through lots of answers, articles and comments, I haven’t found a clear answer about which way is best. It depends on the data size and on which database functions the developer would like to use. For the simplest scenarios jsonb is recommended, but it’s not so simple in bigger and/or special projects. I don’t want to copy and paste too much; I just want to point out that it’s not easy to determine the best strategy, and if someone is going to share code then I think it should be as configurable as possible. What do you think about it?

OvermindDL1 · April 17, 2017, 3:29pm

That looks pretty cool! I look forward to it!

belaustegui · April 17, 2017, 5:12pm

This is currently only an idea; I have to think about it more deeply.

On the one side it will require more work to set up translations, since users should create embedded schemas and specify which data they should contain. On the other side it would make translations safer by letting us specify valid fields, changesets, etc.
Since this would be a big change for Trans, it requires more thought and analysis. Some other issues should be fixed before addressing this.
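
To make the trade-off concrete, here is a minimal sketch that swaps Ecto embedded schemas for a plain Elixir struct, which is enough to show how declaring the fields up front rejects invalid translations. The module name ArticleTranslation and its fields are hypothetical, and a real embedded schema would add types and changesets on top of this.

```elixir
defmodule ArticleTranslation do
  # Hypothetical container: only :title and :body can be translated.
  defstruct [:title, :body]
end

# Building a translation from declared fields works.
valid = struct!(ArticleTranslation, title: "Hola", body: "Mundo")
IO.inspect(valid)

# An undeclared field (note the typo) raises instead of being stored silently.
result =
  try do
    struct!(ArticleTranslation, titel: "Hola")
  rescue
    e in KeyError -> {:error, e.key}
  end

IO.inspect(result)
```

The extra setup is the struct definition itself; the gain is that mistakes in field names surface immediately.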

I agree with you that it is not easy to determine the best strategy for content translation management. Currently there are different approaches provided by libraries other than Trans, such as ecto translate or translecto.
I also think that a library should have a single and clear concern. Users can then choose which approach fits best for their needs and pick the right library for it.

belaustegui · April 17, 2017, 5:12pm

Thank you very much !

Eiji · April 17, 2017, 5:50pm

I mostly agree, except for situations where two libraries could share ~90% of their code. I think in those cases developers prefer configuration over a larger number of libraries. Finally, if you implement your plan then it will be easy to add a configuration option for my suggestion; otherwise a second library will be better.

belaustegui · July 9, 2017, 9:45am

Hi again! We got a new version of Trans!!

The version 2.0.1 is a minor release which contains the following main changes:

Fixed some issues with documentation examples.
Use Ebert to check code quality.
Relax the dependency restrictions on Poison.

You can see the release notes on GitHub.

As usual, any comments, suggestions, issues or pull-requests are more than welcome!