Unidecode implementation for Elixir

cevado · June 11, 2017, 8:36pm

I just published the unidecode package to Hex.

Unidecode was originally a Perl module for transliterating Unicode characters to ASCII; it has since been ported to several languages (Python, Java, Ruby, C#, Rust). Anyone who has had to integrate with old systems knows it’s a necessary pain.

I published an implementation with a simple approach that only includes the Latin code charts. My idea is to gradually add all the charts. Right now I’m at the documenting/testing step. Basically it’s a very small application (it needs to be added to extra_applications) that exposes a simple API, Unidecode.unidecode/1 and Unidecode.decode/1. Both do the same thing: they receive a string and transliterate the characters the library knows. Why two functions that do the same thing? The Perl and Python implementations expose only unidecode/1, which is meant to be imported wherever you need it (since you will rarely have a conflict with a function called unidecode). I added decode/1 in case you don’t want to import the function and would rather call it through the module.
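To make the idea concrete, here is a minimal, language-neutral sketch in Python of a table-driven transliterator with two names for the same function. It is not the package's code: the tiny `TABLE` is hypothetical, and leaving unknown characters unchanged is an assumption about what "transliterate the characters it knows" means.

```python
# Hypothetical miniature lookup table; a real implementation would be
# generated from the full Latin code charts.
TABLE = {"ç": "c", "é": "e", "ñ": "n", "ß": "ss", "Æ": "AE"}


def unidecode(text: str) -> str:
    """Replace every character found in TABLE with its ASCII equivalent.

    Characters missing from TABLE are passed through unchanged
    (an assumption made for this sketch).
    """
    return "".join(TABLE.get(ch, ch) for ch in text)


# Second name for the same function, for callers who prefer a
# module-qualified call over importing `unidecode` directly.
decode = unidecode
```

Calling `unidecode("façade")` and `decode("façade")` gives the same result, because both names refer to one function.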

If anyone needs something like this, please take a look at the library and send suggestions for improving it. I published it at 0.0.1, in a very raw state, to see what best fits the community’s needs.

tallakt · June 12, 2017, 1:05pm

Hi. If you’re interested, you could have a look at codepagex… I wonder if it already does the same thing.

cevado · June 12, 2017, 1:49pm

Hi @tallakt, actually it doesn’t. Converting encodings is a different task from transliteration. Say I have the Greek word Λάμ(β)δα: by just converting encodings I would lose this data, whereas transliteration looks for a way to represent it meaningfully. Taking Λάμ(β)δα as an example, a transliterated version would be Lam(b)da. In a way it’s a kind of romanization, but one that avoids special characters in the Latin script. So the character ç would become just c.
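The difference can be sketched in a few lines of Python (standard library only). This is an illustration, not how unidecode or codepagex work internally: the `GREEK` table and the `transliterate` helper are hypothetical, and the published package only covers the Latin charts so far.

```python
# The word Λάμ(β)δα, written with escapes so the exact code points are
# unambiguous (ά is the precomposed U+03AC).
word = "\u039b\u03ac\u03bc(\u03b2)\u03b4\u03b1"

# Encoding conversion: ASCII has no code points for Greek letters, so
# every one of them is replaced and the information is lost.
lossy = word.encode("ascii", errors="replace").decode("ascii")

# Transliteration: map each character to a readable ASCII stand-in.
# Hypothetical miniature table, just enough for this example.
GREEK = {
    "\u039b": "L",  # Λ
    "\u03ac": "a",  # ά
    "\u03bc": "m",  # μ
    "\u03b2": "b",  # β
    "\u03b4": "d",  # δ
    "\u03b1": "a",  # α
    "\u00e7": "c",  # ç
}


def transliterate(text: str) -> str:
    """Replace known characters; leave all others unchanged."""
    return "".join(GREEK.get(ch, ch) for ch in text)


print(lossy)                # ???(?)??
print(transliterate(word))  # Lam(b)da
```

The converted string keeps only the parentheses, while the transliterated one is still recognizable as the original word.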