How to replace accented letters with ASCII letters?

DanielRS · May 7, 2016, 10:48pm

@hubertlepicki With the :nfd form, String.normalize decomposes each accented character into several codepoints (a base letter followed by one or more combining marks) whose combination represents the original character. A simple example:

```
iex(11)> "á" |> String.codepoints
["á"]
iex(12)> "á" |> String.normalize(:nfd) |> String.codepoints
["a", "́"]
```
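To make the invisible second codepoint above visible, a small sketch (assuming a recent Elixir, where the function is named `String.to_charlist/1`; older releases spelled it `String.to_char_list/1`) prints the codepoint values in hex:

```elixir
# "á" here is the precomposed codepoint U+00E1.
decomposed = String.normalize("á", :nfd)

# NFD yields the base letter "a" (0x61) followed by
# U+0301 COMBINING ACUTE ACCENT.
decomposed
|> String.to_charlist()
|> Enum.map(&Integer.to_string(&1, 16))
|> IO.inspect()
# prints ["61", "301"]
```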

However, for some reason the accented character is not decomposed when it is not the first one in the string:

```
iex(7)> "aá" |> String.normalize(:nfd) |> String.codepoints
["a", "á"]
```
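One way to cross-check the result is to run the same NFD decomposition through Erlang's `:unicode.characters_to_nfd_binary/1` instead of `String.normalize/2`. This is a swapped-in alternative, not something from the thread, and it assumes OTP 20 or later, which is newer than the setup discussed here:

```elixir
# NFD via Erlang's :unicode module (OTP 20+), bypassing String.normalize/2.
input = "aá"
nfd = :unicode.characters_to_nfd_binary(input)

# A correct decomposition splits the second "á" too: "a", "a", U+0301.
IO.inspect(length(String.codepoints(nfd)))
# prints 3

# The UTF-8 size grows from 3 to 4 bytes, since U+0301 encodes as two bytes.
IO.inspect({byte_size(input), byte_size(nfd)})
# prints {3, 4}
```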

@KronicDeth Here’s my output:

```
iex(15)> "árboles más grandes" |> String.normalize(:nfd)
"árboles más grandes"
iex(16)> "árboles más grandes" |> String.normalize(:nfd) |> String.replace(~r/[^A-z\s]/u, "")
"arboles ms grandes"
iex(17)> "árboles más grandes" |> String.normalize(:nfd) |> String.replace(~r/[^A-z\s]/u, "") |> String.replace(~r/\s/, "-")
"arboles-ms-grandes"
```
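For reference, here is a minimal sketch of the same pipeline as a module, assuming `String.normalize/2` decomposes every character correctly. The module and function names (`Accents`, `strip/1`, `slugify/1`) are hypothetical. It removes the accents explicitly as nonspacing marks (`\p{Mn}`) and uses `[^A-Za-z\s]` rather than `[^A-z\s]`, because the ASCII range `A-z` also includes `[ \ ] ^ _` and the backtick:

```elixir
defmodule Accents do
  # Hypothetical helper, not from the thread: replaces accented Latin
  # letters with their unaccented base letters.
  def strip(text) do
    text
    # Decompose, e.g. "á" -> "a" + U+0301.
    |> String.normalize(:nfd)
    # Delete nonspacing marks (Unicode category Mn), where NFD puts accents.
    |> String.replace(~r/\p{Mn}/u, "")
  end

  # Slug in the style of the pipeline above: keep ASCII letters and
  # whitespace only, then turn each whitespace run into one hyphen.
  def slugify(text) do
    text
    |> strip()
    |> String.replace(~r/[^A-Za-z\s]/u, "")
    |> String.replace(~r/\s+/u, "-")
  end
end

IO.inspect(Accents.slugify("árboles más grandes"))
# prints "arboles-mas-grandes"
```

Letters with no canonical decomposition, such as ø or ß, pass through `strip/1` unchanged and are then dropped by `slugify/1`; handling those needs an explicit replacement table.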

My machine runs Arch Linux; this is the output of running `locale` in the terminal:

```
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
```

I wonder what the problem could be…