Possible bug with String and Turkish characters

sztosz · June 19, 2017, 11:19am

The Turkish alphabet has these four letters: I ı İ i

But Elixir incorrectly converts uppercase İ to lowercase i̇, which shows up with double dots:

iex(1)> x = "İ"
"İ"
iex(2)> String.codepoints(x)
["İ"]
iex(3)> String.graphemes(x)
["İ"]
iex(4)> x = String.downcase(x)
"i̇"
iex(5)> String.codepoints(x)
["i", "̇"]
iex(6)> String.graphemes(x)
["i̇"]

Am I missing something? Or is it a bug?

Beware, though: these letters render differently in Chrome on macOS Sierra (which shows the double dots properly) and in Chrome on Windows 10 (which does not show them).

josevalim · June 19, 2017, 1:30pm

To quote the String module:

"In general, the functions in this module rely on the Unicode Standard, but do not contain any of the locale specific behaviour."

Since this behaviour is locale specific (i.e. it behaves differently depending on the locale), it is not implemented by Elixir’s String. What constitutes locale specific behaviour is documented by the Unicode Standard.
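
To make the difference concrete, here is a minimal sketch of how the Turkish rule could be applied in user code. The `TurkishCase` module and its `downcase/1` function are hypothetical names made up for this illustration; they are not part of Elixir's String module. The sketch also assumes the input uses the precomposed İ (U+0130) rather than I followed by a combining dot.

```elixir
# Locale-independent result: İ (U+0130) becomes "i" plus the
# combining dot above, i.e. the two code points U+0069 and U+0307.
String.to_charlist(String.downcase("\u0130"))
# [105, 775]

# Hypothetical helper, not shipped with Elixir.
defmodule TurkishCase do
  # Map the two Turkish-specific capitals by hand, then let
  # String.downcase/1 handle every other character.
  def downcase(string) do
    string
    |> String.replace("\u0130", "i")  # İ -> i
    |> String.replace("I", "\u0131")  # I -> ı (dotless i)
    |> String.downcase()
  end
end

TurkishCase.downcase("\u0130STANBUL")  # "istanbul"
TurkishCase.downcase("I\u015EIK")      # "ışık"
```

The helper only covers lowercasing; the reverse direction (i to İ, ı to I) would need the same kind of explicit mapping before calling String.upcase/1.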