Codepoints vs Graphemes

Programming Elixir has a nice explanation with examples on page 123, in the Double-Quoted Strings Are Binaries section:

graphemes(str)
Returns the graphemes in the string. This is different from the codepoints function, which lists combining characters separately. The following example uses a combining diaeresis along with the letter “e” to represent “ë”. (It might not display properly on your ereader.)

iex> String.codepoints "noe\u0308l"
["n", "o", "e", "¨", "l"]
iex> String.graphemes "noe\u0308l"
["n", "o", "ë", "l"]
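To see why the two functions disagree, it can help to look at the raw code point numbers. This is just a small sketch of my own (the variable names are made up, not from the book), assuming a reasonably recent Elixir version:

```elixir
str = "noe\u0308l"

# Integer code points: 776 (0x0308) is the combining diaeresis.
codepoints = String.to_charlist(str)
IO.inspect(codepoints, charlists: :as_lists)
# prints [110, 111, 101, 776, 108]

# The third grapheme is built from two code points: "e" plus U+0308.
third = Enum.at(String.graphemes(str), 2)
third_size = length(String.codepoints(third))
IO.inspect(third_size)
# prints 2
```

So String.codepoints/1 splits "e" and the diaeresis apart, while String.graphemes/1 keeps them together as one visible character.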

The printed version of the book actually confused me about this distinction, because the example is printed incorrectly! It shows the following, which does not match what IEx actually outputs:

iex> String.graphemes "noe\u0308l"
["n", "o", "e¨", "l"]

So in general, if you want each printed character of a string as a list, use String.graphemes/1.
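As far as I can tell, the other String functions that deal with "characters" follow the same grapheme-based view. Here is a quick sketch comparing the different ways of measuring the same string (again, the variable names are only for illustration):

```elixir
str = "noe\u0308l"

grapheme_count = String.length(str)               # counts graphemes
codepoint_count = length(String.codepoints(str))  # counts code points
byte_count = byte_size(str)                       # counts UTF-8 bytes

IO.inspect({grapheme_count, codepoint_count, byte_count})
# prints {4, 5, 6}

# String.reverse/1 reverses graphemes, so the diaeresis stays on the "e".
reversed = String.reverse(str)
IO.inspect(reversed == "le\u0308on")
# prints true
```

If you reversed the code point list instead, the diaeresis would end up attached to a different letter, which is another reason to prefer graphemes when you care about what is displayed.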
