Why does IEx sometimes print single-quoted strings as numbers and sometimes as the string?

Can someone explain this behavior?

iex> 'hełło'
[104, 101, 322, 322, 111]
iex> 'hello'
'hello'

Why isn’t there a consistent output? If single-quoted strings are character lists, why are the code points only revealed when the string contains multi-byte characters? How can I see the internal integer codepoints of any single-quoted string?

Strings/charlists are confusing enough without the obfuscation. Could someone shed light on this?

Agree that it’s confusing, but what’s the inconsistency?

If all integers in the list are printable ASCII characters, IEx displays the list as a string for convenience. The quickest, but hacky, way to see the integers is to append a 0: 'hello' <> <<0>>

There’s also the charlists: :as_lists option to IO.inspect: https://hexdocs.pm/elixir/Inspect.Opts.html#summary
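A small sketch of that rule (not from the original post). List.ascii_printable?/1 is a real function in Elixir's List module; I believe the default inspect behaviour relies on the same kind of check, so treat it as an approximation of what IEx does rather than its exact code. The ~c"..." sigil builds the same charlist as '...'; recent Elixir versions also print charlists as ~c"hello" instead of 'hello', but the printable-ASCII rule is the same.

```elixir
# ~c"..." produces the same charlist as '...'.
ascii = ~c"hello"
non_ascii = ~c"hełło"

# Approximately the check applied before a list is shown as text:
# every element must be a printable ASCII character.
IO.inspect(List.ascii_printable?(ascii))      # prints true
IO.inspect(List.ascii_printable?(non_ascii))  # prints false (ł is code point 322)

# Skip the check entirely and always show the integers.
IO.inspect(ascii, charlists: :as_lists)       # prints [104, 101, 108, 108, 111]
IO.inspect(non_ascii, charlists: :as_lists)   # prints [104, 101, 322, 322, 111]
```

The data is identical in both cases; only the display differs, which is why a single non-ASCII code point flips the whole list to the integer view.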

I would expect the following:

iex> 'hello'
[104, 101, 108, 108, 111]

It’s inconsistent that entering one value reveals the integer code points while entering another does not.

The <<0>> null-byte trick is good, but I think you have to adapt its syntax when working with character lists:

iex> 'hello' ++ [0]
[104, 101, 108, 108, 111, 0]

The charlists: :as_lists option is the most helpful here:

iex> IO.inspect('hello', charlists: :as_lists)
[104, 101, 108, 108, 111]

It’s the counterpart of the binaries: :as_binaries option for binaries:

iex(10)> IO.inspect("hello", binaries: :as_binaries)
<<104, 101, 108, 108, 111>>
"hello"

Here are instructions to always display charlists as lists in IEx: IEX - char printing - odd behavior

In particular, charlists will be printed back by default in single quotes if they contain only printable ASCII characters

https://hexdocs.pm/elixir/List.html#module-charlists
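Independently of the linked thread (I'm not reproducing its exact instructions), one way to get the integer view for every result is IEx's own configuration. IEx.configure/1 accepts an :inspect keyword list that is used when printing expression results, and IEx evaluates a .iex.exs file on startup (current directory first, then the home directory). This is a config fragment, a sketch rather than the only approach:

```elixir
# In ~/.iex.exs (loaded when IEx starts) or typed at the iex> prompt:
# print lists of integers as lists, even when they look like text.
IEx.configure(inspect: [charlists: :as_lists])
```

With that in place, entering 'hello' should print [104, 101, 108, 108, 111], matching what happens for 'hełło'.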

You’re right, I got that wrong by confusing binary syntax with charlist syntax. LOL, that’s the whole subject of the thread to begin with.

The reason lists of integers are sometimes printed as '...' strings is that this is how strings are represented in Erlang: as lists of Unicode code points. And the reason for that is that the BEAM has no string datatype, nor a character datatype for that matter. So we fake strings.

In Erlang, with lists of integers, which is a “classic” way of doing it in functional languages; in Elixir, with binaries containing UTF-8 encoded characters.

iex(8)> "abc"         
"abc"
iex(9)> "abc" <> <<0>>
<<97, 98, 99, 0>>
iex(10)> "a™b"
"a™b"
iex(11)> "a™b" <> <<0>>
<<97, 226, 132, 162, 98, 0>>

Here I’m using the append-a-0 trick to show the internals.
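For comparison, a small sketch (not from the original post) that shows the same text both ways without the null-byte trick: :binary.bin_to_list/1 from Erlang's standard library exposes the UTF-8 bytes of a binary, and String.to_charlist/1 gives one integer per code point. The sample string is arbitrary.

```elixir
text = "hełło"

# Binary view: the raw UTF-8 bytes. "ł" (U+0142) is encoded as 197, 130.
bytes = :binary.bin_to_list(text)

# Charlist view: one integer per Unicode code point. "ł" is 322.
codepoints = String.to_charlist(text)

IO.inspect(bytes, charlists: :as_lists)       # prints [104, 101, 197, 130, 197, 130, 111]
IO.inspect(codepoints, charlists: :as_lists)  # prints [104, 101, 322, 322, 111]

# 7 bytes in the binary, but only 5 characters.
IO.inspect({byte_size(text), String.length(text)})  # prints {7, 5}
```

This is the same distinction as in the "a™b" example: ™ is one code point (U+2122) but three bytes (226, 132, 162) in the binary.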

It’s a hard life. :wink:
