Convert 'charlists' into ~c"charlists"

Hi everyone,

We are considering deprecating 'charlists' in Elixir in favor of ~c"charlist". In many languages, 'foobar' is equivalent to "foobar"; that’s not the case in Elixir, and we believe it leads to confusion. If we were to deprecate this functionality, users would get a warning/error upfront about the usage of single quotes.
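For readers who have not run into the distinction yet, here is a minimal sketch of it. It uses only the ~c sigil form (so it does not depend on how single quotes are treated by a given Elixir version), and the variable names are purely illustrative.

```elixir
# A charlist is a list of integer code points; a string is a UTF-8 binary.
charlist = ~c"foobar"
string = "foobar"

IO.inspect(is_list(charlist), label: "charlist is a list")
IO.inspect(is_binary(string), label: "string is a binary")

# The two are different values, but they convert into each other.
IO.inspect(String.to_charlist(string) == charlist, label: "string -> charlist")
IO.inspect(List.to_string(charlist) == string, label: "charlist -> string")
```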

The first step in this journey is to change the formatter to rewrite 'charlist' into ~c"charlist". This will basically make the migration painless for most developers.

With this in mind, @sabiwara has submitted an awesome PR that implements this change in the Elixir codebase. Perhaps more importantly, he has also simulated PRs to important projects to assess the impact of this change. Here they are:

As you can see, the changes are minimal to most projects, including postgrex and mint which integrate with OTP libraries. So from a compatibility point of view, this change is relatively straightforward.

However, should we go ahead with this change? There are two questions to weigh:

  1. If single quotes raised/warned in the future, would that lead to better learning and user experience?

  2. If [97, 98, 99] prints as ~c"abc" in IEx instead of 'abc', is that more, less, or equally confusing? (let’s assume for now that we will either print one or the other, in order to avoid side-tracking the discussion)
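As background for the second question, the sketch below shows why a list such as [97, 98, 99] is printed as text at all. As far as I know, the default inspection mode infers a charlist when every element is a printable ASCII code point, which is what `List.ascii_printable?/1` checks; the `:charlists` inspect option overrides that inference. The variable name `codes` is just for illustration.

```elixir
codes = [97, 98, 99]

# Every element is a printable ASCII code point, so the list qualifies
# for being rendered as text.
IO.inspect(List.ascii_printable?(codes), label: "printable?")

# A list containing non-printable integers does not qualify.
IO.inspect(List.ascii_printable?([0, 1, 2]), label: "printable?")

# Asking explicitly for the list-of-integers representation.
IO.puts(inspect(codes, charlists: :as_lists))

# Non-printable lists are always shown as plain lists.
IO.puts(inspect([0, 1, 2]))
```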

Thoughts are welcome!


A thousand times yes, also for noobs trying to speak to Erlang libs.


Think this would be a great change for newcomers to the language.
Even though it’s a matter of reading the docs, it is not consistent with other languages, like you said. There is a reason for that, but at the same time expected behaviour is good. :slight_smile:

Thumbs up from me.


Amazing! :heart:

Definitely!

Personally I would like to vote for ~c(abc) instead. This would make many beginners think that ~c is “some kind of function”, so a significant percentage of them would look at the Kernel documentation first, before asking on the forum, Slack, etc. The sigil_c documentation would then be the core starting point for understanding charlists.

I believe we need to improve the sigil_c documentation a lot. Even if that @doc does not describe charlists itself, it should still point to a separate guide explaining what they are and why they exist.


:+1:

can we then do

>>> 'someone once said: "Hello, World!" - and now we are stuck with it.'
'someone once said: "Hello, World!" - and now we are stuck with it.'

?


I can count on one hand the number of times when I wanted a list of printable ints to be shown as a charlist in the ~7 years I’ve been doing Elixir. Also in that time I’ve answered why it does that countless times to new people and in not one of those cases were they interacting with Erlang and needed it to be printed as text. I think the time has come to legitimately question whether this should still be the default given how easy it is to opt in to the behavior.

Love the change.

  1. I think it is equally confusing. We’ll simply change the wording of newcomers’ topics a bit, so instead of

why [97, 98, 99] prints 'abc'?

we’ll have

why [97, 98, 99] prints ~c"abc"?

To avoid confusion altogether, a list of integers should be printed as a list of integers. In general, I don’t care. I don’t interact with charlists often.


I think we should start with IEx.configure(inspect: [charlists: :as_lists]) being the default (as @fuelen seems to be suggesting as well).
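For anyone who wants this behaviour today, the call quoted above can be placed in a personal IEx configuration file. The snippet is a sketch that assumes a `.iex.exs` file in the home or project directory, which IEx evaluates on startup.

```elixir
# .iex.exs (assumed location: home directory or project root)
# Print lists of integers as lists, even when every element
# happens to be a printable character.
IEx.configure(inspect: [charlists: :as_lists])
```

With this in place, evaluating [97, 98, 99] in a new IEx session should show the integers rather than text.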

I vote for the sigil syntax as well – ~c"abc" and ~c(abc) should be identical, right?
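To the question of whether the delimiters matter: a sigil's delimiter only changes how the literal is written, not the value it produces. A small sketch comparing the forms mentioned in this thread:

```elixir
# The same charlist written with three different sigil delimiters.
with_double_quotes = ~c"abc"
with_parens = ~c(abc)
with_single_quotes = ~c'abc'

IO.inspect(with_double_quotes == with_parens, label: ~s(~c"abc" == ~c(abc)))
IO.inspect(with_parens == with_single_quotes, label: "~c(abc) == ~c'abc'")

# All of them are the plain list of code points.
IO.inspect(with_double_quotes == [97, 98, 99], label: "equals [97, 98, 99]")
```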

I believe people carry implicit assumptions with them from other tech, and that has been visible in the frankly infinite number of times people have come to the forum to ask why [97, 98, 99] shows as 'abc' in IEx. So we should piggy-back on those assumptions and use the sigil syntax moving forward.

I love the idea.

  1. The 'foobar' != "foobar" bit me quite a bit when I was learning Elixir. Being explicit about it will be a nice change for newcomers to the language.

  2. Having [97, 98, 99] output to terminal also confused the hell out of me many many times. Seeing ~c"abc" in the terminal will be so much better for me and instantly tell me what I’m seeing.

I see it as a win/win for newcomers and veterans alike. Fantastic change @josevalim


If I’m not mistaken, you could also do ~c'abc' for any (possibly imaginary) person who insists on a head nod to the past. But I think the question is a presentational one, and ~c(abc) flags more strongly in a beginner’s mind that “I might not have done something wrong”, though I do think the best solution is to just print charlists as lists of numbers. I am also very much for making single quotes (except in sigils) an error moving forward, though a transitional deprecation warning is probably a good idea.


It’s a good thing for newcomers who are using charlists by mistake when they don’t want to.

If you aren’t new and want to use charlists it seems slightly more annoying but not the biggest deal.


I love that proposal, I’ve raised this as one of my points here and I think it will solve the problem of two ways of expressing strings.

I think it will, but there are a few other things we need to keep in mind.

  1. I think that the compiler shouldn’t raise more than one warning for dependencies
  2. We need to do something about the string() type (maybe warn too?)
  3. We need to create a crawler which will translate old Elixir codebases to use sigils instead of single-quoted charlists

I think that printing the ~c"" version is the best decision, because it won’t break any software that is used to grep logs or errors, but it’ll signal that '' is deprecated


It seems very weird to me to optimise any aspect of a language for the first hour or day of learning. I’m as poor a programmer as exists anywhere (possibly the only long-term unemployed developer in the contemporary world!), am not that distant from the time when I first came across Elixir (ie. I remember what starting with it was like), and can say with 100% certainty that there’s no way the '/" distinction could have caused more than a 1 minute confusion.


I’m indifferent, despite having to explain charlists to newcomers several times.

I saw some proposal somewhere (can’t remember) of deprecating/removing charlists from Erlang/Beam entirely… I understand why that’s not feasible, but… one can dream… :innocent:

Well … in that way we could remove all syntax sugar in Elixir. :smiley:

Example question:

Why {:ok, number} becomes map [ok: number]?

defmodule Example do
  def sample(number), do: {:ok, number + number}
end

iex> for number <- 1..10, do: Example.sample(number)
[ok: 2, ok: 4, ok: 6, ok: 8, ok: 10, ok: 12, ok: 14, ok: 16, ok: 18, ok: 20]

and someone may say we should print every keyword like a normal list [{:key, value}, …] as printed result may be confusing for newbies.
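The keyword-list comparison can be checked directly. The sketch below shows that the [ok: 2] form is only a shorter way of writing a list of two-element tuples whose first element is an atom; the variable name `pairs` is illustrative.

```elixir
# A list of {atom, value} tuples is a keyword list.
pairs = [{:ok, 2}, {:ok, 4}]

# The shorthand syntax produces the very same value.
IO.inspect(pairs == [ok: 2, ok: 4], label: "same value")
IO.inspect(Keyword.keyword?(pairs), label: "keyword list?")

# inspect/1 uses the shorthand when the keys are atoms...
IO.puts(inspect(pairs))

# ...and falls back to tuples when they are not.
IO.puts(inspect([{"ok", 2}]))
```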

The key point here is to keep as much as we can and try not to confuse. There will always be some differences from other languages, and there will always be some features newbies do not know about.

One time someone tried to force me to write code without pipes because new developers do not know how the pipe works. Instead he used a nested, nested, nested … case/cond/if, and the code he described as “easy to read” had something like 13 indentation levels in one line … :smiling_imp:


I’d expect this to be a useful change. Raising the signal that differentiates binaries from charlists should be a net positive. Though I’m not sure about the use of double quotes as the sigil delimiters. They might be more confusing to people unaware of charlists, but I’d probably prefer them as someone aware of them.

I don’t think we can get rid of the need to be aware of charlists and what they are/how they work. So the best we can do is make people aware of them earlier rather than later, at dev/build time rather than in production. This change would support that.

I’d argue both are equally confusing. But I don’t think they’re equal in helping people to resolve that confusion.

To someone unaware of charlists and the fact that single quotes are not interchangeable with double quotes in Elixir, 'abc' doesn’t hold any signal of “this is something you don’t know about yet”. At best they wonder why there are single quotes suddenly, but in my experience that doesn’t help much without knowing the significance of that. ~c"abc" on the other hand does hold more signal, even if the person might still wonder why there’s a sigil in front of the “string”. It hopefully leads them to learning about sigils (if unknown before), and the sigil_c docs will lead them to learn about charlists.

Having provided explanations to many on Slack, as well as having tried to improve the documentation of charlists a few years ago, I still feel like the most difficult part of explaining charlists is the fact that formatting a charlist as text is only a matter of representation of a list of integers, but it doesn’t mean it’s not a list of integers anymore. Using sigils hopefully makes this less mind-bending, given other sigils are also used to represent a certain value via a different representation of it. ~D[2022-08-10] is a textual representation of a date struct, ~r/\d*/ defines a regex struct via its text format, …. At best people might have been in contact with sigils before, but given how many code challenges/katas/… accidentally produce charlists this is certainly nothing to rely on.
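The point about sigils being a textual way to write an ordinary value can be illustrated with a short sketch. The regex pattern is slightly different from the one in the post (anchored, to make the match result unambiguous), and the variable names are illustrative.

```elixir
# ~D writes a Date struct as text; the value is the struct itself.
date = ~D[2022-08-10]
IO.inspect(date == Date.new!(2022, 8, 10), label: "date sigil == struct")

# ~r writes a Regex struct via its pattern text.
IO.inspect(Regex.match?(~r/^\d+$/, "2022"), label: "regex matches digits")

# ~c writes a list of integers as text; it is still a list.
chars = ~c"abc"
IO.inspect(chars == [97, 98, 99], label: "charlist sigil == list")
IO.inspect(hd(chars), label: "first element")
IO.inspect(length(chars), label: "length")
```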


formatting a charlist as text is only a matter of representation…

I agree and was coming to make the same point. In my experience helping newcomers, the confusion stems from people assuming that 'abc' is a string - because it would be in other languages and it’s a reasonable assumption.

If it becomes ~c"abc" it’s now clear that we are not talking about a string which should help remove the confusion and prompt them to look up the ~c sigil.

There is definite value in being able to see both representations (the numbers in the list OR the characters they refer to) but usually when we write charlists we think in letters, ie we write 'abc' and not [97, 98, 99].

So I think [97, 98, 99] should be printed as [97, 98, 99], but if you write the sigil it’s printed as ~c"abc".

Big plus one on the proposal and +1 for printing as ~c"abc" from me.


Great idea. Not only does this give more signal about what is happening, it also simplifies the language by reusing an existing thing (sigil) in place of a separate one (single quote) in an intuitive way.

I remember we have already been through this, at least partially, and reverted the behaviour…

IIRC the situation we have been in was roughly the following:

  1. Inspection used sigil_c instead of single-quotes
  2. Usage of single quotes was at least soft-deprecated

Though both of these were reverted within the same minor release via a patch release, or in the next minor. (I do not remember exactly)

I also do not remember the exact reasoning behind reverting the behaviour.

What has changed since then?

PS: I am happy about this change, and have used sigil_c in my own code base ever since the mentioned deprecation was active!


Nitpick: the printed form should be ~C"chars", since ~c"foo #{bar}" will try to interpolate bar.
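The difference the nitpick refers to can be seen in a small sketch: the lowercase sigil interpolates, while the uppercase one keeps the text exactly as written. The variable `bar` and its value are made up for the example.

```elixir
bar = "baz"

# Lowercase ~c interpolates #{bar} before building the charlist.
interpolated = ~c"foo #{bar}"
IO.inspect(interpolated == ~c"foo baz", label: "~c interpolates")

# Uppercase ~C keeps the characters #, {, b, a, r, } literally.
literal = ~C"foo #{bar}"
IO.inspect(List.to_string(literal) == "foo \#{bar}", label: "~C is literal")
IO.inspect(length(literal), label: "literal length")
```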


Having answered a fair number of these, I don’t think anything short of a one-time message from IEx, “Hey, I just printed that thing that looked like a charlist as a charlist. Read this doc if that wasn’t what you wanted” would solve the problem. ~C is a tiny bit more google-able but not by much, and “I entered ~C(hi there) but IEx thinks it’s ~C"hi there" halp” posts could appear.
