Running the new Elixir formatter

josevalim · October 23, 2017, 11:02pm

The fact that we are rewriting this is really bad, haha. Please open another issue.

Let’s please not make that a generic issue for formatter problems, because then it will be impossible to tackle. We already have a great example, so let’s improve that; we can always open more issues as necessary.

net · October 23, 2017, 11:03pm

Something that this thread makes me believe—though I’m sure others will likely disagree—is that the formatter is too aggressive. I may prefer a formatter that has a little more respect for the author’s whitespace choices. Consistent code is a worthy goal, but perhaps it would be easier to infer some decisions from the author, rather than attempt to maximize the utility of a single style (or require additional configuration in the case of locals_without_parens).

+10 on trailing commas.
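The `locals_without_parens` option net mentions can be tried directly with `Code.format_string!/2`. Here is a minimal sketch that assumes a project using a paren-less DSL call. `field/2` is a hypothetical name, not something the formatter or any library defines here:

```elixir
# `field` is a hypothetical local call (think of a schema DSL); it is never defined.
source = "field :name, :string"

# By default the formatter adds parens to local calls it does not know about.
default = source |> Code.format_string!() |> IO.iodata_to_binary()

# Listing the call under :locals_without_parens keeps the paren-less style.
# In a project, the same keyword would usually go in .formatter.exs, e.g.
#   [inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"],
#    locals_without_parens: [field: 2]]
custom =
  source
  |> Code.format_string!(locals_without_parens: [field: 2])
  |> IO.iodata_to_binary()

IO.puts(default)
IO.puts(custom)
```

This is the "additional configuration" net refers to: each paren-less call has to be declared with its name and arity.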

OvermindDL1 · October 23, 2017, 11:05pm

Hear hear. I love most of what the formatter does. Just in a few cases I really really do not want it to touch some things. ^.^;

tmbb · October 23, 2017, 11:06pm

I’m sorry, I don’t understand. What exactly are we/you rewriting?

josevalim · October 23, 2017, 11:07pm

I would store it as a module attribute and then access it in the function body.
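Here is a minimal sketch of the module-attribute approach. It assumes a small hypothetical 3x3 table, and the module and function names are made up for illustration. The attribute is evaluated at compile time, and functions read it like a constant:

```elixir
defmodule Lookup do
  # Hypothetical lookup data; the attribute is evaluated once, at compile time,
  # and its value is embedded wherever @identity is read.
  @identity {
    1, 0, 0,
    0, 1, 0,
    0, 0, 1
  }

  # Public accessor, so callers outside the module can still reach the data.
  def identity, do: @identity

  # Reading the attribute inside another function body (row-major, 3 columns).
  def at(row, col), do: elem(@identity, row * 3 + col)
end
```

The hand-aligned rows are exactly the kind of layout this thread is about. If the formatter puts each element of a multi-line tuple on its own line, the 3x3 shape would likely not survive `mix format`.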

You are right. That’s the argument for trailing commas, but no-parens calls in Elixir make trailing commas a bit inconsistent. For example, you can’t do:

defstruct foo: :bar,
          baz: :bat,

So we felt adding trailing commas could actually cause misunderstandings due to this “inconsistency”.

josevalim · October 23, 2017, 11:08pm

We should keep #! foo as it is, and not turn it into # !foo.

net · October 23, 2017, 11:08pm

josevalim:

You are right. That’s the argument for trailing commas, but no-parens calls in Elixir make trailing commas a bit inconsistent. For example, you can’t do:

defstruct foo: :bar,
          baz: :bat,

I do

defstruct [
  foo: :bar,
  baz: :bat,
]
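You can check whether a trailing comma is allowed without defining any struct. The idea is to parse the two spellings only. Here is a minimal sketch using `Code.string_to_quoted/1`, which returns `{:ok, ast}` or `{:error, reason}`:

```elixir
# Only parses the source; nothing is compiled or defined.
bracketed = "defstruct [foo: :bar, baz: :bat,]"
bare = "defstruct foo: :bar, baz: :bat,"

# A keyword list inside brackets accepts a trailing comma...
{:ok, _ast} = Code.string_to_quoted(bracketed)

# ...but the same keywords passed directly to a no-parens call do not.
{:error, _reason} = Code.string_to_quoted(bare)

IO.puts("bracketed form parses, bare form is a syntax error")
```

This is the inconsistency José describes. The comma is legal or illegal depending on whether the brackets are written out.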

josevalim · October 23, 2017, 11:18pm

The issue is that you are most likely inconsistent in those choices and do not always follow them across your codebase. We could add options for controlling things like whether to use trailing commas, but then people will bikeshed over which options to enable. We have three options and they all have cons:

Respecting the user’s choice - leads to inconsistencies, because we are not consistent
Providing configuration - leads to bikeshedding about the code style
None of the above - leads to worse formatting in some cases

One of the things I am really enjoying about the formatter is that I now type code without any care about style and let the formatter do the job. It is absolutely liberating. Respecting the user’s choice means being more careful.

That’s why I want to wait for the formatter to really be out in the wild before adding more configuration or giving more weight to the user’s decisions.

To me it is definitely a “choose your poison” kind of deal, and we have chosen ours. Maybe marking parts of your code that the formatter should not touch is a fair compromise, but again, it is too early for that.

mischov · October 23, 2017, 11:28pm

For the record:

iex(5)> Code.format_string!("#! foo")  
["# ! foo"]
iex(6)> Code.format_string!("#!foo") 
["# !foo"]

sasajuric · October 24, 2017, 7:47am

This raises the question: is the goal of the formatter to reach total consistency, or to improve readability? In my opinion, too much consistency might actually hurt readability, so striving only for consistency might IMO be counterproductive.

I personally confess that I’m sometimes inconsistent, and in many cases this is deliberate. Tuples are a simple example. If they are very short, I write them on a single line; otherwise I put each element on a separate line. The same goes for maps.

Empty lines are another example. I sometimes use them to break and separate larger subsections in the code. OTOH if the code is very simple, I don’t bother.

That’s why I want to wait for the formatter to really be out in the wild before adding more configuration or giving more weight to the user’s decisions.

It seems that the initial formatter is going to be quite opinionated, and given what I stated above I’m not enthusiastic about it. Personally, I’d prefer a more relaxed formatter initially.

Another way to put it is that with the formatter we stop paying attention to how layout affects readability, and instead delegate this task completely to the machine. I’m not saying this is bad, but I’m also not sure I like that this is now entirely in the machine’s control. Personally I’d prefer a more balanced approach, where the machine takes care of mostly mechanical tasks while people are responsible for case-by-case decisions. Unfortunately, I have no idea what exactly this means, or whether it’s even possible.

josevalim · October 24, 2017, 11:38am

Those decisions are also taken by the formatter based on the line length. The formatter won’t split a data structure into multiple lines unless the data structure was previously broken into multiple lines (so this is one of the cases where we respect the user’s choice - those few cases are all documented).
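Here is one rough way to see the line-length rule in action. It calls `Code.format_string!/2` with its `:line_length` option set deliberately small (the default is 98). The atoms are placeholders:

```elixir
# Small helper: format a source string and return it as a binary.
format = fn source, opts ->
  source |> Code.format_string!(opts) |> IO.iodata_to_binary()
end

# A short data structure written on one line fits, so it stays on one line.
short = format.("{:ok, :done}", [])

# The same kind of structure, but longer than the (deliberately tiny) line
# length, so the formatter has to break it across lines.
long = format.("{:aaaaaaaa, :bbbbbbbb, :cccccccc}", line_length: 20)

IO.puts(short)
IO.puts(long)
```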

We need to remember that the issue is not only consistency with ourselves but also within our teams. This became clear to me as the Elixir team grew because they would ask me why I chose a certain layout and I could not give a reasonable answer. There was no pattern besides “it feels better”.

Empty lines were a particular issue with Phoenix, where Chris would remove my empty lines and add them elsewhere, and I would remove his empty lines and add them elsewhere. We would just do it subconsciously (to be fair, I probably did it more often than him).

It is still worth repeating that you cannot run the formatter blindly on an existing codebase. Sometimes you will need to change the code to get a better output. Once you work on those cases and familiarize yourself with the formatter style, the cases where everyone would agree the formatted code is worse are rare. For those still on the fence, the Elixir codebase is a good place to start, since it has been fully formatted.

We generally follow the opposite approach when designing code and APIs in Elixir. We always start with the most restricted APIs and behaviours and then add options and flexibility later on. The reason is that it is relatively easy to add something later but it is extremely hard to remove something.

Btw, another idea I had yesterday is to define it as a list of lists. It may actually better reflect the origin of the data, since you have a comment there that says “each row defines a dimension”. Then you flatten it and convert it to a tuple.
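Here is a minimal sketch of the list-of-lists idea, assuming every row has the same width. The original table is not shown in this thread, so the module name and values are hypothetical placeholders:

```elixir
defmodule Dimensions do
  # Hypothetical data: each row defines a dimension (values are placeholders).
  @rows [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
  ]

  # Assumes every row has the same length as the first one.
  @width @rows |> hd() |> length()

  # Flatten once at compile time and store as a tuple for constant-time elem/2 access.
  @table @rows |> List.flatten() |> List.to_tuple()

  def table, do: @table

  def get(row, col), do: elem(@table, row * @width + col)
end
```

Short rows like these fit on one line each. That makes this layout more likely to survive the formatter unchanged than a single hand-aligned tuple.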

sasajuric · October 24, 2017, 1:59pm

I wasn’t suggesting supporting options, but rather making the formatter more limited in what it does.

In an extreme example (which I don’t advocate, but mention for illustration purposes), a formatter could only tackle parens in function invocations. That’s clearly not the end of the story, but it’s a simple and limited start, which will mostly preserve user’s choices.

Where I’m really going with this is the following. There are still some months before the formatter becomes official. Before that happens, if you receive conflicting feedback on some features, then instead of forcing one style and gathering feedback, you could consider removing that feature from the formatter and thinking about whether you can improve it for the next cycle.

The problem with an overly aggressive approach from the first version is that once we all format our code, it’s going to be hard (if not impossible) to undo those changes. So, if you forbid some user choice now, I almost don’t see a point in allowing it later. Therefore, my fear is that whichever user choices are removed in the initial version, they will stay removed until the end of time. For that reason, I propose some degree of caution when deciding on the scope of the formatter. If you’re unsure whether you made a good choice in some place, maybe the feature is not yet ripe to be introduced to the formatter.

josevalim · October 24, 2017, 2:19pm

The options were just an example too.

The issue is that a formatter that does very little would be far less useful and much harder to implement. So I don’t think we could start small, except maybe with a completely different approach, which I haven’t seen in any of the tools out there.

So respecting the user’s choice takes a lot more work. And it doesn’t change the fact that preserving the user’s choice is a bad idea in the majority of cases, because it is inconsistent and ambiguous.

Take this example:

foo = expr(...)
ba  = expr(...)

Did you mean to align the = sign or was that a mistake? We could analyze a codebase for heuristics but the truth is that if you align the =, you likely do that inconsistently.
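The alignment case can be reproduced with `Code.format_string!/2`. In this small sketch, the `expr(...)` calls are replaced by plain integers so that the snippet parses:

```elixir
# Two assignments with the `=` signs hand-aligned using an extra space.
aligned = "foo = 1\nba  = 2\n"

formatted = aligned |> Code.format_string!() |> IO.iodata_to_binary()

# The extra spacing is normalized away rather than interpreted as intent.
IO.puts(formatted)
```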

That’s true but we will always write new code which can then leverage the new rules. You could argue that this new code will be inconsistent with the previous code and I will agree, because that has been my argument against respecting the user’s decision all along.

OvermindDL1 · October 24, 2017, 3:05pm

True, though that seems to duplicate the places where it is accessed (it is used outside of the module). Although it doesn’t duplicate any information, it does mean I have to jump to two places when I’m going to it (click on the function name once, then click on the attribute name), while adding lines of code. Seems like a needless workaround when the code, as it is, is perfectly clear and sensible for what it does?

That has always bugged me as well. ^.^;

And as such, I do the exact same thing: every single one of my defstructs always uses the list format (this is also a big reason why I always use brackets for keyword lists in function calls even when the [/] are optional; it fixes the durn comma issue). ^.^

Actually, consistency is context-dependent. Just because I may like lists to be one-element-per-line 99% of the time doesn’t mean that works well in every context, like the multitude of math tables I have sitting here; math is a different context than normal code.

Not so for many of us. I quite often select a few lines of code and hit F4 to perform a natural sort on those lines, from the deps in mix.exs to just about every map’s key/value pairs I write in normal code. When the trailing comma vanishes, that sort actually breaks the code.

I’d primarily use it on a lot of the table-like stuff. ^.^

Exactly this, especially with tabular data, information tables, etc…

Very good point. Even though I use clang-format in C++, I still write my code the way I expect it to be read (and thankfully it formats nearly identically to how I write it, and I turn it off for areas that it breaks).

That is why I like following the rule in Elixir that I did in Erlang. ^.^

If the bodies are short in all functions (especially single-liners), 0 lines between heads.
Else 1 line between each head.
Always 2 lines between different heads.
2 lines, then a comment line with a header, then a blank line to separate sections (like trivial internal helper functions at the bottom of a module, or a GenServer's public interface vs. its callbacks, etc…).
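Here is a small sketch of how the first two rules interact with the formatter, using `Code.format_string!/2` on hypothetical `size/1` clauses. It assumes the documented behavior: newlines between expressions are kept as written, except that multi-line expressions always get an empty line before and after them, and consecutive empty lines are collapsed into one:

```elixir
format = fn source -> source |> Code.format_string!() |> IO.iodata_to_binary() end

# Rule 1: single-line clauses written back to back stay back to back.
one_liners = format.("def size(nil), do: 0\ndef size(list), do: length(list)")

# A multi-line clause after a one-liner gets an empty line before it,
# which happens to match rule 2.
mixed = format.("def size(nil), do: 0\ndef size(list) do\n  length(list)\nend")

IO.puts(one_liners)
IO.puts(mixed)
```

Under the same behavior, the two blank lines of the third rule would be collapsed into a single one.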

Which I’d have to unquote in place (and Macro.escape/1 as well). That either balloons the function, making it less readable, or forces me to move the data elsewhere, which removes its definition from its public interface. That is always bad for a pure data structure (though fine for code). In addition, this way makes it trivial to copy/paste between my C++/Python/Erlang/Elixir projects as it changes and updates.
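For comparison, the `unquote` route OvermindDL1 describes looks roughly like the minimal sketch below, which uses hypothetical data. It relies on unquote fragments inside `def`. The map has to go through `Macro.escape/1` because a map is not a literal in quoted form:

```elixir
defmodule Escaped do
  # Hypothetical data held in a module-body variable (not an attribute).
  table = %{pi: 3.14159, e: 2.71828}

  # unquote/1 injects the value into the function body at compile time;
  # Macro.escape/1 turns the map into its quoted representation first.
  def table, do: unquote(Macro.escape(table))
end
```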

Well, even with this, I like most of what it does; even the one-element-per-line lists are awesome (they just need trailing commas so sorting does not break as lines are added). The problem is just the context-dependent areas, where the layout gives the reader information that the machine does not necessarily care about, like tables, tabular data and so forth.

Thus even if being less ‘aggressive’ when it did finally do it, it would still break in the same way.

I’d think it’s better to do it ‘all’ outright, but there really needs to be a way to control it in context-sensitive areas. Instead of turning it off outright, it would be awesome to tag a set of lines of code with a different context ‘name’, for which a configuration file could override options. Then we could, for example, disable list one-lining for math tables and disable space collapsing for tabular chunks of data. This would both name the area, documenting ‘why’ it is formatted differently, and control the formatting without disabling it entirely there. It could maybe even turn on more options, like natural sorting of lines in a map (or in a list, even though that changes the AST, since it is useful for keyword lists). ^.^

sasajuric · October 24, 2017, 3:08pm

Maybe for some cases it doesn’t. An example that comes to mind is vertical spacing (i.e. empty lines). This is low-hanging fruit which, at least to me, matters a lot, since I find that vertical spacing can help the reading experience, and there’s no one-size-fits-all here (hence no potential for being consistent).

I don’t buy that, but I feel that the main cause of our disagreement here is that you’re focusing exclusively on consistency, whereas for me layout is all about improving the reading experience, and consistency is merely a tool which sometimes helps and other times distracts. In other words, total consistency IMO is counterproductive.

Did you mean to align the = sign or was that a mistake? We could analyze a codebase for heuristics but the truth is that if you align the =, you likely do that inconsistently.

If the machine can’t answer that question unambiguously, then perhaps it shouldn’t change that spacing in the first place.

Just to be frank, I’m historically highly suspicious of formatters, and I also don’t feel that manually laying out code is redundant work, since, again, IMO it’s done in the name of improving the reading experience. That likely puts me in a very small minority, so maybe my opinion here is not so relevant.

mischov · October 24, 2017, 3:28pm

I can’t speak for José, but the impression I’ve gotten from his comments and keynote is not that he’s focused exclusively on consistency, but that he’s focused on improving the reading experience through consistency across the entire community.

Code you’ve formatted yourself will almost certainly be more readable for yourself than what the formatter outputs, but if you can become used to a consistent format generated by the formatter and used throughout the community, everybody’s code will be more readable. I think this is a very worthwhile goal, and am willing to put in the effort becoming used to a different style to benefit from it.

That said, I also feel that the formatter is not quite there yet, particularly in areas related to data (which José is working on, at least in the case of tuples), and that there might be a place in the formatter for ignoring a chunk, so that a person can put in the work of formatting it manually when it truly matters for universal readability.

josevalim · October 24, 2017, 3:34pm

It is more work, but definitely not a lot.

We talked about this case above though. What happens when the code is written like this:

def foo(bar) do
  foo(...)


  bar(...)
end
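You can check directly what the formatter does with that example. In this small sketch, the `...` arguments are replaced by a variable so that the source parses:

```elixir
source = """
def foo(bar) do
  foo(bar)


  bar(bar)
end
"""

formatted = source |> Code.format_string!() |> IO.iodata_to_binary()

# The two calls stay separated, but the two empty lines become one.
IO.puts(formatted)
```

Whatever intent the second blank line carried is simply not preserved.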

Are you declaring intent, or is that a mistake? And when it comes to the placement of newlines, can you confidently say that:

You consistently apply this rule to your codebase
You can clearly explain your rules to your colleagues
Your colleagues agree with you and also follow your rules

When you say that adding vertical space improves the reading experience, whose reading experience is it? Yours? Or everyone’s? If it is yours and you value your reading experience above everyone else’s (which is totally fine), then the formatter is not for you. But if you value everyone’s reading experience equally, then ensuring there is as little friction and guessing as possible when moving between files and between codebases is the best reading experience a team can provide.

For the very few cases the formatter absolutely messes it up, we will eventually find a solution.

sasajuric · October 24, 2017, 6:02pm

The main question is can a machine 100% reliably detect that this is a mistake? If it can, then the issue can certainly be resolved by the formatter. If it can’t, then my position is that it shouldn’t take guesses, especially not in the first version.

As I said in my first post here, I’m not consistent, and on many occasions this is deliberate. Consistency is not the goal I’m striving for at all.

My rules are “whatever it takes to make the code easier to read”

We have a fairly lightweight styleguide, although in my personal opinion it’s still too strict. Other than that, the style used should be all about improving the reading experience.

I’ll address both of these points together, as I believe they are stating the same thing.

The readability of the code I write should of course not be assessed by me, but by the reviewers. As a writer, I’m too biased and too familiar with all the intricacies of the code to be able to qualify its readability. The reviewers are the first readers, and are therefore more suitable to judge whether the code narrative can be followed easily.

When it comes to styling, I can personally testify that in my PRs, I get an occasional comment to introduce some blank lines. I don’t recall a single case where someone asked me to remove a newline, but if I were to be overly generous with newlines, I’m pretty sure a reviewer would complain.

As has been argued here, consistency does not imply readability, so I disagree that more consistent code necessarily leads to readability. I do agree that there’s a correlation between the two, but we should never lose sight of the ultimate goal, which to me should be readability.

I also don’t buy the argument that reading different styles is harder. For example, I personally had no problems browsing through the Elixir source code, which IIRC had different styles all over the place, nor did I have problems following the styles in other projects.

Moving beyond the code, the same thing applies to classical written form. We all read various books and blogs, which do not follow the same layout conventions, and I don’t hear people complaining about that. Yet when it comes to code, it’s super confusing if clauses are separated by newlines in one project and aren’t in another? It’s not a problem to switch frequently between Elixir/Ruby/JS/… code, but it is supposedly very confusing to switch from the style where tuples are written on a single line to the style where each element is on its own line? I don’t buy that at all.

My impression from this thread (and also the main reason why I jumped into this discussion) is that most of the decisions are described in terms of consistency.

I fear that once code is passed through the formatter and committed, that ship will have sailed.

josevalim · October 24, 2017, 6:33pm

It doesn’t take guesses, which is what I have been arguing for. It just ignores your input.

The same way someone asked you to introduce newlines, I have asked someone to remove newlines. I am 100% sure I did because some people like to introduce an empty line after the last function and before the module end:

  end

end

Which is not an issue per se but it is not used anywhere else in my projects. Some other people may have been asked to add new lines in some projects and then remove newlines in another one. And to clarify, newlines are an example, this argument applies to formatting in general.
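Here is a small sketch that uses a hypothetical module. It assumes that the formatter drops empty lines at the end of a do/end block, so the extra line before the module's `end` does not survive formatting:

```elixir
source = """
defmodule Example do
  def a, do: 1

end
"""

formatted = source |> Code.format_string!() |> IO.iodata_to_binary()

# The empty line between the last function and the module's `end` is gone.
IO.puts(formatted)
```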

I would say those are not related at all. I had to present my thesis in a LaTeX format given by the university, where I had little flexibility to make changes, and I am not using that as an argument for the code formatter. At this point it is obvious that the need to standardize depends on the person and on the team.

In any case, going by this line of thought, magazines, newspapers, publishers, etc. have a consistent layout that is used throughout the artefacts they produce. They understand the value that visual identity and consistency bring to their readers. I don’t expect O’Reilly and PragProg to share their rules, the same way I don’t expect Ruby and Elixir to be consistent with each other, but I do enjoy knowing what I am getting when I buy from either of them.

You are extrapolating arguments that were not made. Nobody said the use of newlines is super confusing, just that it is inconsistent and highly dependent on personal aesthetics.

You can either:

postpone running the formatter until those issues are fixed
run the formatter and leave a note to eventually fix it
run the formatter and not care about it

We are not forcing you to take any of those decisions. In fact, we have said multiple times you cannot just “run the formatter” and expect it to just work™.

sasajuric · October 24, 2017, 7:18pm

That to me is as good as (or as bad as) taking guesses.

I wasn’t clear enough, so I’d like to state that most often the reason for people asking me to add newlines is to improve readability, not for the sake of consistency.

Both O’Reilly and Prag have books on tech written mostly in the same languages (excluding translations). So when I compare Introducing Elixir and Programming Elixir, those are to me the books about the same topic, written in the same language, and therefore don’t feel like Ruby vs Elixir (or any other two programming languages). Despite these books having different layouts (which is understandable from the marketing point of view), I don’t have problems reading them.

Fair enough. But there is an implied notion that inconsistency is per se bad, while consistency is good. You even said yourself that

Can you clarify what you mean by this?

Sadly, I don’t think this is an option if I want to be part of any team working with Elixir (and I definitely want that). Since the formatter is official, and even Elixir’s codebase is already formatted with it, it will be hard-to-impossible to argue against using it. Therefore the formatter, once officially released, pretty much becomes the definitive authority.

I don’t have problems with that, in fact, strange as it may seem, I’m mostly looking forward to the formatter. However, I’m asking you to think about how opinionated the initial version should be. Reading some comments here I have to admit that I was somewhat unpleasantly surprised about the scope of the formatter changes in the initial version.