Steamroller: an opinionated Erlang code formatter

lpil · December 15, 2019, 2:43pm

Wonderful! What a great project

I especially like the decision not to offer any configuration options, to avoid bikeshedding over pointless configuration.

Looking forward to using this in the near future.

rvirding · December 15, 2019, 2:57pm

Yes, sometimes I forget to put a smiley when I should. Has happened before and unfortunately has caused some misunderstanding leading to useless arguments.

Now I am not joking.

elbrujohalcon · December 16, 2019, 12:16pm

True that. You have to re-canonicalize before running git diff

elbrujohalcon · December 16, 2019, 12:48pm

I could be wrong, but I thought the definition of an opinionated formatter is that it offers minimal configuration. Otherwise, what’s the difference between an opinionated formatter and any other formatter?

HAHA! Yeah… rebar3 format is the Groucho Marx of opinionated formatters…

Those are my principles, and if you don’t like them… well, I have others

Jokes aside, you’re right. rebar3 format is not opinionated. It does have a general bias towards a particular formatting (the one you get easily by just running rebar3 format). That’s all.

I can see why your development process is attractive to some. It seems like it carries quite a lot of mental overhead. What happens if you’re helping a colleague to debug and have to read code on their machine? What happens if you’re reviewing a PR on e.g. the GitHub web interface?

Speaking from my experience with the Smalltalk formatter: Everyone knows how to read the canonical format. So, if you want to show code to some colleague and they can’t read it in your style (which is a very strong opinion to hold, but I digress)… you just canonicalize it. Same goes for code on GitHub, etc… Having your own format is a tool that you only use to feel comfortable in your own environment. But I see how this social convention can be… let’s say… abused since nobody will be enforcing it.

The disadvantage of trying to introduce a similar tool to Erlang is that plenty of people have their own ideas about how Erlang should look and those ideas are all fairly well-ingrained. It makes that initial barrier of getting used to a new style a bit more painful. I think it pays off in the long run - both for individuals and teams

This ^ is almost exactly the reason behind our design (together with my very happy Smalltalk background, of course).
In the end, I don’t think we’ll agree on this but eventually rebar3 steamroll will be compatible with rebar3 format (users will just need to tweak a few options). So, as I said in my previous comment: multiple formatters is better than no formatters.

To go a little bit further (and I’ll eventually copy & paste this in a blog article), here is our line of thought behind rebar3 format, roughly (@juanbono can correct me if I’m wrong):
First of all, we decided that having a formatter is better than having no formatter. Therefore we wanted to create something. The goal was for it to be able to format Erlang code consistently, even if it was consistently ugly.

Just like you (and likely @michalmuskala did), we had to come up with the default formatting style (in your case the only formatting style). Considering that the default for Erlang/OTP’s code is emacs-mode, which is encoded in erl_prettypr, it was clear to us that it had to be the default.

But that format, among other things, mixes tabs and spaces for indentation. That was something that we realized early on that nobody will want. erl_prettypr also puts empty lines between module attributes, for instance. That is something that… some people (particularly all of the Erlang devs within NextRoll) won’t accept. And the list goes on.
We could change the default to fit our own understanding of well-formatted Erlang code, but that’s something that’s far from clearly defined… even among the 10 of us (I’m a fan of comma-first ROK-style, to name a quirk). We wondered: how many people will use our formatter if it only respects our style? Not many, we thought.

So we decided to take the risk of keeping the default format as close to emacs-mode as possible (ugly as it might be) and let devs configure the formatter to make it fit their needs, with the goal of inspiring people to follow the smalltalk dev flow I described earlier.
What’s the risk? Well… having each repo for each maintainer for each company formatting their code in their own style and none of them abiding by the canonical one. Yeah! That might happen, but that’s far better than what we have today where each file in each repo (and sometimes, if there are multiple contributors, each function) has its own formatting style. Besides, with rebar3 format, even if you find code that you don’t understand or that is hard to read/debug/etc. because it’s not formatted in your style, now you can format it to your style with just a simple command, rebar3 as dtip format, and… if it was previously formatted by rebar3 format you can apply your changes and easily go back to the format that the repo maintainer prefers.

With a strictly opinionated formatter like steamroller you don’t need to do all those tedious steps and you get the immense benefit of being able to just read every piece of Erlang code out there since it’s already formatted. That’s great! But that requires you to first convince everyone to use steamroller and I bet that would be far harder than convincing them to use rebar3 format and… if you wait long enough, once everyone is using rebar3 format, convincing them to drop the options and just use the default (which will produce the same result as using steamroller) will be (I hope) easier.

dtip · December 16, 2019, 1:43pm

mixes tabs and spaces for indentation

Yeah I’ve noticed some of this kicking around. Plus there are issues where one developer has used a tabwidth of 4, another has used a tabwidth of 8, and another has used spaces! Wonky indentations all over the place

how many people will use our formatter if it only respects our style?

haha isn’t that the big question! Steamroller already has something like 3 stars on GitHub so you could say it’s going preeeetty well.

having a formatter is better than having no formatter

Too right. I’ll definitely be keeping an eye on rebar3 format. Looking forward to seeing it grow!

ferd · December 16, 2019, 3:13pm

I tend to have my way of formatting code which I can push onto people shamelessly by having written books that use it. But I generally do not care, and rather aim for project consistency.

For me the biggest blocker to adoption of a formatter will be, in particular order of importance:

messing up git blame's tracking of project history, especially when it shifts lines around in ways that even git blame -w -M will struggle with
edge cases ending up making code less readable with no way to have it evade the auto-formatter.
betting on the wrong horse and having to re-change all formatting again and have the discussion over again
the formatter not easily adapting to arbitrary line lengths or being buggy around it. I had this issue with the Elixir formatter when writing Property-Based Testing with Proper, Erlang, and Elixir, because a book has something like 60 columns safe for printing (as opposed to a code’s more often used 76, 80, or 120 cols)
expanding vertically by a lot. This is also a concern when printing books, but not really anywhere else. But if a code sample formatted by hand can be clear within 7-8 lines, but the auto-formatted one expands to 15-16 lines, that might be enough to have to break it out further. The same thing on a manually formatted 15-20 lines sample may now take entire pages and need to be moved to appendixes. So you get to choose between readability (on paper or an e-reader screen) vs. following community standards

Whatever I see gain traction I’m likely to adopt, though at this time I’m holding off to see what others choose and how things develop. I kind of assume whatever I pick may end up hurting alternatives if a project like rebar3 or guidelines at adoptingerlang.org favour one solution over another (we could always show them all).

I’ll probably wait until later in 2020 to make a real attempt on a lot of codebases I have around with a more serious audit, to give time to all the announced projects here to get to some ready state, rather than just picking the runner that started the earliest.

hauleth · December 16, 2019, 7:03pm

I know that this is not a general-case solution, but there exists git-hyper-blame, which tries to solve exactly that problem.

OvermindDL1 · December 16, 2019, 11:19pm

Where’s your built-in gleam formatter?

Ugh, no, Tabs for Indentation, Spaces for Alignment, it’s the only way it should ever ever ever be!

lpil · December 16, 2019, 11:21pm

On the backlog for nearly a year now! Code formatter · Issue #59 · gleam-lang/gleam · GitHub

ferd · December 17, 2019, 5:30pm

Yeah it wouldn’t be useful for me. I’d rather have non-standard formatting with standard tools than need non-standard tools to support my standard formatting. It would also probably ruin all editor or IDE extensions that do blame as part of their display, so that kills more than just diffing.

tristan · December 17, 2019, 5:48pm

I’d like to eventually bundle a formatter provider with rebar3 as rebar3 fmt.

But I only want to do that if the formatter is part of OTP. So it would be great if any improved formatter was submitted to OTP, either as a new thing alongside erl_tidy or as updates to erl_tidy.

I will also be marking my fork of erl_tidy that has a rebar3 plugin, rebar3 fmt, as deprecated: https://github.com/tsloughter/erl_tidy

dtip · December 29, 2019, 5:06pm

In my opinion the case … of should be on the same line as the LHS of the match

I’m afraid the current formatting here is a deliberate choice. It’s similar to how cases are formatted in Elixir. Imagine something like this:

ReallyLongVariableNameOrSomeOtherLongTerm = case foo() of
    bar -> bar();
    baz -> baz()
end,

The case header is way off to the left of the case clauses. By indenting after the = we guarantee the case statement reads left to right. We could indent the case clauses way off to the right to save a line but then we have floaty code. I don’t think it’s readable or consistent.

If you break one clause into multiple lines, do so for all of them

We plan to solve this kind of readability issue with whitespace padding. Will put a note on here once it’s live.

Also instead of closing one paren per line, I’d really prefer have them closed the “lisp-style”

Yep this is the way it’s typically done in Erlang. I and other developers I’ve spoken to find the one paren per line style easier to grok - you can see matching parens at a glance. There have been plenty of times where I’ve had a heavily-bracketed expression which has been missing a bracket somewhere. It can be frustrating to find if the brackets are all bunched up.

dtip · December 29, 2019, 5:21pm

messing up git blame's tracking of project history, especially when it shifts lines around in ways that even git blame -w -M will struggle with

This always seems like the main argument given against auto-formatting an existing codebase. Do you know of a proper solution?

As far as I’m aware, it’s a trade off: you can either have consistently formatted code or you can keep all of your git history nicely intact. The sooner you auto-format, the sooner you can start building your git history back up (assuming the project is still in active development).

But for the bigger Erlang open source projects (e.g. rebar3, OTP) the advantage for autoformatting extends way beyond the projects themselves. If we can demonstrate code consistency here then we will encourage code consistency elsewhere. We’ll also show developers curious about Erlang that we have a modern toolchain and a community which cares about modern best practice. It’ll encourage people to try out the BEAM!

edge cases ending up making code less readable with no way to have it evade the auto-formatter

All we can really do here is encourage people to report the problems they have. Please please open an issue on GitHub if you try out Steamroller and it does something rubbish!

ferd · December 29, 2019, 6:38pm

One approach is the one the OTP team has taken with the OTP codebase: you’re allowed to reformat the code you touch, but not to reformat any other code.

The second one: I’m not going to throw away 5, 10, or 15 years of project history for automated formatting purposes. That history is invariably more worthwhile than standard syntax formatting. It would be quite counter-productive to go “well the formatter is nicer, let me throw away most of my ability to search back easily for changesets” when that history can reach for longer than other programming languages have been around. That is, however, very project-specific.

This will be a bit of rain on the parade, but a project like rebar3 or OTP is rarely blocked because of the formatting (of a syntax that arguably a lot of people already dislike to begin with); it is blocked by the problem domain being foreign or unknown to the contributor, along with the project practices being unfamiliar.

Things I don’t recall preventing a merge in rebar3 (though it did happen in rebar 2.x before we took over):

code format

Things I frequently have to remind people of and may sometimes stall PRs or entirely kill features:

testing
corner cases due to specificity of project layouts (“will this work when the app is used as a dep? What about umbrella projects? Can it still work with plugins and mixed languages?”)
portability concerns (“solution can’t work on Windows”)
alignment with OTP (“the thing you’re pushing towards isn’t compatible with releases or relups”)
alignment with the project (“adding this feature goes counter to this objective we have or are working towards”)
difficulty of understanding existing architecture
slow response time on our part leading to a closed window of opportunity on a contributor who had limited attention to give to a problem

You could make the argument that we’re losing a lot of contributions in Rebar3 by people who never approach us because the formatting would have been obtuse, but even with well-formatted code, we still have to address all these other concerns.

My experience is that people will happily use the tool, and often report issues. A few will stick around to try a fix once we have it. Sadly, we can only count on both hands the contributors who will stick around and help us debug an issue in depth during a couple of years, and maybe count on one hand those who have had enough of these problems and an interest to contribute to the project for more than one patch.

I would have to say that code formatters are not significant for a project such as Rebar3 because people are not necessarily interested in joining it in the first place, and when they’re interested, they can quickly be surprised by how rapidly the problem space grows complex (hell, I’m still surprised at times and I’ve been around for maybe 5 years).

Considering we have two maintainers with maybe 3-4 recurring active contributors, I’d rather keep the history for the few of us who spelunk the code base (with modules going as far back as 2010 for what we kept from Rebar 2.x) than lose it all for maybe 4-5 commits a year when I have never heard of anyone complaining that formatting held them back.

To me there are far more obvious challenges to tackle, like “understanding how a thing I want turns out to have hairy interactions with 3 other features that we must carefully design around”. While I’m the one doing this, often on behalf of other people in the community, I’ll prefer to keep all the helpful tools I can have.

I will likely use formatters for new or small codebases sometime in 2020, but I am very unlikely to use them on rebar3 outside of evaluating “how much code is this gonna shift around?” I hope my rationale makes sense to you. It’s not that I dislike auto-formatting, it’s that I have much bigger problems to tackle in older projects than how they’re formatted; the reason I like auto-formatters is that they prevent me from having discussions on formatting on new projects and teams; the one thing making me kind of afraid is having to field 4-5 PRs a year of well-intentioned people trying to force-apply auto-formatting to projects where history is too important to use it.

dtip · January 4, 2020, 10:21am

Yep I completely agree! Don’t think I implied projects would be blocked due to formatting.

The point I’m trying to make is that it would be fantastic to be able to tell new developers “Welcome to Erlang, we format our code with X tool, here are all the cool things you do with the BEAM”.

It essentially boils down to creating the right first impression. Any Erlang/Elixir dev is likely to have a dig around in the source code for a long-standing library at some point and if that code has funky formatting it incorrectly gives the impression that the code is poor quality. We want to show readers that Erlang has up-to-date tooling and consistent best practice. That’s how we show that it’s a good choice for modern software projects.

I think the points you’ve made here are all completely valid. Thing is, they’re technical arguments against what is, in my opinion, a solution to a non-technical problem.

Anyway, this is all a bit too blue-sky thinking for now. First steps are to try out a formatter on new/small projects!

dtip · January 4, 2020, 10:26am

Making gradual progress with Steamroller itself: as of v0.11.2 it can format the whole of OTP without falling on its face!

The formatting is not yet stable, though, as I’m planning to make a few changes before v1.0.

lpil · January 4, 2020, 10:29am

As a counterpoint the Elixir compiler and standard library were completely reformatted 2 years ago when Elixir 1.6 brought the formatter and I’ve not had any problems searching through or navigating the history before or after that event.

Worst case scenario I may land on the commit that ran the formatter and I have to press the blame button again. I’ve not found it irritating, and it’s a vastly lighter burden and time cost than not having a formatter.

Congratulations! That’s very cool

hauleth · January 4, 2020, 12:14pm

If you have a good linter or formatter that outputs where the differences are needed (I am looking at you, mix format) you can use tools like ReviewDog that will fail CI and will output information about failed linting only on lines that have been changed in the given PR. I find it really useful and I think that the Erlang codebase could use something like that to reduce the amount of formatting differences (with the addition of an .editorconfig file). Especially irritating is the mix of tabs and spaces, even in new files like socket.c.
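To make the "only in lines that have been changed" idea concrete, here is a rough sketch in Python. It is not how ReviewDog is implemented; it only illustrates the filtering step. The names changed_lines and findings_on_changed_lines are made up for this example, findings are assumed to be (line_number, message) pairs, and the diff is assumed to be a unified diff covering a single file.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,3 +12,4 @@" and
# captures the starting line number on the new-file side.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(unified_diff):
    """Collect new-file line numbers added or modified by a single-file diff."""
    changed = set()
    new_line = None  # stays None until the first hunk header is seen
    for line in unified_diff.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            new_line = int(match.group(1))
        elif new_line is None:
            continue  # "---" / "+++" file headers before the first hunk
        elif line.startswith("+"):
            changed.add(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers have no new-file line
        else:
            new_line += 1  # context line, present in both versions
    return changed


def findings_on_changed_lines(findings, changed):
    """Keep only the (line_number, message) findings that touch changed lines."""
    return [(line, message) for line, message in findings if line in changed]
```

With something like this, CI would only complain about formatting on lines the PR actually touches, so old code can stay as it is until somebody edits it.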

lpil · January 4, 2020, 12:16pm

Formatters don’t really have a concept of lines, they just pretty-print an AST. Do you have an example of a formatter that is capable of listing where changes are needed? I’d be interested in seeing how that was implemented.

hauleth · January 4, 2020, 12:18pm

Unfortunately I cannot remember one. However, IIRC RuboCop can act as both a linter and, in some cases, an automatic formatter.

I would do that by generating a diff of the in-memory formatted file and the “original”.
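A minimal sketch of that approach, in Python with the standard difflib module. The function name formatting_findings is hypothetical, and the formatter itself is not called here: the sketch assumes you already have the original source and the formatter's output as two strings.

```python
import difflib


def formatting_findings(original, formatted):
    """List the original-line ranges that the formatter would change.

    Each finding is (first_line, last_line, replacement_lines), with 1-based
    inclusive line numbers in the original. A pure insertion has
    first_line == last_line + 1, meaning "insert after last_line".
    """
    old = original.splitlines()
    new = formatted.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    findings = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # formatter left these lines untouched
        findings.append((i1 + 1, i2, new[j1:j2]))
    return findings
```

The formatter still only pretty-prints an AST and knows nothing about lines; the line information comes entirely from diffing its output against the file on disk.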