Steamroller: an opinionated Erlang code formatter

lpil · January 4, 2020, 2:05pm

RuboCop isn’t really a formatter; it can’t format all Ruby code. Having built the first Elixir linter and formatters, I can say they work quite differently.

hauleth · January 4, 2020, 2:17pm

I am aware. My point is that diff output from the CLI would help me a lot in many cases, as I could build the rest of the tooling around it. Maybe I will try to write such a thing in the future.
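As a rough illustration of the diff output described above, here is a minimal Python sketch built on the standard library’s `difflib`. `format_source` is a hypothetical placeholder for whatever the formatter exposes (here it only strips trailing whitespace); it is not Steamroller’s API.

```python
import difflib


def format_source(src: str) -> str:
    # Hypothetical stand-in for a real formatter: it only strips trailing whitespace.
    return "".join(line.rstrip() + "\n" for line in src.splitlines())


def format_diff(path: str, src: str) -> str:
    """Return a unified diff between original and formatted source ('' if unchanged)."""
    formatted = format_source(src)
    return "".join(
        difflib.unified_diff(
            src.splitlines(keepends=True),
            formatted.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    print(format_diff("src/demo.erl", "foo() ->   \n    ok.\n"), end="")
```

An empty result means the file is already formatted, so a CLI wrapper could exit non-zero whenever the diff is non-empty, which is usually what CI checks want.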

starbelly · January 6, 2020, 1:53am

I must not fail to mention this PR, which is currently open: https://github.com/erlang/otp/pull/2451

michalmuskala · January 8, 2020, 1:33pm

The rumours are true: I’m working on a formatter for Erlang as well. It is in an alpha state and should be released relatively soon, definitely before CodeBEAM SF.

My general thoughts after working on the formatter for a while:

It’s impossible to provide a complete solution based on the regular Erlang parser and AST: macros break it in various ways, and it drops a lot of information the formatter needs. Some examples:

[1 | []] and [1] have exactly the same AST
it unconditionally applies “implicit” concatenation of strings, losing the original formatting: "\x61" "b" becomes "ab" in the AST with no way to recover the original escape sequence.
in a lot of places atoms get “unquoted” from {atom, Meta, Name} AST nodes into a raw Name, again dropping information about the original formatting, such as escape sequences or quotes.
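To make the information loss concrete, here is a minimal Python sketch (an illustration, not erlfmt’s code) of the information-preserving alternative: each token keeps both its raw source text and its decoded value. Joining the values gives what the standard parser stores for "\x61" "b", while joining the raw slices gives back the original source. The decoder is deliberately incomplete.

```python
import re
from dataclasses import dataclass

# Double-quoted string literal, allowing backslash escapes inside.
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


@dataclass
class Token:
    raw: str    # exact source text, which a formatter must print back
    value: str  # decoded value, which is all the standard AST keeps


def decode(raw: str) -> str:
    # Deliberately incomplete: only \xNN and \" escapes are decoded here.
    body = raw[1:-1]
    body = re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), body)
    return body.replace('\\"', '"')


def scan_strings(src: str) -> list[Token]:
    return [Token(m.group(0), decode(m.group(0))) for m in STRING.finditer(src)]


tokens = scan_strings(r'"\x61" "b"')
ast_value = "".join(t.value for t in tokens)  # what implicit concatenation leaves: ab
reprinted = " ".join(t.raw for t in tokens)   # what the formatter must emit: "\x61" "b"
```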

To me this rules out erl_tidy and, in its current form, rebar3 format.

The original parser is annoying in a couple more places, the biggest being that types and specs have a completely different representation from expressions, meaning you basically have to implement two formatters even though the syntax is almost exactly the same.

This leaves basically two possibilities: build a formatter on the raw token stream with an ad-hoc loose parser (the approach Steamroller took), or tweak the parser until it can handle macros as full-fledged nodes and no longer drops information (the approach erlfmt is taking). We chose to go with a fork of the Erlang parser for two reasons:

it seems to be the easier solution, and looking at Steamroller’s code somewhat supports my intuition: all of erlfmt is currently around 1k LoC, while Steamroller’s parser/formatter is around 2k;
having a parser that can deal with those things will also be useful for other projects we’re working on right now and planning to work on in the future (one of them being a language server).
Unfortunately, this means we might not be able to properly handle all kinds of macros and weird combinations; fortunately, they are relatively rare in real code, and we can just reproduce the original source for functions/attributes that use them.
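The fallback of reproducing the original source can be sketched as follows, assuming the file has already been split into top-level forms (functions and attributes). `format_form` and `UnsupportedSyntax` are hypothetical names: the placeholder formatter only collapses whitespace and refuses any form containing a macro.

```python
class UnsupportedSyntax(Exception):
    """Raised by the hypothetical form formatter when it cannot parse a form."""


def format_form(form: str) -> str:
    # Hypothetical formatter: gives up on any form that uses a macro (?NAME),
    # otherwise it just collapses runs of whitespace.
    if "?" in form:
        raise UnsupportedSyntax(form)
    return " ".join(form.split())


def format_forms(forms: list[str]) -> list[str]:
    """Format each top-level form independently, keeping unsupported ones verbatim."""
    result = []
    for form in forms:
        try:
            result.append(format_form(form))
        except UnsupportedSyntax:
            result.append(form)  # reproduce the original source untouched
    return result
```

Because forms are independent, one unparseable function does not prevent the rest of the file from being formatted.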

The second difference between Steamroller and erlfmt is the choice of formatting algorithm. Steamroller is based on the same algorithm as Elixir’s mix format: Christian Lindig’s “Strictly Pretty”. Erlfmt uses a different algebra based on Jean-Philippe Bernardy’s paper “A pretty but not greedy printer”. In theory this should give better results through a more expressive algebra, at the cost of slower execution. Fortunately, in Erlang we can format each function/attribute completely separately, which keeps the algebra documents much smaller than formatting an entire file in one go and minimises the impact of the algorithm’s worse complexity.
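For readers unfamiliar with the Strictly Pretty approach, here is a minimal Python transcription of Lindig’s algorithm (written for illustration, not taken from Steamroller or mix format). A group is printed flat if its flat rendering fits in the remaining width; otherwise its line breaks become newlines. In Lindig’s version the fit check looks only at the group itself. Bernardy’s printer, used by erlfmt, compares candidate layouts instead and is not shown. `call` is a hypothetical helper for a function call.

```python
from dataclasses import dataclass

FLAT, BREAK = "flat", "break"


@dataclass
class Text:
    s: str


@dataclass
class Line:
    sep: str = " "  # printed when the enclosing group is flat, otherwise a newline


@dataclass
class Cat:
    left: object
    right: object


@dataclass
class Nest:
    indent: int
    doc: object


@dataclass
class Group:
    doc: object


def cat(*docs):
    result = docs[0]
    for d in docs[1:]:
        result = Cat(result, d)
    return result


def fits(width, items):
    # Does this group, laid out flat, fit in the remaining width?
    stack = list(items)
    while width >= 0:
        if not stack:
            return True
        indent, mode, doc = stack.pop()
        if isinstance(doc, Text):
            width -= len(doc.s)
        elif isinstance(doc, Line):
            if mode == BREAK:
                return True
            width -= len(doc.sep)
        elif isinstance(doc, Cat):
            stack += [(indent, mode, doc.right), (indent, mode, doc.left)]
        elif isinstance(doc, Nest):
            stack.append((indent + doc.indent, mode, doc.doc))
        elif isinstance(doc, Group):
            stack.append((indent, FLAT, doc.doc))
    return False


def pretty(width, doc):
    out, col = [], 0
    stack = [(0, BREAK, doc)]  # (indent, mode, doc); the last item is processed first
    while stack:
        indent, mode, d = stack.pop()
        if isinstance(d, Text):
            out.append(d.s)
            col += len(d.s)
        elif isinstance(d, Line) and mode == FLAT:
            out.append(d.sep)
            col += len(d.sep)
        elif isinstance(d, Line):
            out.append("\n" + " " * indent)
            col = indent
        elif isinstance(d, Cat):
            stack += [(indent, mode, d.right), (indent, mode, d.left)]
        elif isinstance(d, Nest):
            stack.append((indent + d.indent, mode, d.doc))
        elif isinstance(d, Group):
            flat = fits(width - col, [(indent, FLAT, d.doc)])
            stack.append((indent, FLAT if flat else BREAK, d.doc))
    return "".join(out)


def call(name, args):
    # name(A, B, ...) that breaks after "(" and after every "," when it is too wide
    body = cat(*[cat(Text(a), Text(","), Line()) for a in args[:-1]], Text(args[-1]))
    return Group(cat(Text(name + "("), Nest(4, cat(Line(""), body)), Line(""), Text(")")))
```

With a width of 80, `pretty(80, call("foo", ["Alpha", "Beta"]))` stays on one line as `foo(Alpha, Beta)`; with a width of 10 the same document breaks onto one argument per line.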

Finally, there’s the question of which changes the formatter will make and which style it will use. Erlfmt also follows some choices of the Elixir formatter and will be rather opinionated (it has no options at the moment). In general, though, we’ve decided it will only do formatting and no code improvements (so no changing literals or re-parenthesising expressions); this means it will probably only change whitespace.
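A formatter that only changes whitespace has a property that is easy to check. The following is a rough Python sketch of such a check, written for illustration and approximate by design: quoted strings and atoms are compared whole, so whitespace inside them counts as a change; runs of word characters are compared whole, so joining two identifiers is caught; everything else is compared one character at a time. Erlang $-character literals are not handled.

```python
import re

# Quoted strings, quoted atoms, runs of word characters (plus @), or any other
# single non-whitespace character. Whitespace between tokens is skipped.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'|[\w@]+|\S')


def only_whitespace_changed(before: str, after: str) -> bool:
    return TOKEN.findall(before) == TOKEN.findall(after)
```

Run on a file before and after formatting, this would flag re-parenthesised expressions or rewritten literals, which erlfmt deliberately avoids.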

dtip · January 9, 2020, 8:55am

Steamroller now has VS Code integration thanks to work by szTheory!

lpil · January 13, 2020, 8:25pm

Thanks for this information, very interesting. I’ve not come across this paper before.

dtip · February 12, 2020, 12:21pm

v0.12.0 has just been published.

Now with a default indent of 2 spaces to better match the indent-heavy formatting style (but it’s configurable).

Also makes the output less terrible by fixing improper formatting of negative numbers.

dtip · February 12, 2020, 12:25pm

Great analysis!

we might not be able to properly handle all kinds of macros and weird combinations

Agreed, I had to sink significant time into handling weird macros and other things! I still don’t think we’ve covered them all, but Steamroller can at least handle everything in OTP.

we’ve decided it will only do formatting and no code improvements

This seems like perhaps the main difference between erlfmt and Steamroller. Steamroller does make a small improvement to keep parentheses consistent.

Looking forward to trying out erlfmt once it’s live!

elbrujohalcon · February 25, 2020, 9:39am

I wrote a blog post about this: http://tech.nextroll.com/blog/dev/2020/02/25/erlang-rebar3-format.html

dtip · March 22, 2020, 3:47pm

A special quarantine release for you all: v0.13.2 contains some long-overdue vertical spacing.

Formatted code is no longer one massive block of text. Multi-line function and case clauses are padded with newlines to break the code apart and make it easier to read.
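One possible reading of that spacing rule, sketched in Python (hypothetical; Steamroller’s actual rule may differ): a blank line goes on either side of any multi-line clause, while runs of one-line clauses stay packed together. `join_clauses` is an illustrative name that takes clauses which have already been formatted.

```python
def join_clauses(clauses: list[str]) -> str:
    """Join already-formatted clauses (without their trailing ";") into one body."""
    out = clauses[0]
    for prev, cur in zip(clauses, clauses[1:]):
        # Pad with a blank line when either neighbouring clause spans several lines.
        multi_line = "\n" in prev or "\n" in cur
        out += (";\n\n" if multi_line else ";\n") + cur
    return out
```

Short clauses such as `f(0) -> zero` and `f(1) -> one` stay on consecutive lines, while a clause whose body wraps gets a blank line above and below it.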

Enjoy!

dtip · June 17, 2020, 4:32pm

Hot new release: v0.14.0 comes with a massive performance improvement.

Steamroller no longer trundles along squashing code into uniformity. Instead it’s just as devastating but up to 60x faster.

You can also now use Steamroller alongside rebar3-format thanks to work by @elbrujohalcon