Sourceror - Utilities to work with Elixir source code

Now that the next Elixir version will add Code.quoted_to_algebra/2 and Code.string_to_quoted_with_comments/2, we can take some source code, parse it, change it, and turn it back into formatted text. There are a couple of gotchas if you change the AST in certain ways: since Code.quoted_to_algebra/2 requires the AST and the comments to be given as separate arguments, we need to reconcile the line numbers of AST nodes and comments if we want the comments to be placed correctly.
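As a rough illustration of that round trip, here is a minimal sketch that uses only the two core functions (so it assumes Elixir 1.13 or later). The parser options are the ones suggested in the "Formatting considerations" section of the Code docs; the sample source and the 98-column width are arbitrary choices for the example.

```elixir
source = """
# Adds two numbers
def add(a, b), do:    a + b
"""

# Options that keep enough information for the formatter to reproduce
# the original literals and keyword syntax.
to_quoted_opts = [
  literal_encoder: &{:ok, {:__block__, &2, [&1]}},
  token_metadata: true,
  unescape: false
]

# The AST and the comments come back as two separate values.
{quoted, comments} = Code.string_to_quoted_with_comments!(source, to_quoted_opts)

# The comments are handed back to the formatter next to the AST,
# which places them according to their line numbers.
formatted =
  quoted
  |> Code.quoted_to_algebra(comments: comments)
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()

IO.puts(formatted)
```

Nothing in the AST was changed here, so the comment stays attached to the function. The gotchas start once nodes are moved to different lines.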

So I wrote Sourceror, an (experimental) library that provides utilities to manipulate source code. I’m still working on more docs and examples (and tests), but this is an example of a function that expands the multi-alias syntax (i.e. Foo.{Bar, Baz}) so that each alias gets its own line:

Or a function to add a dependency to mix.exs à la npm install:

Since the new functions are only available in Elixir master, Sourceror depends on Elixir 1.13.0-dev and can only be installed via git dependency.


Random, pretty particular question. I was curious about writing something that could alphabetize my Mix dependencies. Would this be the right tool to build that with?


Yes! What you have to be mindful of with this kind of manipulation, in contrast with macros, is that you need to consider how line numbers move around. You need to both reorder the dependencies and correct their line numbers, otherwise comments may be misplaced. The reason is that Code.quoted_to_algebra/2 takes the AST and the comments as separate arguments and merges them by their line numbers.
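To make the "separate arguments" point concrete, here is a small sketch using only core Elixir (1.13 or later). It shows that the parser returns comments on their own, tied to the code only through a line number; the dependency list is a made-up example.

```elixir
source = """
[
  # Comment for :r
  {:r, "~> 1.0"},
  {:a, "~> 1.0"}
]
"""

{_quoted, comments} = Code.string_to_quoted_with_comments!(source)

# Each comment is a map that knows its line and text, but not which node it belongs to.
comment_lines = Enum.map(comments, &{&1.line, &1.text})

IO.inspect(comment_lines)
# => [{2, "# Comment for :r"}]
```

If the {:r, ...} tuple is moved below {:a, ...} without touching any metadata, the comment still claims line 2, which is why reordered nodes need their lines corrected.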

One way to achieve what you want is by doing this:

"""
defp deps do
  [
    {:a, "~> 1.0"},
    {:z, "~> 1.0"},
    {:g, "~> 1.0"},
    # Comment for r
    {:r, "~> 1.0"},
    {:y, "~> 1.0"},
    # Comment for :u
    {:u, "~> 1.0"},
    {:e, "~> 1.0"},
    {:s, "~> 1.0"},
    {:v, "~> 1.0"},
    {:c, "~> 1.0"},
    {:b, "~> 1.0"},
  ]
end
"""
|> Sourceror.parse_string()
|> Sourceror.postwalk(fn
  {:defp, meta, [{:deps, _, _} = fun, body]}, state ->
    [{{_, _, [:do]}, block_ast}] = body
    {:__block__, block_meta, [deps]} = block_ast

    lines = Enum.map(deps, fn {:__block__, meta, _} -> meta[:line] end)

    deps =
      Enum.sort_by(deps, fn {:__block__, _, [{{_, _, [name]}, _}]} ->
        Atom.to_string(name)
      end)

    deps =
      Enum.zip([lines, deps])
      |> Enum.map(fn {old_line, dep} ->
        {_, tuple_meta, [{left, right}]} = dep
        line_correction = old_line - tuple_meta[:line]

        tuple_meta = Sourceror.correct_lines(tuple_meta, line_correction)
        left = Macro.update_meta(left, &Sourceror.correct_lines(&1, line_correction))
        right = Macro.update_meta(right, &Sourceror.correct_lines(&1, line_correction))

        {:__block__, tuple_meta, [{left, right}]}
      end)

    quoted = {:defp, meta, [fun, [do: {:__block__, block_meta, [deps]}]]}
    state = Map.update!(state, :line_correction, & &1)
    {quoted, state}

  quoted, state ->
    {quoted, state}
end)
|> Sourceror.to_string()
|> IO.puts()

# =>
defp deps do
  [
    {:a, "~> 1.0"},
    {:b, "~> 1.0"},
    {:c, "~> 1.0"},
    {:e, "~> 1.0"},
    {:g, "~> 1.0"},
    # Comment for r
    {:r, "~> 1.0"},
    {:s, "~> 1.0"},
    # Comment for :u
    {:u, "~> 1.0"},
    {:v, "~> 1.0"},
    {:y, "~> 1.0"},
    {:z, "~> 1.0"}
  ]
end

The other thing to note is that this is not your regular AST. Sourceror uses the literal_encoder: &{:ok, {:__block__, &2, [&1]}} option for Code.string_to_quoted_with_comments/2 under the hood, so you need to expect literals to be wrapped in blocks. For example, {:a, "~> 1.0"} will become:

{:__block__, [line: 1], [
  {{:__block__, [line: 1], [:a]},
   {:__block__, [line: 1, delimiter: "\""], ["~> 1.0"]}}
]}
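The wrapping can be observed with core Elixir alone. The sketch below passes the same literal_encoder to Code.string_to_quoted!/2 and then pattern matches through the blocks to get back to the plain values; the variable names are just for illustration.

```elixir
encoder = &{:ok, {:__block__, &2, [&1]}}

# Without the encoder, the two-element tuple is its own AST.
plain = Code.string_to_quoted!(~S|{:a, "~> 1.0"}|)

# With the encoder, the tuple and both of its elements are wrapped in blocks.
wrapped = Code.string_to_quoted!(~S|{:a, "~> 1.0"}|, literal_encoder: encoder)

# Unwrapping means matching one :__block__ layer per literal.
{:__block__, _tuple_meta, [{name_node, requirement_node}]} = wrapped
{:__block__, _name_meta, [name]} = name_node
{:__block__, _requirement_meta, [requirement]} = requirement_node

IO.inspect({name, requirement})
# => {:a, "~> 1.0"}
```

This is the same shape the sorting example above matches on with {:__block__, _, [{{_, _, [name]}, _}]}.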

This is explained a bit in the Formatting considerations section of the docs for the new functions 🙂

Of course, I will try to expand Sourceror as we find more complex use cases that could be simplified 🙂


@doorgan this is exciting work - thank you!

Can you talk a little bit about who is the target audience, and your vision for possible use-cases?

Could this be used by something like elixir-ls to implement refactoring operations? (eg rename variable, rename function, extract function, inline function, rename module)

What types of contributions and testing would be most useful to you?


The target audience is primarily tool authors, like elixir-ls or credo.

Yes, those are the kinds of use cases I had in mind 🙂 The Sourceror.to_string/2 function has an option to set the indentation level of the resulting code for that particular use case. I will probably add functions to find out how many lines an AST node spans, so one could replace a line range instead of the whole file.

This started while I was exploring ways to let credo autofix some of the issues it finds; the multi-alias expansion example came out of that.

Mostly finding out what people find most cumbersome or confusing to do. I think the most important thing right now is to start experimenting. There may be some bugs in Code.quoted_to_algebra/2 too, so some experiments on that front would be nice as well, so we can add more regression tests to core Elixir 🙂


Sourceror is now available on hex.pm and supports Elixir versions down to 1.10 🙂

https://hexdocs.pm/sourceror/Sourceror.html


@doorgan - thanks for the support down to 1.10!

Here’s a question about Sourceror and Elixir types and doctests…

I expect that a Sourceror transformation could modify a function signature or a return type…

Would Sourceror also have the ability to transform typespecs? Or inline documentation? (eg for doctests)


It should be possible, since that information is in the AST. For example:

iex(4)> Sourceror.parse_string(~S"""
...(4)> @spec foo(String.t()) :: :ok
...(4)> def foo(a \\ 5), do: a + 10
...(4)> """)
{:__block__, [trailing_comments: [], leading_comments: []],
 [
   {:@,
    [
      trailing_comments: [],
      leading_comments: [],
      end_of_expression: [newlines: 1, line: 1],
      line: 1
    ],
    [
      {:spec, [trailing_comments: [], leading_comments: [], line: 1],
       [
         {:"::", [trailing_comments: [], leading_comments: [], line: 1],
          [
            {:foo,
             [
               trailing_comments: [],
               leading_comments: [],
               closing: [line: 1],
               line: 1
             ],
             [
               {{:., [trailing_comments: [], leading_comments: [], line: 1],
                 [
                   {:__aliases__,
                    [trailing_comments: [], leading_comments: [], line: 1],
                    [:String]},
                   :t
                 ]},
                [
                  trailing_comments: [],
                  leading_comments: [],
                  closing: [line: 1],
                  line: 1
                ], []}
             ]},
            {:__block__, [trailing_comments: [], leading_comments: [], line: 1],
             [:ok]}
          ]}
       ]}
    ]},
   {:def, [trailing_comments: [], leading_comments: [], line: 2],
    [
      {:foo,
       [
         trailing_comments: [],
         leading_comments: [],
         closing: [line: 2],
         line: 2
       ],
       [
         {:\\, [trailing_comments: [], leading_comments: [], line: 2],
          [
            {:a, [trailing_comments: [], leading_comments: [], line: 2], nil},
            {:__block__,
             [trailing_comments: [], leading_comments: [], token: "5", line: 2],
             [5]}
          ]}
       ]},
      [
        {{:__block__,
          [
            trailing_comments: [],
            leading_comments: [],
            format: :keyword,
            line: 2
          ], [:do]},
         {:+, [trailing_comments: [], leading_comments: [], line: 2],
          [
            {:a, [trailing_comments: [], leading_comments: [], line: 2], nil},
            {:__block__,
             [trailing_comments: [], leading_comments: [], token: "10", line: 2],
             [10]}
          ]}}
      ]
    ]}
 ]}

Typespecs are just module attributes, and so are doc annotations, so you should be able to get the module attributes associated with a function by walking the tree with Macro.postwalk, or with Sourceror.postwalk if you’re also doing some transformation. You need to make some assumptions, though: for instance, treating a module attribute that comes right before a function definition as an annotation for that function. For doctests, since you already have access to the docstrings (they’re module attributes), it would be a matter of parsing the docstring, looking for any doctests, and then running Sourceror functions on them.

I think the real difficulty comes from the fact that you need to change a node that is not a child of the function and that comes before it. My first thought is that when postwalking, if you go up, find a function, and want to update its typespec, there could be a way to tell the postwalker that an update should be performed on a sibling node, maybe by passing a callback and making it reduce over the parent’s children (I do something similar in the multi-alias expansion example). All of this while also applying line corrections.

I believe it is possible because we have all the data we need, but it’s something I need to put some thought into to be able to do it in a reliable and relatively straightforward way, and it would definitely be something worth adding to the library 🙂
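As a minimal sketch of the "walk the tree and collect the attributes" idea, the snippet below uses the regular parser and Macro.postwalk/3 from core Elixir (not Sourceror's wrapped AST) to gather the name and arity of every @spec. The clause only matches specs of the simple form name(args) :: return, which is an assumption made for brevity.

```elixir
source = ~S"""
@spec foo(String.t()) :: :ok
def foo(a \\ 5), do: a + 10
"""

quoted = Code.string_to_quoted!(source)

{_quoted, specs} =
  Macro.postwalk(quoted, [], fn
    # A typespec is just an :@ node whose only argument is a :spec call.
    {:@, _meta, [{:spec, _, [{:"::", _, [{name, _, args}, _return]}]}]} = node, acc ->
      {node, [{name, length(args)} | acc]}

    node, acc ->
      {node, acc}
  end)

IO.inspect(specs)
# => [foo: 1]
```

Pairing each collected spec with the def that follows it would then rely on the ordering assumption mentioned above.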


Sourceror v0.4.0 was released.

It fixes a couple of bugs, makes several improvements to comment line corrections, and adds more functions to work with ranges and positions, like Sourceror.get_range/1 and Sourceror.compare_positions/2.

I also converted the multi alias expansion example into a proper document you can import to Livebook, with step-by-step explanations of the process: sourceror/expand_multi_alias.livemd at main · doorgan/sourceror · GitHub

You can see the full changelog here.


Was thinking in the same direction, but writing a Credo check. A script to auto-sort the deps would be nice (although my editor currently does it also without issues)


Sourceror 0.6.0 is out 🙂

This release introduces some breaking changes, as the way comments are handled by the library has been fundamentally changed. In essence, instead of requiring the user to calculate how line numbers should be shifted, Sourceror tries to “fix” the line numbers in a way that makes sense for the Elixir formatter when you call Sourceror.to_string/2 or Sourceror.extract_comments/2.

To illustrate this, the dependency sorting example can now be reduced to this traversal:

Macro.postwalk(fn
  {:defp, meta, [{:deps, _, _} = fun, body]} ->
    [{{_, _, [:do]}, block_ast}] = body
    {:__block__, block_meta, [deps]} = block_ast

    deps =
      Enum.sort_by(deps, fn {:__block__, _, [{{_, _, [name]}, _}]} ->
        Atom.to_string(name)
      end)

    {:defp, meta, [fun, [do: {:__block__, block_meta, [deps]}]]}

  quoted ->
    quoted
end)

Note that, because the line number correction hack is no longer needed, traversals over Sourceror’s AST are the same as traversals over the regular AST. Just don’t discard the comments metadata 🙂

The multi alias expansion livebook was further simplified thanks to this change.

Also, Sourceror.get_range/1 now returns the actual start and end positions for any node (provided it has line and column metadata), making it suitable for things like “replace the contents between these two positions with this new content”. Column offsets are counted as UTF-8 offsets (like in the regular Elixir AST), so tools that want to support the Language Server Protocol need to convert them to UTF-16 offsets.
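A possible shape for that conversion is sketched below with core Elixir and Erlang's :unicode module. Utf16Column is a hypothetical helper, not part of Sourceror, and it assumes the column is a 1-based count of code points within the line; if a tool receives byte offsets instead, the prefix would have to be taken with binary_part/3.

```elixir
defmodule Utf16Column do
  # Hypothetical helper: converts a 1-based column (assumed to count code points)
  # into the 0-based UTF-16 code unit offset used by LSP positions.
  def to_utf16_offset(line_text, column) do
    prefix =
      line_text
      |> String.codepoints()
      |> Enum.take(column - 1)
      |> Enum.join()

    # Every UTF-16 code unit is two bytes, so the byte size gives the unit count.
    utf16 = :unicode.characters_to_binary(prefix, :utf8, :utf16)
    div(byte_size(utf16), 2)
  end
end

# ASCII only: code points and UTF-16 code units agree.
ascii_offset = Utf16Column.to_utf16_offset("a = 1", 3)

# The emoji is one code point but two UTF-16 code units, so the offset of `<` shifts by one.
emoji_offset = Utf16Column.to_utf16_offset(~S|"😀" <> name|, 5)

IO.inspect({ascii_offset, emoji_offset})
# => {2, 5}
```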

Changelog:

1. Enhancements

  • [Sourceror] - to_string no longer requires line number corrections to produce properly formatted code.
  • [Sourceror] - Added prewalk/2 and prewalk/3.
  • [Sourceror] - parse_string won’t warn on unnecessary quotes.
  • [Sourceror.TraversalState] - Sourceror.PostwalkState was renamed to Sourceror.TraversalState to make it more generic for other kinds of traversals.

2. Removals

  • [Sourceror] - get_line_span was removed in favor of using get_range and calculating the difference from the range start and end lines.
  • [Sourceror.TraversalState] - line_correction field was removed as it is no longer needed.

3. Bug fixes

  • [Sourceror] - get_range now properly returns ranges that map a node to its actual start and end positions in the original source code.

Sourceror 0.7.0 is out 🙂

This release adds a zipper API to improve the ergonomics of navigating and modifying the Elixir AST at will.

I added an introduction livebook to zippers: sourceror/zippers.livemd at main · doorgan/sourceror · GitHub

With this API, removing nodes or adding siblings is a straightforward task, in contrast with Macro.postwalk/Macro.prewalk. The livebook for the multi-alias expansion demo was also updated with a new chapter that uses the zipper API to simplify the code.

It’s worth noting that it’s not a Sourceror-specific implementation: it works for any Elixir AST. If you find yourself in need of a zipper for your macros, Sourceror can help there too 🙂
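To show why sibling operations get easier with a zipper, here is a deliberately tiny sketch of the idea over a flat list of sibling nodes. ListZipper is a hypothetical module written for this example; it is not Sourceror.Zipper, which also moves up and down through the tree.

```elixir
defmodule ListZipper do
  # A focus plus its context: {left siblings (reversed), focused node, right siblings}.
  def zip([first | rest]), do: {[], first, rest}

  # Move the focus one sibling to the right, or return nil at the end.
  def right({left, focus, [next | rest]}), do: {[focus | left], next, rest}
  def right({_left, _focus, []}), do: nil

  # Adding a sibling only touches the context, never the whole tree.
  def insert_right({left, focus, right}, node), do: {left, focus, [node | right]}

  # Rebuild the full list of siblings from the focus and its context.
  def root({left, focus, right}), do: Enum.reverse(left, [focus | right])
end

{:__block__, meta, children} =
  quote do
    alias Foo.Bar
    alias Foo.Qux
  end

new_children =
  children
  |> ListZipper.zip()
  |> ListZipper.insert_right(quote(do: alias(Foo.Baz)))
  |> ListZipper.root()

IO.puts(Macro.to_string({:__block__, meta, new_children}))
```

With postwalk, the same insertion has to be done from the parent node, because a callback that receives a single node cannot return two of them.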

Changelog:

1. Enhancements

  • [Sourceror.Zipper] - Added a Zipper implementation for the Elixir AST based
    on Huet’s paper.

Sourceror 0.8.0 is out 🙂

This one is a bit small, just bug fixes and the addition of Sourceror.patch_string/2 which allows you to modify just some parts of a string, instead of having to print the whole tree. It receives the original string and a list of patches to be applied.

A patch is just a map with a :range pointing to the start and end positions to be replaced, and a :change that can be either a string, in which case the range will be replaced with it, or a function that accepts the original code in that range, and returns the string that replaces it.
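The following sketch illustrates that patch shape without depending on the library. PatchSketch is a hypothetical module, not Sourceror's implementation: it only handles ranges that start and end on the same line, and it assumes 1-based columns with an exclusive end column.

```elixir
defmodule PatchSketch do
  # Applies one patch of the form %{range: %{start: [...], end: [...]}, change: ...}.
  def apply_patch(source, %{range: %{start: start_pos, end: end_pos}, change: change}) do
    line = Keyword.fetch!(start_pos, :line)

    if Keyword.fetch!(end_pos, :line) != line do
      raise ArgumentError, "this sketch only supports single-line ranges"
    end

    from = start_pos[:column] - 1
    len = end_pos[:column] - start_pos[:column]

    source
    |> String.split("\n")
    |> List.update_at(line - 1, fn text ->
      original = String.slice(text, from, len)

      # :change is either the replacement itself or a function of the original text.
      replacement = if is_function(change, 1), do: change.(original), else: change

      String.slice(text, 0, from) <>
        replacement <> String.slice(text, from + len, String.length(text))
    end)
    |> Enum.join("\n")
  end
end

code = "slot header, props: [:item]"
range = %{start: [line: 1, column: 14], end: [line: 1, column: 19]}

renamed = PatchSketch.apply_patch(code, %{range: range, change: "args"})
upcased = PatchSketch.apply_patch(code, %{range: range, change: &String.upcase/1})

IO.puts(renamed)
# => slot header, args: [:item]
IO.puts(upcased)
# => slot header, PROPS: [:item]
```

Everything outside the range is left exactly as written, which is the main reason to patch instead of printing the whole tree again.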

This allows you to perform more fine grained changes to the source code, as you can use Sourceror.get_range/1 in combination with a traversal to generate the patches.

Also, Sourceror.to_string/2 can now receive a format: :splicing option that makes it easier to print elements of a keyword list without having to strip the brackets yourself.

To illustrate these changes, here’s what a transformation for the Surface converter that renames the slot props: ... to args: ... would look like (from a discussion in the issue tracker):

Mix.install([{:sourceror, "~> 0.8"}])

code = """
defmodule Card do
  use Surface.Component

  slot footer
  slot header, props: [:item]
  slot default, required: true, props: [:item]
end
"""

{_, patches} =
  code
  |> Sourceror.parse_string!()
  |> Sourceror.postwalk([], fn
    {:slot, _, args} = quoted, state ->
      opts_node = Enum.at(args, 1, [])
      props_node = Enum.find(opts_node, &match?({{:__block__, _, [:props]}, _}, &1))

      if props_node do
        range = Sourceror.get_range(props_node)
        {{:__block__, meta, [:props]}, body} = props_node
        args_node = {{:__block__, meta, [:args]}, body}
        new_code = Sourceror.to_string([args_node], format: :splicing)
        patch = %{change: new_code, range: range}

        {quoted, %{state | acc: [patch | state.acc]}}
      else
        {quoted, state}
      end

    quoted, state ->
      {quoted, state}
  end)

code
|> Sourceror.patch_string(patches)
|> IO.puts()

The next versions will focus on exploring ways to make it easier to create the patches, and on making the APIs more stable 🙂

Changelog:

1. Enhancements

  • [Sourceror] Added Sourceror.patch_string/2
  • [Sourceror] Added the format: :splicing option to Sourceror.to_string/2

2. Bug fixes

  • [Sourceror] Now Sourceror.to_string/2 won’t produce invalid Elixir code when a keyword list element is at the beginning of a non-keyword list.
  • [Sourceror] Now Sourceror.get_range/1 will take the leading comments into account when calculating the range.

I finally got some time to work on Sourceror, and there have been a bunch of bug fixes, and a new 0.9 version! Thanks to the folks who helped test and iron out issues in both my comments syncer and the Elixir 1.13 release candidate 🙂

There are some slight changes to the API, and more functionality is exposed. Also, the notebooks are now available as guide pages in hexdocs.

v0.9.0

1. Enhancements

  • [Sourceror] to_string/2 now supports options for Code.quoted_to_algebra, like locals_without_parens
  • [Sourceror] get_range/2 no longer considers comments when calculating the range. This can be enabled by passing the include_comments: true option
  • [Sourceror.Patch] Introduced Sourceror.Patch with utilities to generate patches for the most common rewriting operations
  • [Sourceror.Identifier] Sourceror.Identifier is now public

v0.8.x summary of bug fixes

  • [Sourceror] Fixed comment spacing on binary operators
  • [Sourceror] Take comment end of line counts into account to preserve spacing
  • [Sourceror] Fixed an issue that caused comments in lists to be misplaced
  • [Sourceror] Fixed issues that caused comments to be misplaced.
  • [Sourceror] Updated internal normalizer to match latest Elixir 1.13 version.
  • [Sourceror] Fixed an issue that caused newlines to be wrongly removed.
  • [Sourceror] Fixed an issue that caused comments in pipelines to be misplaced.
  • [Sourceror] Fixed issue that prevented keyword lists from preserving their
    original format in tuples.
  • [Sourceror] get_range/1 now properly handles naked AST lists, like the ones
    coming from partial keyword lists, or stabs like a -> b.
  • [Sourceror] get_range/1 now handles partial keyword list syntax instead of
    crashing.
  • [Sourceror.Zipper] down/1 now correctly uses nil as the right siblings if
    the branch node has a single child.
  • [Sourceror] Sourceror.get_range/1 now correctly calculates the range when
    there is a comment in the same line as the node.