AST traverse API: important for Elixir source code tooling

suggestion
#1

Hey, lots of developers have already asked about this in the #questions-help category. I have also thought about it, and I would like to describe a serious problem in the process of creating Elixir tools. In some cases, like writing code helpers, it’s extremely important to have a single, well-documented and easy way to traverse the AST without losing comments, formatting and code integrity. I have lots of ideas for Elixir tools.

One of the simplest examples could be a Phoenix 1.2.x -> Phoenix 1.3.x migration tool. Imagine that you can easily traverse the AST and replace/update it in a way that is much faster than doing it manually and that will not break at least until Elixir 2.0.0. Well-written tools of this kind also prevent stupid mistakes when changing lots of module names. A simple typo, or just forgetting to change one module name, can be really confusing and frustrating.

The problem is that we don’t have an easy, well-prepared API to do that. Of course, other people can say that we can use grep and just write Bash scripts for that use case. Firstly, good luck with that. :smiley: Secondly, do we really need to use such a nightmare (just take a look at the associative array syntax in Bash) instead of our favorite language? Of course we can also write grep-like helper functions in Elixir, but I don’t really feel that’s good practice.

I would like to point to one issue I reported and fixed in a simple tool written in Elixir:

It’s just how String.contains?/2 and regular expressions (i.e. the grep-like way) work in practice. One change in a template could break the whole tool. I really wonder whether this approach is even worse than using private API. Here not only the code could change, but even its format, number of lines (depending on how we try to work around the problem) etc.
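To make the fragility concrete, here is a minimal sketch. The module and function names are hypothetical, invented only for this illustration. It looks for a module definition first with String.contains?/2 and then by parsing the source with Code.string_to_quoted!/1; only the text-based check is thrown off by a harmless layout change.

```elixir
defmodule FragileMatchSketch do
  # Hypothetical helper names; String.contains?/2 and
  # Code.string_to_quoted!/1 are the only library calls used.

  # Text-based check: depends on the exact spelling and spacing of the source.
  def text_match?(source), do: String.contains?(source, "defmodule MyApp.Router")

  # AST-based check: depends only on the structure of the parsed code.
  def ast_match?(source) do
    case Code.string_to_quoted!(source) do
      {:defmodule, _meta, [{:__aliases__, _, [:MyApp, :Router]}, [do: _body]]} -> true
      _other -> false
    end
  end
end

# The same (empty) module written in two layouts.
compact = "defmodule MyApp.Router do\nend\n"
spread = "defmodule(\n  MyApp.Router\n) do\nend\n"

IO.inspect({FragileMatchSketch.text_match?(compact), FragileMatchSketch.text_match?(spread)})
IO.inspect({FragileMatchSketch.ast_match?(compact), FragileMatchSketch.ast_match?(spread)})
```

The parsed form survives the reformatting, which is the property a tool needs. What it does not give is a way back to the original text with comments intact.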

I would also like to remind everyone about something. The first Scenic apps and even games (!) are being written right now! It’s only a matter of time before really useful solutions are created. One of the first bigger ones will definitely appear around Nerves. With that in mind, consider that we still can’t fetch project details (from mix.exs) in a nice way to display them in a basic 0.0.1 project manager (which I would like to create sooner or later). Does anybody remember the plans for an Elixir code editor in Scenic? I would rather not see a Regex-based (if I remember correctly) syntax highlighter like the one we have on our forum.

I believe that adding such an API to Elixir core is the best solution, because I don’t believe that random developers (like me) would be able to create and keep up to date their own AST traversal libraries covering the full Elixir specification (which is not even documented yet, right?) just to write a tool which simply parses a few files.

Please let me know what you think about this idea. I believe that at least some people would be interested in this topic, so I would like to ping them and ask them to share their ideas on how they would see such an API in Elixir:

@chvanikoff, @OvermindDL1 @smorin and @tmbb

2 Likes
#2

This seems to be the primary thing you’re arguing for:

Comments in the AST are likely to be approved as long as someone writes up a proposal for them: https://groups.google.com/forum/#!topic/elixir-lang-core/GM0yM5Su1Zc

I suspect maintaining formatting, aside from what the Elixir formatter already does, would probably be rejected. I could be wrong. However, I personally see that as more of a nice-to-have than a requirement for what you’re talking about.

Could you elaborate on what you would expect in terms of maintaining code integrity?

As far as traversing the AST goes, there are already Macro.prewalk and Macro.postwalk. What are you expecting in addition to these, and why do they need to be part of the language rather than a library?
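As a point of reference, here is a minimal sketch of what these functions already allow. The module names are hypothetical and the rename rule is only an illustration, not the actual Phoenix migration rule. It rewrites an alias with Macro.prewalk/2 and prints the result with Macro.to_string/1.

```elixir
defmodule RenameSketch do
  # Hypothetical module for illustration. Code.string_to_quoted!/1,
  # Macro.prewalk/2 and Macro.to_string/1 are existing Elixir functions.

  # Rewrites every alias that starts with MyApp.Web so it starts with MyAppWeb.
  def rename(source) do
    source
    |> Code.string_to_quoted!()
    |> Macro.prewalk(fn
      {:__aliases__, meta, [:MyApp, :Web | rest]} -> {:__aliases__, meta, [:MyAppWeb | rest]}
      node -> node
    end)
    |> Macro.to_string()
  end
end

source = """
# Handles user pages
defmodule MyApp.Web.UserController do
  def index, do: :ok
end
"""

IO.puts(RenameSketch.rename(source))
```

The rename itself works, but the printed code no longer contains the comment, because the parser drops comments before the AST exists. The original layout is also replaced by whatever Macro.to_string/1 produces.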

You’ve spoken a lot about the motivation for such features, which is great. Could you elaborate on what things need to be implemented in Elixir and how such a thing might be implemented?

2 Likes
#3

Yeah, this could be an alternative solution. However, I wrote my proposal as a simple API enhancement for current Elixir, so it could be included even in 1.7.5.

Yeah, but that’s only part of it. It’s like having a calculator with all the advanced functions but without addition. Sure, I could simply forget about comments, but I believe that more people would use tool X if it did only what it’s designed for and did not remove comments as a side effect. Imagine that every mix task could change the Elixir version requirement. That would be crazy behavior with no real use.

Don’t even start working on it, just think about how you would solve such a problem. As an example, think about how you would create a tool to automatically migrate from Phoenix 1.2 to Phoenix 1.3. You will see how big a problem code integrity is when you are creating huge regular expressions.

Something like the mix format implementation, but without the algebra, plus maybe some helper functions. In short, imagine that you call Code.format_string!/2 and pass a callback which receives the whole AST and must return a valid AST. With some helper functions like module_lookup/1 and function_lookup it would be extremely powerful and easy to use for newbies. Note that most of the Elixir formatter code is private API, and it’s not easy for newbies to rewrite it for their needs and to maintain for years tools which should not take more effort than a typical helper tool written in a weekend. Also yeah, Macro.prewalk and Macro.postwalk would definitely be used in lots of tools working with the AST. We could have even 100 functions like that, but if we can’t preserve comments and save the file, then we can’t do more than write compile-time macros.

I’m not an Elixir core team member, so I don’t expect that any code I propose would be accepted as is. I can only say which things would be helpful. I can even give a simple (not real) example:

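defmodule MyApp.ProjectManager do
  # NOTE: Code.traverse_ast/3, Macro.lookup_module/1 and
  # Module.lookup_function/2 do not exist in Elixir; they only
  # illustrate the proposed API.
  def get_project_info(path_to_project) do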
    path_to_project
    |> Path.join("mix.exs")
    |> File.read!()
    |> Code.traverse_ast(&do_get_project_info/1, preserve_comments: false)
  end

  defp do_get_project_info(ast) do
    ast
    |> Macro.lookup_module() # ok, here we could use Macro.postwalk
    |> get_only_module()
    |> Module.lookup_function({:project, 0}) # same here
    |> get_function_return()
  end

  defp get_only_module([module_ast]), do: module_ast

  defp get_function_return({_, _, data} = _function_ast), do: List.last(data)
end

Of course this code is written by hand without any error checks, handling of edge cases (like private function calls) etc. I just wanted to show how I would like to work with it, not where and how it should be implemented. I believe that @OvermindDL1 could help more with the implementation. He even worked on his own version of defguard, so I believe at this point he has much more experience than me.
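For comparison, roughly the same lookup can be sketched with functions that exist today. This is only a sketch under assumptions: ProjectInfoSketch and project_ast/1 are hypothetical names, and it assumes project/0 is a public zero-arity function defined with a do block. It uses only Code.string_to_quoted!/1 and Macro.prewalk/3.

```elixir
defmodule ProjectInfoSketch do
  # Hypothetical helper module. Returns the quoted body of the first
  # `def project` found in the given source, or nil if there is none.
  def project_ast(source) do
    source
    |> Code.string_to_quoted!()
    |> find_project_body()
  end

  defp find_project_body(ast) do
    {_ast, body} =
      Macro.prewalk(ast, nil, fn
        # `def project do` has nil args, `def project() do` has [].
        {:def, _meta, [{:project, _, args}, [do: body]]} = node, nil
        when args in [nil, []] ->
          {node, body}

        # Every other node, and everything after the first hit, is passed through.
        node, acc ->
          {node, acc}
      end)

    body
  end
end

source = """
defmodule Demo.MixProject do
  use Mix.Project

  # this comment is dropped by the parser
  def project do
    [app: :demo, version: "0.0.1"]
  end
end
"""

IO.inspect(ProjectInfoSketch.project_ast(source))
```

When the body is a literal keyword list, the quoted form is the list itself. Calls such as deps() stay as unevaluated AST nodes. This covers reading, but the comments are already gone at this point, so the result cannot be written back to the file without losing them.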

The result of such an example Code.traverse_ast/3 call with comments preserved could be converted back into a String, formatted and written to disk. I believe that soon somebody would create a helper library which would simplify it, for example by providing many more lookup_*-like functions.

I just want to point something out. It’s not only a matter of adding a single dependency as a first and last tool. Besides the migration tool example, I have many more ideas, better and worse, for using such an API, and I believe that other people would share their own tool proposals.

Hope I explained everything well enough.

2 Likes
#4

Yeah I think this could be more easily created once a proper spec is developed.

As for storing non-code things like comments, a lot of compilers nowadays do one of two things:

  • Comment Node: Same as any other node; for Elixir this would be something like {:comment, [], "comment text"} or whatever. This means you have to ignore such nodes, though (but since the ‘argument’ is a binary and not a list, at least it’s not ambiguous, I think…). This is popular among, say, C++ compilers (although there you have iterators that can automatically skip nodes that you don’t care about, like comment nodes).
  • Metadata: Where you attach the comment to an actual data node as either prefix or postfix comments (this is like the OCaml model, where comments become ‘attributes’ on data nodes). So something like this:
# A
2 + # B
2 # C
# D
# E

Would turn into something like:

{:+, [context: Elixir, import: Kernel, comments: {" A\n B", " D\n E"}], [
  {:__block__, [comments: {"", "B"}], [2]},
  {:__block__, [comments: {"", "C"}], [2]}
]}

Or something like that (not wedded to the form at all, just a quick example).
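The metadata variant can be tried out on today’s AST with a small sketch. CommentMetaSketch is a hypothetical helper, and the :comments key with its {prefix, postfix} tuple is taken from the example above; it is not an Elixir convention. The sketch relies on Macro.update_meta/2 and on the compiler ignoring metadata keys it does not know.

```elixir
defmodule CommentMetaSketch do
  # Hypothetical helper; the :comments key and its {prefix, postfix}
  # shape are assumptions taken from the example in this post.

  # Stores the comments in the metadata of a 3-tuple node.
  def attach(ast, prefix, postfix) do
    Macro.update_meta(ast, &Keyword.put(&1, :comments, {prefix, postfix}))
  end

  # Reads the comments back; bare literals carry no metadata at all.
  def comments({_form, meta, _args}) when is_list(meta), do: Keyword.get(meta, :comments)
  def comments(_literal), do: nil
end

ast = Code.string_to_quoted!("2 + 2")
annotated = CommentMetaSketch.attach(ast, " A", " D\n E")

IO.inspect(CommentMetaSketch.comments(annotated))
IO.inspect(Code.eval_quoted(annotated))
```

The extra key does not get in the way of evaluating the expression. The second comments/1 clause shows why the example wraps each 2 in a __block__ node: a bare literal has no metadata to attach a comment to.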

#5

Yeah, but keep in mind how long we would need to wait for such a change. I don’t believe that it would be like a typical issue / pull request discussion. Even leaving aside the worst cases (like the extremely long process of creating the CSS3 spec), there are lots of things to do. What if 2 BEAM languages had their own solutions? If we already had a solution for that, then later changes (after creating the comments spec) should not be so big. Just take a look at the Code.format_string!/2 implementation. The only big change would be to remove the extra code for comments, which would then be handled the same way as the rest already is. In short, think how long developers would need to wait for such a specification before they are able to create powerful small tools. I’m not sure that waiting months for an API which would allow writing much simpler tools in about a week (or even a weekend) is the best idea. Of course I don’t know how long it could take, but I don’t expect it in the “next” (not literally) Elixir release.

1 Like
#6

I think the real issue is someone putting the spec together. If that’s done with the right approach, I think the community would accept it after proper feedback and iterations.

1 Like
#7

I think that at this point the Elixir AST should be considered public API. I don’t think it’s wise to add comment nodes right now.

However…

If I were to rewrite the Elixir AST format, I’d do the following:

  1. Everything would become 3-tuples, even literals like numbers, strings or atoms. This would make some macros harder to write, but ultimately it would preserve location information, which is very useful.

  2. Location data would include a line number, a byte position for the start of the expression and a byte position for the end of the expression. That would make manipulating source code much easier.
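Part of the first point can be approximated at parse time. The sketch below assumes Elixir 1.10 or later, where Code.string_to_quoted!/2 accepts the :literal_encoder option alongside :columns; UniformAstSketch is a hypothetical module name. It wraps every literal in a __block__ 3-tuple so that the literal keeps its own metadata.

```elixir
defmodule UniformAstSketch do
  # Hypothetical module name. Assumes Elixir 1.10 or later, where
  # Code.string_to_quoted!/2 accepts the :literal_encoder option.
  def parse(source) do
    Code.string_to_quoted!(source,
      columns: true,
      # Wrap each literal in a 3-tuple that carries the literal's own metadata.
      literal_encoder: fn literal, meta -> {:ok, {:__block__, meta, [literal]}} end
    )
  end
end

IO.inspect(UniformAstSketch.parse("1 + 2"))
```

This yields line and column information for each literal rather than the byte offsets and end positions described above, so it covers only part of the idea.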

3 Likes