Imagine a conductor forced to relearn musical notation every time a new instrument joins the orchestra. Violin—one system of writing. Cello—another. Oboe—a third, with reversed polarity, no less. The trumpet flatly refuses to acknowledge the existence of the staff and insists on a tablature of its own invention. Absurd, of course. In the real world every instrument reads the same score. Notes, rhythm, dynamics are universal. Only the technique of execution differs.
MetaAST is that score.
A bit pretentious writing style but I enjoyed the article.
Extremely ambitious project that I’d join and would try to help. I do agree that each language’s AST is kind of a distraction and if we can standardize a converter from that to the meta-AST then we’d score huge wins in code analysis. Which we desperately need recently.
One problem is that many languages don’t have a stable AST – Elixir is one of them. However, combining the language and its version allows us to pin down exact AST details. Requires more work – but it’s only done once.
Really though: amazing idea. Would love to see it bear some fruit.
Strange post.
First of all, there are no concepts which exist in all languages. Even variables don’t exist in all languages (like stack-oriented languages or assembly languages).
Second, is that purity is a higher-level construct and what pure and what is not — defined strictly by the developer. In a language like C, setting data by some local pointer can be a side effect, given that it can be used as a way to share some information with another thread. And even if you define what is pure and what is not (what will most likely be different from the developer’s understand), you can’t just write a generic analyzer, because not all languages have functions, and those who have them, usually can’t provide a guarantee that all code reachable by the function will be a part of the call graph.
Even in Elixir, this code is pure
def add_five(x) do
x + 5
end
But so is this
def add_five(x) do
ref = make_ref()
caller = self()
spawn_link(fn -> send(caller, {ref, x + 5}) end)
receive do
{^ref, result} -> result
end
end
And the fn inside is invisible in the call graph. You may argue that the latter version changes the state of the system by introducing a new process. But I will say that the first version changes the current_stacktrace field of the process info. So, again, developer defines what “pure” means.
I mean, you can represent different languages in the similarly structured AST, but this would just offload semantic differences of the languages on the programs which would analyze such AST. There would be no universal algorithms, just universal representation with lots of language-specific algorithms around it
Exaggeration is my middle name. That’s the way I hook not only the reader to continue reading but also myself to not give up writing.
Well, many languages don’t have a notion of AST in the language itself. I use 3-rd party S-expr generators for Ruby, for example. And yet that’s not the biggest problem, trust me ![]()
It’s coming. I wrote it when I figured out it finally got somewhat shaped. Stay tuned.
In the Yakut language, there are 14 words for snow of varying degrees of whiteness. In Japanese, 裸木 (Hadakagi)—is a seasonal word that means a tree that has shed its leaves for the winter.
And yet we can talk to Yakuts and Japanese.
Sure I can. It won’t be bullet-proof, it would make mistakes, and I am fine with that. Everything “unclear” goes to “non-pure” and I get back the list of definitely pure functions. That’s ok for my needs.
May I argue that I would consider the latter function being a non-pure one, and that’s totally fine? False-positives are harmless, unless, of course, you are after an academic thesis that nobody would even read.
May I allow `:language_specific` nodes in my AST and support plugins dealing with the transformation?
Not so lots in the first place. Disjunction of the languages would require some additional effort, but conjunction is already a very valuable asset.
This thing cannot be generalized flawlessly to the universe is the judgement that buried a ton of great solutions six feet under.
I did the same approach in my elixir compiler called Tria. It turned out that there were very very few pure function (and by “pure” I mean those that can be evaluated at compile time). I ended up implementing approach of postponing effect when evaluating code at compile time. But that resulted in code explosion, where a single call would evaluate to 10 or 20 effects.
But I don’t know your final goal, your blog post says nothing about it, so I don’t know if you want to have an AST for all languages or only for a subset, if you want analyzers to be completely universal or somewhat universal, etc.
I would also suggest to take a look at MLIR project. It has a wide adoption, lots of translators and a bunch of algorithms about it, including reachable code analysis which is essentially a “purity” analyzer
Thanks!
That’s a teasing post without spoilers, mostly designed for responses like yours, thanks for that. The answer everywhere is “somewhat.” I am not a perfectionist when it comes to the choice between ideal and working.
Pandoc for programming languages ? I like the idea, the “common core” as other posters said needs to be arbitrarily defined because concepts vary a lot across languages. But that kind of constraint is what makes those endaevours both desperate and interesting - best kind of project if you ask me. Good luck !






















