Find out unbound variables in the AST, or How I Made Drab Better But Slower, or I Need A Community Opinion on the Approach

grych · April 27, 2018, 10:08am

Dear Alchemists,
Assume I have an AST fragment with some unbound variables:

iex(1)> expr = quote do: IO.puts(x)
{{:., [], [{:__aliases__, [alias: false], [:IO]}, :puts]}, [],
 [{:x, [], Elixir}]}

what I want to archive is to find all unbound variables there, so have something like:

iex> unbound_vars(expr)
[:x]

or at least find out if there are any:

iex> has_unbound_vars(expr)
true

The easiest thing would be to eval the expression and catch the CompileError:

iex(2)> Code.eval_quoted expr
warning: variable "x" does not exist and is being expanded to "x()", please use parentheses to remove the ambiguity or change the variable name
  nofile:1

** (CompileError) nofile:1: undefined function x/0
    (elixir) lib/code.ex:493: Code.eval_quoted/3

But it does not look very idiomatic, and also generates a warning, which is a big deal for me, as this will be used in a library (drab), so the warning may confuse the library users as it will appear during the normal usage.

I am thinking about Macro.prewalk:

collect all variables from {atom, _, Elixir}
collect pattern matches {:=, _, [left, _]} and take all variables ({atom, _, Elixir}) from left
compare those lists

But I am not sure it is enough, I have a feeling I am missing something. Is the pattern match the only way to bind a variable? Are the variables always are like {atom, _, Elixir}?

Background, or I Need Your Opinion On The Approach

This is not an academic issue In Drab, I collect the expressions from the template, which contain Phoenix assigns (<%= function(@assign1)%>) during the compile time, and re-evaluate them in case user wants to change it on the page, live, with poke socket, assign1: new_value.

It works well in the most cases, but crashes when using local variables defined outside of the block:

<%= for u <- @users do %>
  <%= if u != @user do %>
    <%= u %> <br>
  <% end %>
<% end %>

All works OK when you re-evaluate the for expression with poke socket, users: [...]. But when trying to poke socket, user: ... system tries to re-evaluate only the if expression and it finished with CompileError.
This may be worked around with poking both assigns in the same time with poke users: [...], user: ... (in this case Drab knows that there is no point to refresh if, as the whole block will be re-evaluated), but it is not very convenient. And confusing the users. And does not solve all the cases.

I’ve developed the alternative, much easier approach, to archive the same: instead of evaluating this partial expressions, on each poke Drab renders the whole Phoenix template (using update assigns), parses the html and pushes the fragment to the browser. This is much better from compatibility point of view, but way slower. For example, on Drab demo page, which is already using this approach, all pokes are now about 50ms slower than it was before. You can feel it in the counter example.

So, in my opinion, the best would be to combine those two approaches. When the expression has some unbound variables (we can check it in the compile time for better runtime performance), Drab would render the whole template. Otherwise, use the faster way.

Or maybe I’m wrong, and 50ms overhead of rendering template on each poke is not a big deal? In this case I could remove probably about 50% of Drab.Live.EExEngine code, making it way more stable and reliable.

The second approach simply removes all Drab.Live limitations. This is why I really like it. But the first is faster. HELP!

Oh No! Another Approach!

BTW while writing this long (sorry!) post, I realized that I could use the custom EEx marker to let the developer choose the approach! So there would be three markers:

<%= expression %> would work with safe, slow but compatible way - re-render the whole template
<%| expression %> the fast way
<%/ expression %> do not drab this expression

Thus, when you know what you’re doing, you may choose the faster way, but by default it would just work without any limitations. Drab is made also for beginners, and I want it to me the most intuitive as it is possible. But if you know how it works, you may try to choose better performance and write your code differently.

OvermindDL1 · April 27, 2018, 3:02pm

Well a binding is always of the form {atom(), list(), atom()} so perhaps just scanning for that would be enough. As for if they are ‘unbound’, well that would require holding state of all kinds of matchers and bindings as it recurses down, which is much easier if you eval all intermediate macro’s first so you get down to a base set of matchers (in a function head, in a case head, or on the left of a = match call I ‘think’ would be everything inside a template that could appear after all macro expansions, er, well maybe for and with handling too because those are implemented as irritating special forms that don’t lower to anything at macro expansion even though they should… so maybe after all expansion then also handle all <- too?).

It is entirely possible to do what you want on the fast approach though, but there will be a lot of special cases and conditions due to a few inconsistent aspects of elixir’s ast representations, but those can always be fixed as someone encounters them…

The ‘slow’ method is definitely a nice fallback as well, especially if different eex markers choose different methods. Don’t forget though that if using little components of drab’s then that significantly reduces the entire scope that would be to be re-rendered using the slow method, which might just make it fast enough anyway (except in larger sections, but then drab components inside drab components would speed that up then as well)…

I really like the choosing different methods based on the eex marker that you proposed, the default = would just always work, if slower, but there are ways to make parts faster as the user deems useful and tests themselves.

grych · April 27, 2018, 3:28pm

This is true. If the app is split to partials, re-rendering them should not be much longer than rendering parts of the partials! After this change tests are finishing in about 133 seconds, previously was about 128, so the difference is not so huge.
I was judging after the demo page behaviour, where you can feel the performance decrease. But this page is bad, big and ugly. Well, maybe it is good it’s so big, so I can see the worse case scenarios

OvermindDL1 · April 27, 2018, 3:48pm

You should rewrite the demo page to make use of lots of little partials/components instead! ^.^
Or at least as an alternative implementation of it (two identical demo pages but built internally differently).

grych · April 27, 2018, 6:05pm

Yeah, I should, not only because of this reason I wish I had time for this.

grych · April 27, 2018, 6:15pm

As usual, writing to this forum helped me clear my mind, and I’ve decided to go with the slow, but limitless approach.

And for now, there will not be a choice (with marker) between those two ways of evaluating changes. Discussions on the last ElixirconfEU convinced me to push to 1.0 instead of implementing tons of new features, so I’ve decided to remove this overcomplicated chunk of code, because of the time cost of maintaining it.

I might go back to this in the future, but for now the priority is to remove all Drab.Live limitations, even with the cost of performance.

It is hard. Deleting stuff you’ve invented during the sleepless nights,

OvermindDL1 · April 27, 2018, 6:24pm

It is but it is also the ever-forward march of coding.

Besides, you can encourage more component usage now for efficiency reasons.

grych · May 1, 2018, 7:18pm

I haven’t done that yet, but at least added one more example: the infamous counter moved into the partial. You may see the performance difference, and also this is a small tutorial how to use partials.

https://tg.pl/drab#partials

christhekeele · May 3, 2018, 11:05pm

Variables in AST always take the form {name, meta, context} when is_atom(name) and is_list(meta) and is_atom(context); furthermore they are unique in that shape so the guard above will only ever catch actual variables.

If you take this approach you will find yourself re-implementing a pattern-matching lexer trying to determine which variables are bound or unbound; remember matches can be complicated and do deep destructuring.

What you really want to do is pass your code through the Elixir compiler without evaling anything. You can totally do this since the Elixir compiler is written in erlang so we can call into its AST expansion functions. This approach is far superior because it will also take into account things like in-scope variables, aliases, and imports in the provided Macro.Env (probably the __CALLER__ in your macro); the downside is while you will be consulting the single source of truth about Elixir’s variable binding, you will be calling into a somewhat private API not meant for use outside of Elixir core which is liable to change version to version.

If you want to get started down that path, I’d check out the Kernel.Utils.defguard implementation which does something similar: extracts variable references from the AST and inserts them into the ENV, so it can expand the code and get an updated env without hitting a CompileError. I imagine if you manipulate the env just right pre-expansion you can introspect on the returned post-expansion env to deduce which variables are still unbound.

qhwa · January 5, 2022, 9:36am

I had the same question and wrote a function for it.

github.com

qhwa/formular/blob/master/lib/formular.ex#L433-L453

    
      
            @doc """
            Returns used variables in the code. This can be helpful if
            you intend to build some UI based on the variables, or to
            validate if the code is using variables within the allowed
            list.
          
          
  ## Example
          
          
  ```elixir
            iex> code = "f.(a + b)"
            ...> Formular.used_vars(code)
            [:a, :b, :f]
            ```
            """
            @spec used_vars(code()) :: [atom()]
            def used_vars(code) when is_binary(code),
              do: code |> Code.string_to_quoted!() |> used_vars()
          
          
  def used_vars(code),
              do: Formular.Compiler.extract_vars(code)
          end

:elixir_* modules are cool and inspiring but didn’t contribute directly to my solution. I end up with traversing the AST tree like Kernel.defguard does.

josevalim · January 5, 2022, 10:30am

I want to add that this is actually hard to do correctly in practice. You would effectively need to implement a large chunk of the compiler. For example, take this code:

foo(a + b)

How many variables are used?

Well, if “foo” is a macro, then foo may define several variables in it. You would have to expand all macros recursively to give the correct answer.

For example, what about this code?

destructure [foo, bar], [1]

This is valid Elixir code today and you know that foo and bar are variables only by expanding the macro and then traversing all of Elixir special forms and reimplementing how they handle variables. The reason it works for defguard is because guards are a very small subset of the language. So reimplementing it is trivial.

qhwa · January 6, 2022, 8:01am

Hi, José thanks for replying.

About the macros, can a Macro.expand before traversing solve the problem?
Yes, I agree it is difficult to be and stay correct. In your opinion, what could the ideal approach be like?

Thank you in advance!

josevalim · January 6, 2022, 10:46am

Macro.expand is not going to cover macros that are added in the snippet being expanded, for example. So it can help but there would still be edge cases!

qhwa · January 7, 2022, 12:41am

Right, macros defined in snippets are not supported. Fortunately, I can ignore that in my library because all macros are supposed only to be introduced via a helper module outside of the snippet.

Also, I thought about another approach of taking advantage of the Elixir compiler to check unbound variables. It works by injecting binding variables until the compiler stops raising such errors. I created a poc for it. The code looks like this:

def extract_vars(ast, env \\ %Macro.Env{}, vars \\ MapSet.new())

def extract_vars(ast, env, vars) do
  try do
    Module.create(
      @temp_module,
      def_run(ast, vars),
      env
    )

    MapSet.to_list(vars)
  rescue
    err in CompileError ->
      with %CompileError{description: err_msg} <- err,
           {:ok, var} <- unbound_var_from_err_msg(err_msg),
           false <- MapSet.member?(vars, var) do
        extract_vars(ast, env, MapSet.put(vars, var))
      else
        _ -> reraise(err, __STACKTRACE__)
      end
  end
end

# It defines a function `run(binding)`, looking like this:
#
# def run(binding) do
#   a = Keyword.fetch!(binding, :a) # <- injected variable definition
#   b = Keyword.fetch!(binding, :b) # <- injected variable definition
#   ...
#   # here goes the snippet
#   # ...
# end
defp def_run(ast, args) do
  quote do
    def run(binding) do
      unquote(def_args(args))
      unquote(ast)
    end
  end
end

defp def_args(args) do
  for arg <- args do
    quote do
      unquote(Macro.var(arg, __MODULE__)) = Keyword.fetch!(binding, unquote(arg))
    end
  end
end

defp unbound_var_from_err_msg(err_msg) do
  reg = ~r/undefined (function|variable) \^?(?<var>\w+)/

  case Regex.named_captures(reg, err_msg) do
    %{"var" => name} ->
      {:ok, String.to_atom(name)}

    _ ->
      :error
  end
end

Basically, it works with some small issues such as concurrent compiling, which is not difficult to fix IMO.
However, I don’t know if this is a good idea because it feels hacky .