Dynamic function generation at runtime with Code.eval_... very slow

hansonkd · October 10, 2018, 6:24pm

Hi.

I am making a library for user submitted templates (so users can customize email content). I have a parser which parses the template to an AST, and now I am implementing evaluating the template. My first pass was to interpret the template, going over the AST every time it renders, then my second pass was to build an anonymous function from the AST (so it is recursive anonymous functions for the different branches of the AST), now on my 3rd pass, I am building up an Elixir AST which in turn evaluates to a function.

My code uses quoted expressions, but it is easier to show the problem using strings:

with this string:

{compiledB, _} =
    Code.eval_string("""
        fn vars ->
          %{"my_var" => userVar_0} = vars
          [["(", to_string(userVar_0), ":", to_string(userVar_0), ")"]]
        end
        """)
Benchmark.measure(fn ->
        %{"my_var" => 20}
        |> compiledB.()
        |> IO.iodata_to_binary
end)

Where Benchmark.measure calls the function 1000 times takes 0.025493 seconds, which is slower then interpreting it.

If I change it to do this:

Code.compile_string("""
    defmodule A do
        def render(vars) do
          %{"my_var" => userVar_0} = vars
          [["(", to_string(userVar_0), ":", to_string(userVar_0), ")"]]
        end
    end
    """)
Benchmark.measure(fn ->
        %{"my_var" => 20}
        |> A.render()
        |> IO.iodata_to_binary
end)

Running this 1000 times takes 4.37e-4 seconds

I like eval_string, since it returns an anonymous function right away and the function is isolated from the rest of the execution environment. Using the compile option, I can’t compile it into an anonymous function, only a module, which then gets put into the Elixir environment so I would have to make sure Module names are unique (but not too unique so I don’t run out of atoms). I would also need to purge manually any templates that aren’t used anymore, whereas with eval_string, I assume the garbage collector would handle cleaning them up automatically.

Why is eval_string several orders of magnitude slower? Is using compile_string and then keeping track of module names and purging them when no longer needed the way to go?

Thanks!

dom · October 11, 2018, 12:10am

Are you running these benchmarks directly in IEx?

https://hexdocs.pm/iex/IEx.html#module-expressions-in-iex

mgwidmann · October 11, 2018, 4:03am

Not sure I understand why plain macros won’t do what you need. Macros don’t have any runtime performance penalty since its just compile time expansion into the code you would have written if you did it without a macro. Why do you need to generate any code at runtime, no doubt its slow. Are you receiving these templates at runtime? If so you shouldn’t be passing them into eval as this is super dangerous (string or quoted expression) and if not you should be using macros.

Please explain your use case a bit more…

hansonkd · October 11, 2018, 7:18am

The use case is that they are runtime strings. E.g. the user has defined an email template for their account. It works very similar to the way Jinja templates are built for Python. A domain specific language is passed in, converted, generates native code in the host language, This creates templates that run just as fast as hand written code. It isn’t executing arbitrary input.

However, my question isn’t to debate about the purpose of my program, but I I would like to understand why there is such a stark difference in execution speed.

I somewhat solved my problem by wrapping another anonymous function inside.

Some more interesting examples:

Code.eval_string("""
fn ->
fn vars ->
%{“my_var” => userVar_0} = vars
[["(", to_string(userVar_0), “:”, to_string(userVar_0), “)”]]
end
“”")

josevalim · October 11, 2018, 7:49am

The purpose of the program definitely matters, because there may be simpler solutions available.

When you are evaluating code at runtime, there are at a minimum 4 or 5 compiler passes involved, to build something that is then evaluated (i.e. invoked by hand and not compiled). In your case, it is not clear why you need to use eval in the first place. Couldn’t you return the function?

compiledB =  fn vars ->
  %{"my_var" => userVar_0} = vars
  [["(", to_string(userVar_0), ":", to_string(userVar_0), ")"]]
end

hansonkd · October 11, 2018, 1:00pm

Lets jut say the purpose of my program is to build the fastest runtime template engine and explore the inner workings of Elixir. Essentially I am trying to answer the question, if you are sending out 1k emails, does the benefit of compiling outweigh the overhead?

My work was inspired by this project: http://jinja.pocoo.org which JIT compiles templates into python bytecode at runtime. I understand Python is a completely separate language that is interpreted and not compiled, but I thought it was at least worth exploring whether the same concept could be in Elixir.

You can 100% solve my template problem without compiling. I have written versions that interpret my AST, and another version that recursively walks the AST and builds up anonymous functions. However, once you start to have an DSL that is recursive, the code starts to slow down because you are forced to start nesting the functions. In my personal benchmarks, building up these anonymous functions is about 3 times slower then handwritten code. And becomes slower the more nested it is.

Lets say a user template translates into this AST: {:list, [{:var, "my_var"}, {:var, "other_var"}]}, building a template function without compiling looks something like this:

defmodule Builder do
    def build({:var, varname}) do
        fn vars ->
            Map.get(vars, varname)
        end
    end
    def build({:list, elements}) do
        compiled = Enum.map(elements, &build/1)
        fn vars ->
            Enum.map(compiled, fn f -> f.() end)
        end
    end
end

Now imagine an AST with nested loops, logic, etc. Simply accessing the map for the vars slows you down. Especially if you have vars be a stack that you push new variables onto (like with for loops). Then add in overhead from all the extra function calls, it is several times slower then compiled code.

A compiled version would look like this:

defmodule Builder do
    def compile(body) do
        {ctx, compiledBody} = build(Ctx.new, body)
        # This builds a pattern match on the inputed vars map (while sanitizing the inputs), to extract the values in one place at the top of the function.
        assignments = Ctx.build_assignments(ctx)
        quotedFunction = quote do
            fn unquote({:vars, [], Elixir}) ->
                unquote(assignments)
                unquote(compiledBody)
            end
        end
        compileQuotedFunction(quotedFunction)
    end

    def build(ctx, {:var, varname}) do
        # Limit the number of variable names (so we dont exhaust atoms)
        {newCtx, varName} = Ctx.next_var_atom(ctx, varname)
        {newCtx, {varname, [], Elixir}}
    end
    def build(ctx, {:list, elements}) do
        {newCtx, compiledEls} = Enum.reduce(elements, {ctx, []}, fn el, {accCtx, accCompiled} ->
            {newCtx, compiled} = build(accCtx, el)
            {newCtx, [compiled | accCompiled]}
        end)
        {newCtx, Enum.reverse(compiledEls)}
    end
end

Which will translate {:list, [{:var, "my_var"}, {:var, "other_var"}]} into:

fn vars ->
  %{"my_var" => userVar_0, "other_var" => userVar_1} = vars
  [userVar_0, userVar_1]
end

Compiling eliminates recursive function calls and map access, while admittedly introducing new complications. My goal here was to identify those tradeoffs of a compiled based approach, vs the “semi-compiled” (building up anonymous functions), vs interpreted.

Another pattern that I found to possibly answer my original question is to have a pool of GenServer compilers that compile with unique modnames:

def compileQuoted(funcQuoted) do
    # Todo: Each compiler should have a unique modname to eliminate conflicts
    modname = Compilers.Compiler1
    quotedFunction = quote do
        defmodule unquote(modname) do
            def render() do
                unquote(funcQuoted)
            end
        end
    end
    [{mod, _}] = Code.compile_quoted(quotedFunction)
   mod.render
end

Because render, returns an anonymous function, this solution gives me a fast function, which persists even after the module is unloaded.

josevalim · October 11, 2018, 1:28pm

Right, the compiled approach will always be faster. If you have a general template engine, then the compiled approach will be faster and probably simpler.

However, if your template engine is limited (think mustache/handlebars), then I would precompile only the template into a Template AST and not necessarily compile it down to Elixir. Evaluating this template ast should be reasonably fast as long as you rely on IO lists (i.e. never concatenate strings, only keep them in a list).

I am not sure if this helps but these would be my two cents. I would avoid eval generally though.

hansonkd · October 11, 2018, 2:05pm

Yes, you are probably right that interpreted is probably enough since IO to send the email will most likely be the limiting factor, performance wise.

Just incase anybody reads this in the future, my results on a small template with nested loops.(“compiledFully” uses Code.compile outlined above, “semi-compiled” is traversing the AST and building nested anon functions).

Compiling once then calling render (compile is outside loop)

Name                    ips        average  deviation         median         99th %
compiledFully       32.78 M      0.0305 μs    ±23.34%      0.0290 μs      0.0580 μs
semi-compiled      0.0337 M       29.71 μs    ±33.99%          27 μs          79 μs

Comparison: 
compiledFully       32.78 M
semi-compiled      0.0337 M - 973.78x slower

Compiling every render (compile is inside loop)

Name                    ips        average  deviation         median         99th %
semi-compiled        9.83 K       0.102 ms   ±342.27%      0.0920 ms       0.180 ms
compiledFully       0.105 K        9.50 ms     ±3.93%        9.49 ms       10.42 ms

Comparison: 
semi-compiled        9.83 K
compiledFully       0.105 K - 93.37x slower

So if you expect to use the compiled template more then 10-50x, it is worth it to compile it to Elixir.

michalmuskala · October 19, 2018, 2:49pm

Only code inside modules is compiled, code outside modules is always interpreted through erl_eval. So even when you define the fun inside Code.eval_string you’re not compiling the function - what you receive is a reference to a function that will be interpreted when executed. That’s why threre’s not much performance difference and why code inside a module is so much faster.