Benchmark: String templating at runtime

Hi all,

I am dealing with large template strings that contain an unknown number of variables, and those templates need to be bound thousands of times.
Due to performance constraints, I would like this step to be as fast as possible, so I compared multiple ways of doing it.

Threads such as this one suggest using EEx. I saw that I could use EEx.compile_string to get a quoted expression, which I can then resolve with Code.eval_quoted (in the benchmark: eex). I also benchmarked a direct call to EEx.eval_string (eex_eval_string).
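
Roughly, those two variants look like this (a simplified sketch with a made-up template and bindings, not the exact benchmark code):

template = "Hello <%= name %>, your link is <%= link %>"
bindings = [name: "Alice", link: "https://example.com/u/1"]

# eex: compile the template to a quoted expression, then evaluate it with the bindings
quoted = EEx.compile_string(template)
{rendered, _bindings} = Code.eval_quoted(quoted, bindings)

# eex_eval_string: compile and evaluate in a single call
rendered_bis = EEx.eval_string(template, bindings)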

The previous implementation simply scanned the content of the string for each variable using a Regex. I added that method to the benchmark (base).
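
For reference, one regex pass over the whole string looks roughly like this (the <%= name %> placeholder syntax is only an assumption here):

render_with_regex = fn template, bindings ->
  Regex.replace(~r/<%=\s*(\w+)\s*%>/, template, fn _whole_match, var_name ->
    bindings |> Map.fetch!(var_name) |> to_string()
  end)
end

render_with_regex.("Hello <%= name %>!", %{"name" => "Alice"})
#=> "Hello Alice!"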

I also tried a manual implementation: the template is parsed once and broken down into parts, and during binding I simply append each variable value and each literal part (manual).
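
In spirit, the manual method does something like this (a rough sketch assuming <%= name %> placeholders; the real parsing is a bit more involved):

# Done once per template: split it into literal parts and variable names.
compile = fn template ->
  ~r/<%=\s*\w+\s*%>/
  |> Regex.split(template, include_captures: true, trim: true)
  |> Enum.map(fn
    "<%=" <> _ = placeholder -> {:var, ~r/\w+/ |> Regex.run(placeholder) |> hd()}
    literal -> literal
  end)
end

# Done for every binding: walk the parts and append each value to its surrounding parts.
render = fn parts, bindings ->
  Enum.map_join(parts, fn
    {:var, name} -> Map.fetch!(bindings, name)
    literal -> literal
  end)
end

parts = compile.("Hello <%= name %>, unsubscribe at <%= link %>")
render.(parts, %{"name" => "Alice", "link" => "https://example.com/u/1"})
#=> "Hello Alice, unsubscribe at https://example.com/u/1"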

You can see the implementation here, the benchmark here and the results here.

Conclusions:

  • Manual implementation outperforms the rest, by a wide margin.
  • EEx seriously underperforms for this task (40x-170x slower than manual). Maybe I’m not using it right.
  • Creating this benchmark was incredibly simple. Kudos to Benchee.

Please do not hesitate if you have feedback, or if you’d like to add another method to the test.

1 Like

Not a direct answer, but have you looked at using iolists for this? Here’s a nice article that touches on the performance benefits of iolists:

The hard part would probably be parsing the “template” into something you can use to build an iolist. Also, if you gave a few examples of the kinds of input you expect, that would probably help people give you suggestions.
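
To make that concrete: if the template is pre-split into literal parts and variable names (as in the manual approach above), the render step can simply produce iodata instead of a concatenated binary, and most IO functions (sockets, files, …) accept iodata directly. A sketch:

parts = ["Hello ", {:var, "name"}, ", unsubscribe at ", {:var, "link"}]
bindings = %{"name" => "Alice", "link" => "https://example.com/u/1"}

# Build a list of binaries; nothing is copied or concatenated here.
iodata =
  Enum.map(parts, fn
    {:var, name} -> Map.fetch!(bindings, name)
    literal -> literal
  end)

# Only flatten if a single binary is really needed at the end:
IO.iodata_to_binary(iodata)
#=> "Hello Alice, unsubscribe at https://example.com/u/1"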

2 Likes

Super interesting result. (btw, I think you want elixir-bechmarks/string_templating.txt at main · delight-data/elixir-bechmarks · GitHub as the link for the results).

Have you tried using pure eex templates and passing a Map with the variables? I’m not sure about the exact eex implementation, but my assumption is that the templates are only compiled once, which means that with a Map you would only pay the price of a map lookup per variable on each render.

1 Like

The way you were using EEx indeed does not result in very good performance: It re-compiles the EEx template on every benchmarking iteration.

I’ve sent you a PR that uses EEx’s function_from_string to compile the EEx template once and then reuse it during the benchmark run.
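
The gist of it (simplified; the template string and argument names below are just an example, not the exact PR code):

defmodule CompiledTemplate do
  require EEx

  # The template is compiled once, when this module is compiled;
  # render/2 is then an ordinary function.
  EEx.function_from_string(
    :def,
    :render,
    "Hello <%= name %>, unsubscribe at <%= link %>",
    [:name, :link]
  )
end

CompiledTemplate.render("Alice", "https://example.com/u/1")
#=> "Hello Alice, unsubscribe at https://example.com/u/1"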

Running the benchmark on my computer shows the following results:

$ mix run benchmarks/string_templating.ex 
Compiling 1 file (.ex)
"ALL THE FOLLOWING SHOULD BE TRUE"
true
true
true
true
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Number of Available Cores: 8
Available memory: 7.60 GB
Elixir 1.10.2
Erlang 22.3.4.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking base...
Benchmarking eex...
Benchmarking eex_compiled...
Benchmarking eex_eval_string...
Benchmarking manual...

Name                      ips        average  deviation         median         99th %
eex_compiled           758.61        1.32 ms    ±20.23%        1.27 ms        2.41 ms
manual                 125.65        7.96 ms    ±14.40%        7.89 ms       11.80 ms
base                    17.07       58.57 ms     ±5.70%       58.51 ms       65.98 ms
eex                      2.22      450.36 ms     ±5.85%      442.40 ms      500.30 ms
eex_eval_string          1.42      704.79 ms     ±9.19%      714.84 ms      787.16 ms

Comparison: 
eex_compiled           758.61
manual                 125.65 - 6.04x slower +6.64 ms
base                    17.07 - 44.43x slower +57.25 ms
eex                      2.22 - 341.65x slower +449.04 ms
eex_eval_string          1.42 - 534.66x slower +703.47 ms
warning: redefining module EexExampleModule (current version defined in memory)
  lib/templating_benchmarks.ex:27

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Number of Available Cores: 8
Available memory: 7.60 GB
Elixir 1.10.2
Erlang 22.3.4.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking base...
Benchmarking eex...
Benchmarking eex_compiled...
Benchmarking eex_eval_string...
Benchmarking manual...

Name                      ips        average  deviation         median         99th %
eex_compiled             5.23        0.191 s    ±41.72%        0.175 s         0.47 s
manual                   2.43         0.41 s     ±7.35%         0.41 s         0.47 s
base                    0.199         5.02 s     ±0.00%         5.02 s         5.02 s
eex                    0.0213        47.00 s     ±0.00%        47.00 s        47.00 s
eex_eval_string        0.0142        70.61 s     ±0.00%        70.61 s        70.61 s

Comparison: 
eex_compiled             5.23
manual                   2.43 - 2.15x slower +0.22 s
base                    0.199 - 26.24x slower +4.83 s
eex                    0.0213 - 245.76x slower +46.81 s
eex_eval_string        0.0142 - 369.18x slower +70.42 s
8 Likes

This indeed looks promising, since the manual approach does a lot of string concatenation.

Well, I mainly made this post to share some general results, not necessarily to focus on my use case, but I really appreciate that you want to help!
I am refactoring an email sender that is used for mass email communications (up to 200k recipients for one campaign). I have a user-defined email template, and for each of the thousands of recipients I want to insert some recipient-specific data into it (such as the unsubscribe link).

Thank you for your interest :blush: Sorry, I messed up the link, but I can’t edit it due to new-account restrictions.

By “pure eex templates”, do you mean using, for instance, function_from_string/5? That’s Qqwy’s approach, and it is indeed better optimised. Neat!

Thank you so much for your contribution :pray:
It seems we have a small difference in our benchmarks: I included compilation, and you excluded it. I tested your method with a recompilation at each benchmark iteration: it’s still much faster than the version of EEx that I originally used, and it’s about as fast as the manual method. Great!

For fairness, I will split this benchmark into two: with template compilation and without. I will also try to use IO lists. I’ll post the results here as soon as I’m done.

2 Likes

I’m not sure that the “with compilation” version makes a lot of sense to benchmark. You aren’t including the time to compile the Elixir files for the manual version, right? And given that you can make EEx compile when your Elixir code compiles, the apples-to-apples comparison is without compilation for both.

1 Like

I misused the term “compilation” in that context. I meant “preparing the template that is provided at runtime”. Sorry for the misunderstanding.
I added a more thorough explanation of what is benchmarked to the project’s README. I hope it clarifies everything.

I updated the code and the results of the benchmarks accordingly. Basically, when preparation time is considered, manual is faster, but otherwise compiled EEx is the top dog.

Next on my to-do list: understanding and using IO lists.

1 Like

Almost what qqwy suggested, but more along the lines of what benwilson512 implied.

The important part, I’m guessing, is how fast the template renders for the user; how the data gets to the user is the question. In some benchmarks you are compiling the string just to bind multiple variables. My suggestion is to simply use a standard EEx template and pass a Map.

My guess is that it will perform about the same as the “pre-compiled eex” version.

Gotcha. Yeah, the use case matters a lot, I guess. EEx isn’t really built to have new templates added at runtime, so it won’t perform very well in such a case. Maybe GitHub - edgurgel/solid: Liquid template engine in Elixir would be a better fit: it still has a parse step, but it seems to be one that you could easily cache and reuse at runtime.
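
Based on solid’s README, usage would look roughly like this (treat the exact function names as an assumption on my part):

# Parse the Liquid template once and cache the result…
{:ok, template} = Solid.parse("Hello {{ name }}, unsubscribe at {{ link }}")

# …then render it per recipient with a plain map.
template
|> Solid.render!(%{"name" => "Alice", "link" => "https://example.com/u/1"})
|> to_string()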

1 Like

It is worth adding that there are improvements in Erlang/OTP 24 and on Elixir master with regard to code evaluation with many variables, which should speed it up by at least 5-6x.

6 Likes