PhoenixUndeadView - let's discuss optimization possibilities for something like Phoenix LiveView

Assuming this project is named Vampyre, there will obviously be a ~V sigil so that you can do this:

defmacro my_html_helper(x, y, z) do
  ~V"""
  Something something <%= x %>
  Blah blah <%= y + z %>
  """
end

This will be compiled into a ready-to-use Segment.container(), which will optimize properly when inside a vampyre template.
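For concreteness, such a container could hold a list of segments in which adjacent static parts get merged into a single binary. The `{:static, bin}` / `{:dynamic, ast}` tuple shapes and the `merge/1` helper below are my assumptions for illustration, not Vampyre's actual API:

```elixir
defmodule SegmentSketch do
  # Collapse runs of adjacent static segments into a single binary, so
  # the renderer emits as few iolist elements as possible.
  def merge([{:static, a}, {:static, b} | rest]),
    do: merge([{:static, a <> b} | rest])

  def merge([segment | rest]), do: [segment | merge(rest)]
  def merge([]), do: []
end
```

With this shape, a template's trailing static text and the opening static text of an embedded widget can fuse into one binary at compile time.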

1 Like

Super convenient! As such a usage should be. ^.^

1 Like

I’ve decided I’ll separate the template engine and HTML widgets into the Vampyre package, so that they can be used in Phoenix. I should probably benchmark it first, but it makes sense to benchmark the whole pipeline. Unfortunately, when Phoenix compiles templates, it doesn’t give me access to the compilation environment. I need such access because I need to have control over macro expansion inside the templates. This means I’ll need to write a Vampyre.View module that compiles Vampyre templates correctly. This is a question of copying and pasting some files from Phoenix and changing 2 or 3 functions. It would be cool if the Phoenix.Template.compile/2 function could be called with the __ENV__ of the view module, because it looks like that would avoid all this work.

If I decide to write an UndeadView, or an integration with Drab, I’ll do it on top of the Vampyre package.

1 Like

It turns out I can work around that limitation without rewriting Phoenix.View. I just have to wrap the quoted expression in a macro. At macro expansion time, the wrapper macro can fetch the environment, expand its arguments, and optimize the quoted expression.
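A minimal sketch of the trick (module and function names are mine, not Vampyre's actual code): a plain function cannot see the caller's environment, but a macro gets it as `__CALLER__`, so wrapping the quoted template in a macro call recovers the environment at expansion time:

```elixir
defmodule OptimizeSketch do
  # The wrapper macro receives the template's quoted expression as an
  # argument. Because it runs at expansion time, it can use __CALLER__
  # (the caller's environment) to expand macros inside the template
  # before any (hypothetical) optimization pass runs.
  defmacro optimize(quoted) do
    env = __CALLER__
    Macro.prewalk(quoted, &Macro.expand(&1, env))
  end
end
```

A template compiler that emits `optimize(quoted)` instead of `quoted` would then get its optimizations applied in the context of the view module that includes the template.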

1 Like

One of the problems with Vampyre templates is that while they allow “normal” HTML widgets inside them (they just can’t optimize them), Phoenix’s own engine doesn’t accept Vampyre segments. It accepts fully compiled Vampyre templates, of course, but not the incompletely compiled Vampyre templates that can be optimized. In truth, optimizing rendered Vampyre templates is probably not that hard, and I should look into it.

2 Likes

Benchmarks

I have uploaded an example project, so that I can run “more realistic” benchmarks. You can find the repo here. The benchmarks below are the result of rendering the form generated by the phoenix generators for a dummy User resource. This means the template is probably representative of “real world” templates.

You can run these benchmarks yourself by running the following command in the repo above.

mix run benchee/main.exs

The results are:

Benchmarking phoenix...
Benchmarking vampyre...

Name              ips        average  deviation         median         99th %
vampyre      123.41 K        8.10 μs   ±231.37%           7 μs          25 μs
phoenix       10.13 K       98.69 μs    ±20.93%          95 μs      180.18 μs

Comparison:
vampyre      123.41 K
phoenix       10.13 K - 12.18x slower

This shows that in the case of forms, Vampyre templates are about 12x faster than the default Phoenix templates. The absolute time differences are tiny, of course, because we’re talking about microseconds, but the improvement is quite impressive.

I still haven’t implemented all the HTML widgets provided by phoenix_html. In fact, I have implemented just enough to be able to render the output of the Phoenix generators, but most of the rest can be implemented on top of what I already have. This is not one of those cases where “the code is simple and fast because implementing the last 10% takes the last 90% of the code/work”, because the work my widgets do happens at compile time. Although my incomplete implementation of the HTML widgets is already more complex than the one in phoenix_html (in terms of lines of code, and concepts I introduce throughout the code base), the complexity happens at compile time, with the goal of making things fast at runtime.

Am I doing something dirty to get these benchmarks?

Yes. The implementation of form_for that makes it fast inlines a function call by introducing a hygienic variable in the global scope of the template. This doesn’t change the semantics of the template too much, because the variable is inaccessible to user-written code. But it is an issue if you nest a form_for inside another form_for. That is not valid HTML, but there might be reasons why you’d want to do it. In case the above isn’t scary enough, I reproduce the code here:

  def make_form_for(form_data, action, options, fun) do
    # Pattern match on the AST of the anonymous function passed to
    # form_for to extract its single argument and its body.
    {:fn, _,
     [
       {:->, _,
        [
          [arg],
          body
        ]}
     ]} = fun

    validate_fun_arg(arg)
    # Replace the user's variable with a hygienic one and substitute it
    # throughout the body, so the body can be dumped into the template's
    # global scope without name clashes.
    hygienic_var = hygienize_var(arg)
    substituted_body = substitute(body, arg, hygienic_var)

    open_form = Tag.make_form_tag(action, options)

    # Bind the hygienic variable to the form data, exactly as the
    # function argument would have been bound.
    hygienic_assignment =
      quote do
        unquote(hygienic_var) = FormData.to_form(unquote(form_data), unquote(options))
      end

    close_form = Segment.static("</form>")

    contents = [
      open_form,
      Segment.support(hygienic_assignment),
      substituted_body,
      close_form
    ]

    # Flatten so that adjacent static segments can be merged later.
    Segment.container(:lists.flatten(contents))
  end

I’m pattern matching on the AST of the function passed to form_for to get the variable given as its argument, and then, in the body of the function, I substitute the old variable with a new one and dump the body of the function into the global scope of the template. This avoids a function call and flattens the segments list, so that I can merge the segments together.
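The substitution step can be sketched with `Macro.postwalk/2` (this is my illustration of the idea, not the actual `substitute/3`):

```elixir
defmodule SubstituteSketch do
  # Replace every occurrence of the bound variable (same name and
  # context) in the body with the hygienic replacement variable.
  def substitute(body, {name, _meta, context}, replacement) do
    Macro.postwalk(body, fn
      {^name, _, ^context} -> replacement
      other -> other
    end)
  end
end
```

Variables in Elixir AST are `{name, meta, context}` tuples, so substituting one for another is just a walk over the tree.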

I think I can fix the issue with nested forms by doing even dirtier things with the process dictionary (I can safely assume that the template will be compiled by a single process) and using the variable’s :counter metadata to make variables even more distinct from each other.
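One way that could look (key and module names here are made up for the sketch): keep a compile-time counter in the process dictionary and stamp it into the variable's `:counter` metadata, which is what the Elixir compiler uses to tell otherwise identical variables apart:

```elixir
defmodule HygieneSketch do
  # Each call bumps a compile-time counter kept in the process
  # dictionary (safe, since templates are compiled by a single process)
  # and stamps it into the variable's :counter metadata, so nested
  # forms get distinct hygienic variables.
  def hygienize_var({name, meta, _context}) do
    counter = Process.get(:vampyre_var_counter, 0)
    Process.put(:vampyre_var_counter, counter + 1)
    {name, Keyword.put(meta, :counter, counter), __MODULE__}
  end
end
```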

What’s missing

I still haven’t implemented all HTML widgets. Also, I haven’t even implemented all widgets correctly. Because I’m aggressively optimizing everything, there is some heavy-duty case analysis in the widget-generating macros/functions. For example, tag(:input, name: "user[name]", id: "user_name") takes advantage of the fact that all attributes are static to render everything into a static binary (which can be merged with the previous or next binaries). However, if we change the above to tag(:input, name: @name, id: "user_name"), the expression will be rendered into a static part, followed by a dynamic part, followed by another static part. And if the user writes tag(name, attrs) instead, that is a case I don’t even handle yet!

So I have yet to implement the “maximally dynamic” versions of my widgets. In such cases I might just fall back to Phoenix’s implementation, but there are still some performance optimizations to be had. For example, even if both the tag name and the attributes are purely dynamic, we know the rendered value starts with "<" and ends with ">", which can be merged into the previous binaries.
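To make the case analysis concrete, here is how the three cases might lower into segments. The `{:static, bin}` / `{:dynamic, ast}` tuples are assumed shapes, `render_tag_and_attrs/2` is a hypothetical runtime helper, and HTML escaping is omitted:

```elixir
# tag(:input, name: "user[name]", id: "user_name") - all attributes are
# literals, so the whole tag collapses into one static binary:
fully_static = [{:static, ~s(<input id="user_name" name="user[name]">)}]

# tag(:input, name: @name, id: "user_name") - one dynamic value splits
# the tag into static / dynamic / static:
mixed = [
  {:static, ~s(<input id="user_name" name=")},
  {:dynamic, quote(do: @name)},
  {:static, ~s(">)}
]

# tag(name, attrs) - fully dynamic, but "<" and ">" are still known at
# compile time and can merge with the surrounding static segments:
fully_dynamic = [
  {:static, "<"},
  {:dynamic, quote(do: render_tag_and_attrs(name, attrs))},
  {:static, ">"}
]
```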

Is this worth it?

To render normal templates, I don’t know. As I said above, the relative differences in performance are impressive (I wasn’t expecting such a difference), but rendering templates is already so fast that maybe it doesn’t really matter. And all the complexity can introduce bugs, security vulnerabilities, etc.

To use with something like Drab or LiveView, where being able to isolate the static from the dynamic parts of the templates is critical (to minimize the data sent over the network), this is definitely worth it.

Even if only for that reason, I plan on continuing to work on Vampyre until it has feature parity with phoenix_html. Vampyre is already a drop-in replacement for phoenix_html (by changing two lines of code in your project you magically get Vampyre templates for free), and if the user decides to use Vampyre templates for something like LiveView or Drab, then it makes sense to use Vampyre templates to render the “normal” views as well.

It will take some time, of course, because I’m not only implementing template renderers. I’m actually implementing an optimizing compiler for template renderers, and the default widgets need to play well with the compiler. This means that everything is about 5x harder to write than the naïve versions in phoenix_html. It gets easier as I go along, because I can implement some widgets on top of other widgets. For example, I’m implementing the form_for widget on top of the tag widget.

EDIT: I believe I’m doing everything right when defining the renderer functions for both Vampyre and Phoenix templates. I’ve copied the approach taken by Phoenix itself, so I guess I’m not being unfair to the Phoenix templates.

6 Likes

The demo project above (where the benchmarks are defined) uses Vampyre templates by default. That way, you can see that Vampyre is already an (incomplete) drop-in replacement for phoenix_html (although Vampyre depends on phoenix_html for some utilities for escaping HTML, among other things). The only changes required to support Vampyre templates in a Phoenix project are the following:

  1. Add :vampyre to your list of dependencies (:vampyre is not on hex yet, because it’s still missing a lot of things, so you have to add it from GitHub or as a path dependency).

  2. In the config/config.exs file, add the following line (which causes Phoenix to compile templates using the Vampyre compiler):

config :phoenix, :template_engines, eex: Vampyre.Template.EExEngine

Finally, inside your ProjectWeb module, replace:

  def view do
    quote do
      # ...
      # Use all HTML functionality (forms, tags, etc)
      use Phoenix.HTML
      # ...
    end
  end

by

  def view do
    quote do
      # ...
      # Use all HTML functionality (forms, tags, etc)
      use Vampyre.HTML
      # ...
    end
  end

The above makes the (API-compatible) macros from the vampyre package available in your views, so that you use the self-optimizing widgets instead of the slower widgets from phoenix_html.

2 Likes

This series of posts is very interesting from a “historical” perspective on the Vampyre project. Even I like to re-read them sometimes, but I think I should write something more definitive, like a blog post, now that things are stabilizing somewhat. Even people who are completely uninterested in rendering HTML templates can learn some interesting things from it. The Vampyre project touches some interesting areas, such as:

  • How phoenix actually renders templates
  • Tricks to get performance on the BEAM (iolists are fast, but not that fast; also, function calls come with a cost)
  • Using macros to avoid doing at runtime what you can do at compile-time
  • Manipulating Elixir quoted expressions
  • Working with EEx engines (the documentation we have now is somewhat lacking; the hexdocs could use some love)
  • The importance of a smart intermediate representation in a compiler (don’t try to output the final code in a single step)
  • Using macros to delay expression evaluation
  • Inlining functions without explicit compiler support

I think I should write something along the lines of the following (amazing!) blog post: https://www.bignerdranch.com/blog/elixir-and-io-lists-part-2-io-lists-in-phoenix/, which by the way was the major source of inspiration for the whole Vampyre project. After all, the big idea came from that post: “Phoenix automatically and universally applies this simple view caching strategy: the static parts of our template are always cached. The dynamic parts are never cached. The cache is invalidated if the template file changes. The end.” The Vampyre project just takes that idea to the next level, by aggressively merging the static parts of the template.

4 Likes

just chiming in to say I’ve been following these posts rather closely and learning a lot, thanks! Right now I’m really tempted to avoid V-DOM diffing (because of the relatively high expense of tree diffing) and just diff the dynamic data instead, but I haven’t thought of a good solution for how to render that data on the client. It seems to me that caching all of the static content loses context, which would mean you’d have to send the context (presumably in the form of duplicate templates) to the client at some point (probably on first page load, maybe on websocket connection instead?)

Are there any clever ways to encode the minimal amount of context into the compile-time generated functions such that you could diff only on the dynamic data rather than doing a tree diff, or would you just end up tree diffing again? I realize texas hasn’t been a target for you, so feel free to ignore my questions that are very texas specific

1 Like

On the contrary. I believe I’ve been looking at server-side VDOM diffing since Drab.Live came out and (through talking to @OvermindDL1) I understood that controlling a dynamic UI through the server was viable.

I got pretty excited with the first texas announcement, and then the excitement died out when I understood that texas seemed to depend on parsing HTML strings with Floki (wtf?!).

I’ve never talked to you directly because I’ve been busy with other things, but I’ve been waiting for something faster to come out (parsing HTML at runtime is a big no for me).

If you use smart heuristics (like React does), you can diff a tree in time linear in the number of nodes.

This requires storing the previous tree in memory and comparing it with the new one, though, which costs memory. By diffing the dynamic data directly, you save a lot of memory.

If you restrict the data to a sensible datatype, such as a map, you can diff it using already available algorithms, which are linear in the size of the map or list. Then, you can implement something like a mutation watcher, in which you build a dependency graph between the raw data and the VDOM and update only the parts that depend on data that has changed.
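For instance, a diff over flat maps of assigns is linear in the size of the maps. A sketch, with an ad-hoc `{changed, removed}` result shape of my own invention:

```elixir
defmodule AssignsDiff do
  # Returns {changed, removed}: keys whose values differ (or are new)
  # in `new`, and keys that disappeared from `old`. One pass over each
  # map, so the cost is linear in their sizes.
  def diff(old, new) do
    changed = for {k, v} <- new, Map.get(old, k) != v, into: %{}, do: {k, v}
    removed = Map.keys(old) -- Map.keys(new)
    {changed, removed}
  end
end
```

Only `changed` and `removed` would then need to travel over the wire, instead of a whole rendered tree.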

The JavaScript framework VueJS does some clever things (which I don’t understand) with mutation watchers and dependency graphs, and as a result is both fast and intuitive.

@OvermindDL1 has a client-side framework written in OCaml that compiles to Javascript, which he claims to be very fast.

I think Drab.Live does something like that (and @grych is the correct person to explain it to you).

3 Likes

haha, that’s merely a means to an end - I wanted to make something work before I made it work well - I’ve been doing a lot of thinking about how to optimize and do more at compile time, which is largely the reason I’ve been following your posts so closely. My initial impl idea was to generate two functions - one with the AST and one with IOLists… use the AST for tree diffing and sending patches to the client, and use IOLists for full HTML string rendering (first page load)

1 Like

I believe I read recently that React was O(n^2); do you have a source for it being linear?

1 Like

I’ve been doing a lot of thinking about this too, and I think the memory will be negligible either way, as realtime views will spend 99% of their time with clients viewing the same page. That means you can cache the most up-to-date version in an ETS table for all up-to-date clients to diff against for the next change.
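That caching scheme could look like this in ETS (the table and key names here are made up for the example):

```elixir
# A named, public ETS table holding the latest rendered state per view,
# readable by every client process.
:ets.new(:latest_render, [:set, :public, :named_table])

# Whoever renders the newest version writes it once...
:ets.insert(:latest_render, {:user_index, ["<p>", "42", "</p>"]})

# ...and any client process can read the cached version to diff against.
[{:user_index, cached}] = :ets.lookup(:latest_render, :user_index)
```

Because ETS reads don't copy through a single owning process's mailbox, all clients can diff against the shared copy concurrently.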

2 Likes

(clicked “Reply” by mistake, continues here)

Elixir is not a good language to work with the DOM. The DOM looks like a nice tree but it’s actually a nasty graph full of backreferences.

I suggest you tag everything with an ID so you can communicate unambiguously between the server and the client.

It might be fruitful to drop the tree structure and instead work with a flat database of nodes identified by a UUID or something like that.

Then you can try to build a dependency graph between “paths” (some people like to call them cursors) into the data and the DOM nodes.

Then, you can rerender only the nodes that have changed and send them to the client. If you want to be very obsessive, you can diff the changed nodes to send actual minimal diffs. That will result in fewer surprises for the user, because you preserve more state (CSS transitions, input focus, etc.). But even with minimal diffs you might overwrite some state…

That’s where solutions like morphdom come in handy. I believe @chrismccord got it right by committing to diffing HTML on the client. The client knows more about the actual DOM the user is seeing than the server does.

When you accept that you need to diff on the client, then I think it doesn’t make sense to build a DOM (virtual or not) on the server, except for optimization purposes.

By building a DOM on the server, you can send smarter diffs to the client. This implies a tradeoff between performance on the server and on the client: you spend CPU cycles and memory on the server to make the client slightly faster.

Testing whether this tradeoff makes sense is an enormous undertaking, and I have decided not to get into that mess.

1 Like

React is not actually linear. It’s only linear for “common diffs”. I think you can make it quadratic with specially crafted diffs. But for the common cases (a single element inserted into a list), I think it’s linear. I can’t find the source right now

1 Like

this is actually very appealing to me! I’ll have to put more thought into this, as it would solve a lot of problems. If I can get away from tree diffing, I think the CPU concerns may still be valid (but much less so than with tree diffing), but the memory concerns should be negligible: as I was saying, I believe any realtime app will spend the vast majority of its time with all clients viewing the same content.

1 Like

Don’t do this. You don’t want to represent HTML using Elixir AST, no matter how similar it is. Use 3-tuples ({tag, attrs, contents}) to represent HTML, but decouple them from Elixir’s AST.

To optimize the HTML you’re generating, you can try to use macros like I do. A macro that expands into a 3-tuple or a list of 3-tuples can be optimized into static data. You have to think like a compiler writer. Using Floki to turn EEx into 3-tuples (AKA HTML) is cute, but it’s beside the point… You need to get things working with 3-tuples. THEN you can think about presenting a better interface to the user. It’s perfectly viable to use Floki to parse an EEx file at compile time into something sensible which you can work with later. I’m just saying you should ignore that part for now.
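A toy renderer for that 3-tuple shape shows how little machinery it needs (the module name is mine; escaping is omitted and attribute values are assumed to be binaries):

```elixir
defmodule TupleHTML do
  # Render {tag, attrs, children} trees (and plain text binaries) to an
  # iolist, without ever touching Elixir's AST.
  def render({tag, attrs, children}) do
    attrs_io = for {k, v} <- attrs, do: [" ", k, ~s(="), v, ~s(")]
    ["<", tag, attrs_io, ">", Enum.map(children, &render/1), "</", tag, ">"]
  end

  def render(text) when is_binary(text), do: text
end
```

For example, `IO.iodata_to_binary(TupleHTML.render({"p", [{"class", "x"}], ["hi"]}))` produces `<p class="x">hi</p>`.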

1 Like

I don’t know if your tree should be completely flat, but making it “flatter” would probably help. The way you usually implement graphs (and your approach might need a dependency graph!) is by leveraging pointers. In Elixir you don’t get pointers, so UUIDs might be the next best thing… It’s actually close to the mathematical definition of a graph, which is a pair (V, E) of sets, where the members of E (the edges) are pairs (u, v) with u and v belonging to V. By having pointers to the tree nodes, it’s easy to use graph algorithms.
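Concretely, the (V, E) formulation with UUID-like ids translates directly into maps and tuples (a sketch with made-up ids):

```elixir
# Nodes stored flat, keyed by id; edges are just pairs of ids, so
# standard graph algorithms apply without chasing nested structures.
nodes = %{
  "n1" => %{tag: "div"},
  "n2" => %{tag: "span"},
  "n3" => %{tag: "input"}
}

edges = [{"n1", "n2"}, {"n1", "n3"}]

# Children of a node are one filter away:
children_of_n1 = for {"n1", child} <- edges, do: Map.fetch!(nodes, child)
```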

As you can see in my first posts, I thought about using graphs for a while, but then I managed to avoid them. Working with graphs is not very pleasant in my opinion, but you can leverage things like libgraph.

If I can get away from tree diffing

I don’t think you can completely get away from tree diffing. The hard part of diffing trees is diffing lists of children. That won’t go away by flattening the tree. You’ll have to deal with diffs of lists anyway.

But the change of perspective that comes from flattening the tree might be very helpful!

1 Like

Good advice, thanks

1 Like

Why do you need an ETS table here? Don’t you plan on having a process per client? You already require a process per client if you’re using a Phoenix Channel. A Phoenix channel is pretty much a GenServer (maybe it’s even an actual GenServer?). Why don’t you store the state of the client in the process itself? What do you gain from an ETS table?

1 Like