PhoenixUndeadView - let's discuss optimization possibilities for something like Phoenix LiveView

tmbb · September 12, 2018, 4:23pm

I’ve decided to create this topic to discuss optimization possibilities for something like Phoenix LiveView. I’ve created this topic under the “Libraries” tag, because I might actually write a library based on this. Because my totally vaporware project needs a name, let’s call it PhoenixUndeadView. Why UndeadView? Because I want my views to be as “dead” as possible, so that they are easy to optimize and minimize data transmission over the network.

There isn’t any code to look at yet, because all I have is small experimental code fragments.

Key ideas

My efforts to (prematurely) optimize something like LiveView center around a couple ideas. The initial render should work just like a normal template, ideally in a way that’s indistinguishable from normal templates. It should use the same template language as the rest of the framework, and as much as possible the same HTML generation helpers (those in Phoenix.HTML, for example).

Basic architecture

According to Chris’s talk, LiveView works by rendering state changes into HTML and sending the entire HTML to the client, and diff the HTML on the client using morphdom, a Javscript library that produces optimal diffs between two DOM trees (which preserves cursor positions, CSS transitions, etc. Some custom work is probably needed to support input fields where the user is typing, but the bulk of the work is done by morphdom. In any case, this requires writing a fair amount of Javascript, which is always a source of integration problems with the Elixir codebase, but that’s part d the reality of web development.

I really like this architecture, and I’ve had this idea independently as soon as Drab appeared, but I’ve never worked on it because sending that much HTML back to the client has always seemed too wasteful.

But recently, I’ve thought of some ideas to optimize the amount of data that needs to be transferred.

Optimization ideas

Phoenix exploits iolists as a good way to cache the parts of the template that remain the same. Only dynamic parts must be regenerated. See here and here if possible. The main idea behind Phoenix’s use of iolists is that what is dynamic must change and what is static must remain the same. How does this apply to something like LiveView? When the state changes, only the parts of the HTML that change should be sent over the network.

Currently, I’m not worried about generating only the minimal changes, but I think that sending minimal changes is probably very beneficial. However, generating the entire string and trying to deduce the parts that have changed is probably not efficient. That’s why I think that separating the static from the dynamic parts at compile-time would be beneficial.

The idea would be to do the following:

cross-compile the EEx template into a javscript function which takes the dynamic substrigs as arguments (not the arguments to the template function!). For example, the template:

Hello <%= @username %>! How are you? It's been <%= @idle_days %> since we've seen you!

could compile into:

function renderFromDynamic(dynamic) {
  var fullString = "Hello "
    .concat(dynamic[0])
    .concat("! How are you? It's been ")
    .concat(dynamic[1])
    .concat(" since we've seen you!");

  return fullString;
}

At the same time, compile the EEx template into something that only generates the (already escaped) dynamic strings. I thought this was very easy, but now I’ve come to think it’s very hard in the sense that you almost have to write your version of an Elixir compiler for this to work as it should. EEx templates map into quoted expressions with varable rebindings, and other things which make it harder to extract parts of the quoted expression into a list. In the case above, the template could be compiled into something like:

def render_dynamic_parts(conn, assigns, other_args) do
   [~e(<%= @user %>), ~e(<%= @idle_time %>)]
end

Then, each time the state changes, send only the dynamic parts to the client (as a JSON array of strings; JSON arrays are reasonably compact), and have the Javascript function above render the full string on the client and diff the result with morphdom or something else.

This way we retain the principles of phoenix templates: what is constant never changes and what is dynamic always changes

Problems

EEx templates are extremely powerful. In fact, the entire Elixir language is embedded in such templates. That means writing your own parser or tokenizer to EEx templates is very hard. Like above, it’s the same as writing a full Elixir tokenizer (with some small changes).

The usual way to write EEx engines is to hook into the Engine behaviour and customize the necessary callbacks. Again, this seems hard but it can be done.

Our goal here is to identify the constant and dynamic parts while we compile the template. If you have a dynamic segment, then everything “inside” that segment is also dynamic (no matter how many constant binaries you can extract from that segment), because we only want to split the template into static and dynamic parts one level deep.

This has a small problem. HTML helpers such as form_for, which is used like this

<%= form_for ..., fn f -> %>
  <%# other stuff %>
<% end %>

produces a widget which is 100% dynamic, with no opportunities for optimization. Even if the inside of the form is mostly constant (which in practice, it is). This means that any clever optimizations one might come up with will fail in one of the primary use cases for LivewViews, which is realtime form validation…

This means we can’t use the “normal” HTML helpers in Phoenix.HTML. There is a simple solution, though. The idea came from looking at nimble_parsec. NimbleParsec uses simple functions (no macros!) to generate a parser AST, which are compiled into Elixir expressions in a separate step.

Transposing this to the templates, it makes sense to write a series of compile-time helpers, which are meant to do as much work at compile time as possible. We can’t maintain total compatibility, but we might be able to salvage part of the API and maintain a reasonable level of compatibility.

I would have to write a series of helpers like PhoenixUndeadView.HTML or something like that. The goal is to resolve as much as the static template at compile-time as possible. I’ve been writing HTML helpers a lot, and the truth is that most of them are mostly static (or can be rewritten in a way that makes them mostly static).

Compiling EEx templates into a reasonable format

As I said above, compiling EEx templates into a format that makes it easy to separate the static and dynamic parts is not trivial. I’m thinking of developping a more convenient intermediate representation (probably falttened iolists, which are actually very easy to use directly, although they’re a little weird synctatically) and try to compile EEx templates (or similar ones) into that format.

tmbb · September 17, 2018, 7:00am

I’m interested in going the way of radical AST rewriting. One of the things I’ll have to do for my idea to work is to turn all variable reassignments (pattern = value) into single assignments of variables with unique names.

While I could transform x = f() into x_unique_suffix__1 = f(), that wouldn’t be nice for error reporting. Fortunately, the elixir AST marks each variable with a unique context. In the AST variables are represented as {atom, meta, context}, where context is an atom (often a module name).

My question is then the following: to turn those variables into unique variables, can I just change the context? For example, if the context is MyModule, could I change it to MyModule__unique_suffix_1? This seems like the obvious way to go, but I’m afraid it might mreak something I’m not seeing.

From what I understand, contexts are just meaningless tags to disambiguate variables with the same name that shouldn’t shadow each others. The fact that the context is often a module name is just an implementation detail, right?

Pinging @josevalim because of our discussion in the LiveView thread.

josevalim · September 17, 2018, 8:55am

Sorry but the best advice I can give right now is: don’t do it. Emit the correct code in the first place and avoid the AST transformation.

tmbb · September 17, 2018, 12:02pm

Ok, but theoretically, if there was no other option, I can change the context to whatever I want, right? I’m not very optimistic on the possibility of generating the kind of code I need without AST rewriting.

josevalim · September 17, 2018, 12:30pm

You could. Although it is possibly easier to do that by setting the counter in the metadata, which is how we do hygiene in macros.

tmbb · September 18, 2018, 10:43pm

Some personal notes which might be useful for AST rewriting (based on a discussion with @OvermindDL1). @josevalim might wish to shield his eyes from the horror; This is like @OvermindDL1’s typed elixir experiments but worse…

An elixir block:

(
  expr1
  expr2
  expr3
)

which compiles to:

{:__block__, [],
 [
   expr1,
   expr2,
   expr3
 ]}

Such a block has some complex scoping rules, which get even more complex if there are nested blocks, as this image below shows:

scopes

The diagram above shows scopes as being enveloped with lines. Visually, the scoping rules are easy to explain: a variable defined in expr1, for example, will be in scope in every expression below, possibly inside nested blocks. More interesting the fact that variables defined in the nested block are in scope in the parent block, but only in the expressions below (as the lines indicate).

The scoping rules are similar to the ones in Erlang. In fact, a block in elixir seems to compile into a sequence of erlang expressions separated by ,. That is, the above block would compile to:

Expr1 , Expr2 , Expr3.

where the separator , is an operator that carries the context (i.e. scope) of the previous expression into the next expression. Of course, the comma (,) is also a sequencing operator in elixir. The block above is equivalent to expr1 , expr2 , expr3. We usually use newlines to separate expressions instead of the ,, but it’s the same idea.

Well, the fact that blocks are compiled into a sequence of binary operations using the , operator (I think it’s not formally an operator in Erlang, but it helps if we think of it as one) gives us an idea: we can compile the blocks ({:__block__, meta, expressions}) into an intermediate binary form, say:

{:__comma__, [],
  [
    left_side,
    right_side
  ]}

Where comma is the obvious thing (analogous to Erlang ,). By the way, the Erlang comma is also very similar to OCaml’s let ... in ... statement. Which might mean we can apply lambda calculus reduction rules to our modified AST, but I digress…

The advantage of having an AST node with only two children is that the scoping rules become much simpler. Instead of the complex scoping rules of the elixir blocks (in the picture above), the scoping rules here are much simpler:

The scope that results from the expression on the left side is the scope used to compile the right side
The scope that results from compiling the entire __comma__ block is propagated upwards so that it can be used in the parent __comma__ node.

We can represent scopes as maps from variable to unique identifiers, something like: %{var_name, var_meta, var_context} => uuid}. The uuid can be used as the counter to disambiguate between variables with the same name.

In a tree of commas, it looks like the scope always accumulates as you walk the tree in post-order, which means that you can cache it as something like:

{:__comma__, [scope_right: scope_right, scope_up: scope_up],
  [
    left_side,
    right_side
  ]}

where scope_right is the new variables in scope after compiling the left_side (which will be part of the scope for the right_side) and scope_up, a map with the new variables that propagate up (into the right side of the parent __comma__).

This is a very simple model. We can consider these scopes as deltas, which we apply to the previous scope (using Map.merge/2). Variables in blocks (and as an extension, __commas__) don’t ever go out of scope, which means the scope always increases or stays the same as we traverse the __comma__ tree in post-order. Which means we just need to keep a running scope and can simply merge the maps in order.

This approach is simple and probably doesn’t require any complex graph algorithms (that is, we only operate on trees and never on more general graphs).

I think that to move all variable assignments to the beginning of the tree I just have to traverse the tree in “inverse-post-order” (that’s roughly right to left; I don’t know if this term even exists), and if I could do it it’d be great, because it would avoid the need to use general graphs and some kind of dependency resolution algorithm. I think that if I play my cards right I won’t have to use graph algorithms…

Then, after moving the assignments out of the main compile-time nested iolist of the EEx template (let’s not lose sight of the fact that we want to optimize a template!), I can just expand in place the top-level variables in the list, and flatten the list at compile-time.

With the flattened list, I just have to merge the static binaries and it’s basically done

This is the kind of thing that a compiler for a functional language (i.e. OCaml, Haskell) does or can do all the time, especially if the reduction semantics of the language are simple. Elixir is not a purely functional language by most criteria, but it does have relatively simple reduction semantics. I’ll not try to explain what reduction semantics are, but it’s basically a set of rules that tells us how we can replace some symbols in the code by other symbols. For example:

a = f(1)
b = a

can’t be replaced by:

a = f(1)
b = f(1)

(because f(1) might return a different result when it’s invoked twice)

So, if theoretically (wink, wink ) I were to use AST rewriting I’d try do more or less the following:

Build a dependency graph of the variables in scope
Rename the variables so that variable names become unique
Bind the variables according to the order in the dependency graph, so that we can factor them out of the iolist (which represents the final output of the template)
Substitute some of the variables in place so that the list contains only references to variables defined outside of the list
Build and flatten the final list so that we can separate the static and dynamic parts.

Note: why do we need to move all the variable assignments outside of the list?

Because in Elixir, variables bound inside an element of a list are not in scope in the remaining elements of the list. For example:

iex(1)> [x = 1, x + 1]
** (CompileError) iex:1: undefined function x/0
    (stdlib) lists.erl:1354: :lists.mapfoldl/3
iex(1)> x = 1, [x, x + 1]
[1, 2]
iex(2)>

This is exactly what we’ve done here: we’ve moved the variable assignment out of the list so that we can use the variable inside the list. So, under these limitation, how is it possible that the EEx templates in Phoenix (which compile to iolists) can ssign variables and refer to them later? The answer is that Phoenix templates compile to terms with lots of nested blocks. For example, this:

<% a = 1 %>
<%= a %>

compiles to this (more or less, the template below is simplified):

{:safe,
 [
   (
     tmp1 = [
       (
         tmp2 = ""
         a = 1
         tmp2
       )
       | "\n"
     ]

     [tmp1 | {:dynamic, a}]
   )
   | "\n"
 ]}

As you can see, we are building an iolist, but to be able to refer to variables that are defined in other elements of the list, we must use nested blocks so that the scopes propagate. Notice that we create a more or less “useless” block just to define the variable a so that we can use it later.

OvermindDL1 · September 18, 2018, 11:06pm

I’m pretty sure ; is a proper operator in Erlang (it is in OCaml), just that it evaluates the left expression, adds it to the environment, and then evaluates and returns the right expression. . I’m unsure is an operator or just a terminator for an erlang function though…

tmbb · September 18, 2018, 11:08pm

Maybe it is, but it doesn’t appear in this table: Erlang -- Expressions

EDIT: keep in mind that the little I know about Erlang is from reading the official docs and reading the output of the BEAM disassembler run on Elixir module files…

NobbZ · September 19, 2018, 5:14am

Nope.

The semicolon is to concatenate alternations, not sequences, you use it as here:

my_not(true) -> false;
my_not(false) -> true.

You were not able to see any binding from the first clause in the second clause, neither would you in elixirs equivalent:

def my_not(true), do: false
def my_not(false), do: true

What you probably mean here is the comma ,, which is used to build “sequences”:

foo(A) ->
  B = A + 1,
  C = A - 1,
  div(B + C, 2).

which is equivalent to this elixir code:

def foo(a) do
  b = a + 1
  c = a - 1
  div(b + c, 2)
end

tmbb · September 19, 2018, 6:00am

Yes, it’s the comma. Thanks!

tmbb · September 19, 2018, 7:41am

I can’t edit my post anymore, so could a nice mod please add something to the effect that semicolons should be replaced by commas?

OvermindDL1 · September 19, 2018, 4:45pm

Yes I did! I’ve been doing too much OCaml and got them mixed up, it is ,, like in C++ (and in C++ you can even override operator,, lol).

Edited, it look good? Sorry for the mis-remembering, the comma is like the comma operator in C++, I need to remember it that way. ^.^

tmbb · September 25, 2018, 4:00pm

I’ve managed to write a pretty functional proof of concept that doesn’t require AST rewriting of complex quoted expressions. Code is here:

I can generate “full” templates as well as templated containing only the static and dynamic parts of the template, for easier diffing over the network. As a positive side-effect, I my template engines generate code which is much cleaner than the code generated by the engine in Phoenix.HTML, and it might even be slightly faster.

I should benchmark it one of these days.

EDIT: code here: https://github.com/tmbb/phoenix_undead_view/blob/master/README.md#phoenixundeadview

josevalim · September 26, 2018, 7:21pm

Link to the relevant engine? I think you have a couple ones around, so I am not sure which one I should look at.

tmbb · September 26, 2018, 9:03pm

There are 4 engines. Only the following 3 are important (the other one is a relic which I keep around for reference purposes):

UndeadEngineFull - generates the full webpage/component, for the initial server-side rendering. This is the equivalent to what Phoenix.HTML.Engine does already (and as you can see, the code was mostly copied from there). This template generates the following code: quoted expressions; code. The result is rendered to a list of alternating static and dynamic elements (i.e. static, dynamic, static, dynamic, …, dynamic, static). The list always starts and ends with a binary (even if it i an empty one ""), to make it easier to render the full text from diffs, but this can be optimized away if you don’t need diffing (and normal Phoenix templates don’t need it).
UndeadEngineDynamicParts - This engine renders only the dynamic parts rendered by the above template. The goal would be to render only the parts that change, send them over the network, and join them with the static parts on the client.
UndeadEngineStaticParts - This engine renders only the static parts. The goal is to send these parts to the client only once so that the client knows how to merge the dynamic parts into the DOM. I think of this as a diff of the template (even though we are always diffing agains the static part, and not against the previous version)

A normal page render requires sending the output of the Full engine (as the literal DOM) and the StaticParts engine (compiled into a Javascript function that knows how to concat the dynamic parts it will receive later). Whenever the server wants to change the DOM on the client, it must run the DynamicParts engine (and only this one!) and send the list into the client, which will merge the dynamic parts into the textual HTML and merge it into the DOM.

If all you want is to compare my engine to Phoenix.HTML.Engine, you only need to look at the UndeadEngineFull and at its output. The output of the engine is a quoted expression, but you should format it into source code using Macro.to_string() and Code.format_string!() because the quoted expressions are quite unreadable, even for simple templates.

Now, my 3 engines contain some common functionality. In fact, only the handle_body() callback is different for each engine. To simplify things, I always use a UndeadEngine, which contains the implementations of the common callbacks. Because I didn’t want to define a lot of code in the __using__ macro, I’ve put the actual implementation of the callbacks in the UndeadEngineImpl module. While these layers of indirections make my project more complex, they’re kinda required if I want to work on 3 similar engines at the same time, but you don’t need any of these if you only want the Full engine.

I haven’t tested this engine on complex templates, so it’s possible you might find a way of breaking it. I’m not sure I’m scoping the inner variables correctly, but that’s very easy to fix by adding the :counter attribute to the variables, changing the variable names according to the nesting level (which is easy because I can keep track of the nesting level if I want) or by wrapping the inner structures in function calls so that variables don’t leak.

You might need to read the Utils and Merger modules, which build and optimize the final AST.

Maybe the code is too unfinished for you to look at, but I think it’s a good proof of concept even if it leaks some variables in highly nested templates.

EDIT: Problems with variable scoping should be solved now!

PS: if you want we can discuss this over private messages, but I think the discussion might be interesting for the community in general.

tmbb · September 26, 2018, 11:03pm

The main thing that’s missing now from the Undead Engines is a way to expand macros and extract the constant and dynamic parts. That sounds really hard, but it’s not as complex as it might seem…

If you look at the code generated by the engines, it follows a very simple pattern (in fact, the pattern is even a little more restricted than that):

  {:safe,
    (
      var1 = ...
      var2 = ...
      var3 = ...

      ...

      varn = ...

      [var1, var2, ..., varn]
    )}

If I can recognize this pattern at compile time, I can “lift” it into a lower level and add the new variables to the lower level. Because the format is generated by my own code, it’s trivial to convert this quoted expression into a more manageable format. I’d be rewriting the AST, but a very simple and well-defined AST fragment. This means that the Engines can’t expand every macro, but they can expand every macro that generates a quoted expression as if it were generated by the Engine, which is a useful subset.

There are possibilities that avoid AST rewriting, like choosing to expand macros inside the Engine, instead of expanding them after the template has been compiled. The problem with that approach is that it only works for “wel behaved” macros, which have a return value that only depends on their arguments and the env outside of the quoted expression. If they depend on variables defined inside the quoted expression, they won’t work. Since we want the semantics of these Engines to be exactly equal to the semantics of the Phoenix.HTML.Engine, I should probably support the general case.

Anyway, supporting macro expansions is not a big deal, because the macros will be expanded anyway when the quoted expression is compiled. This is just an optimization. I’m a little short on time to promise I’ll get all of this to work soon…

josevalim · September 27, 2018, 7:34am

Great job!

I think we should only have the UndeadEngineFull and we can assume that it will always return something in the shape of {:safe, {:__block__, [], [..., ..., [static, dyamic, ...]}}. This way we can extract static and dynamic out whenever we feel like it.

About macro expansion, I need to put a bit more thought into that.

tmbb · September 27, 2018, 7:52am

Yes, that’s totally possible. Having the three engines was mostly an optimization. The UndeadEngineDynamicParts engine was mostly an optimization. Depending on how the BEAM represents static binaries in list having this version might save some memory and time copying the binaries into the list.

Because most of the time templates are mostly large static binaries with small dynamic parts, if you’re copying those binaries into a list you’re wasting a lot of memory.

But you’re more familiar with the BEAM internals than I am.

tmbb · September 27, 2018, 8:37am

Of course. This pet project of mine has always been more about thinking than writing code. You need to understand pretty well what’s happening and where you eant to go before starting to type the code, or you might code your way into a dead end… When you think your way through, the actual code is often obvious.

Anyway, did you understand the basic idea in my explanation? The defails are a little bit involved, but the basics are simple:

“Compile” the text into an abstract list of variable assignments in a format convenient for further transformation, not as literal AST nodes. That’s what my code does currently, as you have probably noticed.
Convert the abstract assignments into a quoted expression with (possibly nested) concrete assignments (i.e. literal AST nodes).
Expand the macros
Convert the quoted expression into the standard abstract format. This is not that hard because the AST was generated by us.
Recursively lift expressions matching the {:safe, block} format from lower levels into the upper levels
Optimize the resultant abstract lists
Compile them into quoted expressions again

This is mostly a repetition of what I said earlier, but, splt into easy understandable steps. The inly tricky step is 4. This is easy to understand conceptually and easy to describe in pseudocode, but not that trivial to implement. Not that it requires anything more clever than tree manipulation, though.

Just to be clear, this is NOT what I’m doing. This is what I’d like to do.

tmbb · September 27, 2018, 8:45am

Bonus (if you can still follow my ramblings):

One “problem” of expanding macros inside the template is that the interesting expressions are “trapped” inside case expressions:

case(expression_to_macroexpand) do
  {:safe, ...} -> ...
  ...
end

These expressions are easy to find, but I’m planning on compiling into an intermediate form such as:

{:__dynamic_expression__, expression_to_macroexpand}

This is easier to analyze statically than a case statement. Then, when generating the final quoted expression, I can compile these tuples into proper case statements.