Story behind
Recently, I gave a talk at a meetup about improving the performance of Phoenix applications, and the example app was a LiveView app. One of the problems I planted was an N+1 query, whose solution was to use the new update_many/1 callback on LiveComponents. The demo showed the “discovery” phase of performance issues by looking at OpenTelemetry traces (I was using OpentelemetryLiveView, but OpentelemetryPhoenix now also has some LiveView spans in the new unreleased version).
The problem I noticed was that the N+1 was only clear on full renders, which OpentelemetryPhoenix wraps in an HTTP server span. When navigating through live patches, it was not clear that there was a problem, because all I got was a bunch of “rogue” Ecto spans in an isolated trace.
What does this have to do with Phoenix LiveView itself? Well, these OTEL libraries build their OTEL spans on top of :telemetry spans. Looking at the Phoenix LiveView codebase, I realised a few more :telemetry spans would be helpful, so this proposal lists a few ideas and where to put them.
More callback spans
I want to start with more callback events:
[:phoenix, :live_view | :live_component, :update | :update_many]
- To help find where a query is coming from and solve an N+1 problem, it might be good to wrap the calls to update / update_many in a span.
- It should be easy, since the only place that calls them is Phoenix.LiveView.Utils.maybe_call_update!/3.
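To make the idea concrete, here is a minimal sketch of wrapping a callback in :telemetry.span/3. It is not the LiveView implementation: UpdateSpanSketch, DemoComponent and the plain-map socket are hypothetical stand-ins, the metadata keys are my own choice, and the event prefix is simply the one proposed above.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule UpdateSpanSketch do
  # Event prefix proposed in this post, used here for illustration only.
  @prefix [:phoenix, :live_component, :update]

  # Hypothetical stand-in for the single call site that invokes update/2.
  def call_update(component, assigns, socket) do
    metadata = %{component: component, socket: socket}

    :telemetry.span(@prefix, metadata, fn ->
      result = component.update(assigns, socket)
      # The span function must return {result, stop_metadata}.
      {result, metadata}
    end)
  end

  # Handler that forwards each event to the process passed as config.
  def handle_event(event, _measurements, metadata, pid) do
    send(pid, {:telemetry_event, event, metadata.component})
  end

  def attach(pid) do
    :telemetry.attach_many(
      "update-span-sketch",
      [@prefix ++ [:start], @prefix ++ [:stop]],
      &__MODULE__.handle_event/4,
      pid
    )
  end
end

defmodule DemoComponent do
  # The socket is a plain map here to keep the sketch free of Phoenix.
  def update(assigns, socket), do: {:ok, Map.merge(socket, assigns)}
end

UpdateSpanSketch.attach(self())
{:ok, socket} = UpdateSpanSketch.call_update(DemoComponent, %{id: 1}, %{})
```

Because :telemetry.span/3 returns the result of the wrapped function unchanged, the caller does not need to know the span exists. An OTEL library would attach to the :start and :stop events in the same way the handler above does.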
[:phoenix, :live_view | :live_component, :render]
- A typical N+1 problem is caused by rendering many live components that each run a query in their update. Having the render span of the “wrapper” component be the “parent” of the “update” spans would allow us to spot the N+1.
- For LiveView, this would require wrapping both the Utils.to_rendered and Diff.render calls. We could wrap the whole render_diff function in Phoenix.LiveView.Channel, or at least the true branch of the if (when force == true or when it changed). This would allow it to encapsulate all “nested update/render spans”.
- For live components, rendering happens internally in Diff.render, in the Diff.component_to_rendered function, so it should be easy to wrap that in a span as well.
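The parent/child relationship described above can be sketched with nested :telemetry.span/3 calls. This is only an illustration under my own assumptions: NestedSpanSketch, render_list/1 and the :id metadata key are hypothetical, and the process-dictionary stack is a simplified stand-in for the context tracking a real tracing library would do.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule NestedSpanSketch do
  # Event prefixes proposed in this post, used here for illustration only.
  @render [:phoenix, :live_component, :render]
  @update [:phoenix, :live_component, :update]

  # Hypothetical parent render that triggers one update per child component.
  def render_list(ids) do
    :telemetry.span(@render, %{id: :list}, fn ->
      rows = Enum.map(ids, &update_row/1)
      {rows, %{id: :list}}
    end)
  end

  defp update_row(id) do
    :telemetry.span(@update, %{id: id}, fn -> {{:row, id}, %{id: id}} end)
  end

  def attach(pid) do
    events = for prefix <- [@render, @update], suffix <- [:start, :stop], do: prefix ++ [suffix]
    :telemetry.attach_many("nested-span-sketch", events, &__MODULE__.handle_event/4, pid)
  end

  # Handlers run in the process that emits the event, so a stack kept in the
  # process dictionary is enough to know which span is currently open.
  def handle_event(event, _measurements, metadata, pid) do
    stack = Process.get(:open_spans, [])
    kind = Enum.at(event, 2)

    case List.last(event) do
      :start ->
        # Report the new span together with its parent (nil for a root span).
        send(pid, {:span_started, {kind, metadata.id}, List.first(stack)})
        Process.put(:open_spans, [{kind, metadata.id} | stack])

      :stop ->
        Process.put(:open_spans, Enum.drop(stack, 1))
    end
  end
end

NestedSpanSketch.attach(self())
rows = NestedSpanSketch.render_list([1, 2])
```

With the render span in place, every update span reports the render as its parent, so repeated queries show up grouped under one span instead of as unrelated “rogue” spans.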
Maybe it is also worth adding handle_call, handle_cast and handle_info spans for the cases where we forward the message to the live view (socket.view.handle_call(...)), also including the subsequent handle_result call.
Channel Lifecycle Spans
This is where I don’t have a clear idea whether it is a good idea or not, but it may be worth exploring: wrapping a “full cycle” on the channel process for each type of “message”. The challenge is that this could leak internals that should not be treated as public. The benefit is that we would be able to treat these in OpenTelemetry as “wrapping messaging spans”.
In general, the idea would be to wrap the outermost callbacks on Phoenix.LiveView.Channel to get a higher-level view of what’s happening.
Some examples:
[:phoenix, :live_view_channel, :mount]
- This would be the “join” call where the live view is mounted. I am not sure where to put it: whether it should encompass the whole handle_info call with the join data, or whether it should live inside the mount or verified_mount functions. I guess it depends on how much data we want to include in the :start event. There are also questions about what we should send when we reply with an error. The return value can be {:stop, :shutdown, :no_state}, so nothing in it says what caused the stop (we only get a slight hint from the GenServer.reply call). So maybe we should add some data to the return value? But that would require changing the whole call tree: many places return it, so it might be a challenge to organize this in any sane way.
[:phoenix, :live_view_channel, :handle_event]
- Wrapping the whole handle_info clause for the %Message{topic: topic, event: "event"} message.
[:phoenix, :live_view_channel, :live_patch]
- Wrapping live patch calls. However, that can be challenging, since a live patch can happen on the %Message{topic: topic, event: "live_patch"} message but also internally, in the mount_handle_params_result and handle_redirect functions.
Also, it might be worth having a span for each message the channel receives, from redirect messages to async results.
Contribution
If this is desirable, I can open a PR on the phoenix_live_view repo with some of these. I think the extra telemetry events for the live view / live component callbacks are almost certainly worth it, while the others probably need more discussion.