Sorry if this is a daft question, but I can’t seem to find any documentation that answers my question …
We’re in the process of migrating a Phoenix application from a single node to an OTP cluster. Almost everything is working nicely, apart from some custom metrics we’re using to track usage of various parts of the application.
We have a simple prom_ex plugin added to our PromEx module, which handles telemetry events that we send when certain actions happen. Here’s a slightly simplified version:
    defmodule MyApp.PromEx.StatsMetrics do
      use PromEx.Plugin

      @impl true
      def event_metrics(_opts) do
        Event.build(
          :my_app_stats_event_metrics,
          [
            sum(
              [:my_app, :stats, :page_visits],
              tags: [:tool, :role],
              description: "The count of a tool being visited by someone in a role"
            )
          ]
        )
      end
    end
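For context, the plugin is hooked up in our PromEx module in the usual way, roughly like this (module names simplified):

    defmodule MyApp.PromEx do
      use PromEx, otp_app: :my_app

      @impl true
      def plugins do
        [
          # ... the standard prom_ex plugins ...
          MyApp.PromEx.StatsMetrics
        ]
      end
    end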
… with events emitted at appropriate points:
    :telemetry.execute([:my_app, :stats], %{page_visits: 1}, %{
      tool: :some_tool,
      role: user.role.name
    })
The problem is that we now have one instance of the plugin running on each node in the cluster, so stats are recorded separately for each node, and when Prometheus scrapes the numbers it sees the counts varying wildly as the load balancer routes it to a random node each time.
What we’d like to end up with is either a single instance of prom_ex (or just this plugin?) running on the cluster (e.g. using Highlander), or some way to guarantee that the events are broadcast to every node (using Phoenix PubSub, maybe?) so that all instances of prom_ex report the same values. (But then what happens when a node is temporarily taken out of the cluster for an application upgrade?)
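To make the PubSub idea concrete, this is roughly what we were imagining: broadcast the event instead of calling `:telemetry.execute` directly, and run a small relay process on every node that re-emits the event locally, so each node’s prom_ex records every event. This is only a sketch, and the names here (`MyApp.PubSub`, the `"stats"` topic, `MyApp.StatsRelay`) are made up for illustration:

    # Where the action happens, broadcast rather than executing directly.
    # Phoenix.PubSub delivers to subscribers on all nodes, including this one.
    Phoenix.PubSub.broadcast(
      MyApp.PubSub,
      "stats",
      {:stats_event, %{page_visits: 1}, %{tool: :some_tool, role: user.role.name}}
    )

    # A GenServer started in each node's supervision tree subscribes
    # and re-emits the telemetry event locally.
    defmodule MyApp.StatsRelay do
      use GenServer

      def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

      @impl true
      def init(state) do
        :ok = Phoenix.PubSub.subscribe(MyApp.PubSub, "stats")
        {:ok, state}
      end

      @impl true
      def handle_info({:stats_event, measurements, metadata}, state) do
        # Every node executes the event, so every prom_ex instance
        # should end up with the same counts.
        :telemetry.execute([:my_app, :stats], measurements, metadata)
        {:noreply, state}
      end
    end

Though presumably this still drifts when a node joins or leaves, which is part of what we’re unsure about.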
It feels like there’s probably a simple way of achieving this and I’m missing something obvious – any ideas?
Thanks!