Telemetry/PromEx metrics in a cluster

Sorry if this is a daft question, but I can’t seem to find any documentation that answers it …

We’re in the process of migrating a Phoenix application from a single node to an OTP cluster. Almost everything is working nicely, apart from some custom metrics we’re using to track usage of various parts of the application.

We have a simple PromEx plugin registered in our PromEx setup, which handles telemetry events that we send when certain actions happen. Here’s a slightly simplified version:

defmodule MyApp.PromEx.StatsMetrics do
  use PromEx.Plugin

  @impl true
  def event_metrics(_opts) do
    Event.build(
      :my_app_stats_event_metrics,
      [
        sum(
          [:my_app, :stats, :page_visits],
          tags: [:tool, :role],
          description: "The count of a tool being visited by someone in a role"
        )
      ]
    )
  end
end
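
The plugin is registered roughly like this (simplified, assuming an otherwise standard PromEx setup):

defmodule MyApp.PromEx do
  use PromEx, otp_app: :my_app

  @impl true
  def plugins do
    [
      # ... the built-in plugins we use ...
      MyApp.PromEx.StatsMetrics
    ]
  end
end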

… with events emitted at appropriate points:

:telemetry.execute([:my_app, :stats], %{page_visits: 1}, %{
  tool: :some_tool,
  role: user.role.name
})

The problem is that we now have one instance of the plugin running on each node in the cluster, so stats are recorded separately for each node, and when Prometheus scrapes the numbers it sees the counts varying wildly as the load balancer routes it to a random node each time.

What we’d like to end up with is either a single instance of PromEx (or just this plugin?) on the cluster (e.g. using Highlander), or some way to guarantee that the events are broadcast (using Phoenix PubSub, maybe?) so that all instances of PromEx show the same values (but then what happens when a node is temporarily taken out of the cluster for an application upgrade?).
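
For the first option, I imagine something like this in the application’s supervision tree (just a sketch – Highlander would be a new dependency, and since :telemetry handlers are node-local we’d presumably still need to forward events to whichever node is currently running PromEx):

# In MyApp.Application – sketch of the "single PromEx instance" idea.
# Highlander registers the child globally, so only one node in the
# cluster runs it at a time; another node takes over if it goes down.
children = [
  MyApp.Repo,
  MyAppWeb.Endpoint,
  # instead of plain `MyApp.PromEx`:
  {Highlander, MyApp.PromEx}
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)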

It feels like there’s probably a simple way of achieving this and I’m missing something obvious – any ideas?

Thanks!


That is a bad idea. What if the node storing all the metrics goes down? You would lose everything you have.


In Supavisor we encountered a similar problem, and our solution was quite different:

  • each node collects its own metrics
  • at export time we traverse the metrics from all nodes and merge them (sketched below)
  • then we export the merged result

Code:

This requires Peep as a storage backend (which is much more performant in my experience).
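
Roughly, the merge step looks like this (a simplified sketch, not the actual Supavisor code; local_counters/0 stands in for however you read the node-local values out of your storage backend):

defmodule MyApp.PromEx.ClusterMetrics do
  # Sketch of the merge-at-export idea: ask every node for its current
  # counters and sum them before rendering the exposition.
  def merged_counters do
    nodes = [Node.self() | Node.list()]

    nodes
    |> :erpc.multicall(__MODULE__, :local_counters, [], 5_000)
    |> Enum.flat_map(fn
      # ignore nodes that are down or time out
      {:ok, counters} -> [counters]
      _error -> []
    end)
    |> Enum.reduce(%{}, fn counters, acc ->
      # sum values for the same metric/tag combination across nodes
      Map.merge(acc, counters, fn _key, a, b -> a + b end)
    end)
  end

  def local_counters do
    # placeholder: return %{{metric_name, tags} => value} from the local store
    %{}
  end
end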

Thanks – I’ll give this approach some thought.

As our use case was fairly simple, I ended up removing PromEx altogether, storing the counts in the database, and generating the metrics page with a simple controller action and Ecto query.
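
For anyone curious, the replacement is roughly this shape (simplified; the page_visits table and its columns here are illustrative):

defmodule MyAppWeb.MetricsController do
  use MyAppWeb, :controller

  import Ecto.Query

  # Hypothetical `page_visits` table with `tool`, `role` and `count` columns.
  def index(conn, _params) do
    rows =
      MyApp.Repo.all(
        from v in "page_visits",
          group_by: [v.tool, v.role],
          select: {v.tool, v.role, sum(v.count)}
      )

    lines =
      Enum.map_join(rows, "\n", fn {tool, role, count} ->
        ~s(my_app_stats_page_visits{tool="#{tool}",role="#{role}"} #{count})
      end)

    text(conn, "# TYPE my_app_stats_page_visits counter\n" <> lines <> "\n")
  end
end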

Can you explain why you did this rather than scraping each node’s individual metrics? What are you getting from merging them in-cluster rather than at the Prometheus/dashboard level?

We expose metrics for individual clients as well, so we would need to gather metrics from all nodes anyway. It also simplifies things because we support self-hosting, which makes operations much easier. And the current implementation is quite a robust solution.
