Peep - Efficient TelemetryMetrics reporter supporting Prometheus and StatsD

Peep is a new TelemetryMetrics reporter that supports StatsD (including DogStatsD) and Prometheus.

While load testing a new WebSocket-based API gateway written in Elixir, I encountered performance issues with TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd. This prompted me to write Peep, which makes different choices about storing and sending TelemetryMetrics data.

  1. Instead of sampling or on-demand aggregation, Peep uses histograms (backed by :ets.update_counter/*) to store distributions, copying the approach taken by DDSketch.
  2. Instead of sending StatsD packets for each telemetry event, StatsD data is periodically sent in a small(er) number of large(r) packets.
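
To make point 1 concrete, here is a rough sketch of DDSketch-style bucketing. It is written in Python purely as an illustration and is not Peep's actual code: the class name, the `relative_error` parameter, and the use of a `Counter` in place of an ETS table are all assumptions of mine. The idea is that each sample costs one counter increment on a logarithmically spaced bucket, and quantiles are estimated later from the bucket counts.

```python
import math
from collections import Counter


class LogHistogram:
    """Hypothetical sketch: a distribution stored as per-bucket counters."""

    def __init__(self, relative_error=0.01):
        # gamma is the ratio between consecutive bucket boundaries.
        self.gamma = (1 + relative_error) / (1 - relative_error)
        self.log_gamma = math.log(self.gamma)
        # Stand-in for an ETS table of counters keyed by bucket index.
        self.counts = Counter()
        self.total = 0

    def bucket_index(self, value):
        # Bucket i covers (gamma**(i - 1), gamma**i]; positive values only.
        return math.ceil(math.log(value) / self.log_gamma)

    def observe(self, value):
        # One counter increment per sample; raw samples are never stored.
        self.counts[self.bucket_index(value)] += 1
        self.total += 1

    def quantile(self, q):
        # Walk buckets in ascending order until the target rank is covered.
        rank = q * (self.total - 1)
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen > rank:
                # Representative value for the bucket; its relative error
                # against any sample in the bucket is at most relative_error.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("empty histogram")
```

With this layout, memory grows with the number of distinct buckets rather than with the number of samples, which is what makes it a fit for an atomic counter update on the hot path.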

This library is currently running in production, in a service handling >1 million requests per minute. With a moderate number of metrics defined, the service emits StatsD data at a rate of 4 KiB/s, with no observed packet drops. (We use Unix domain sockets to send DogStatsD lines to Datadog agents, so it’s possible for :gen_udp to return :eagain when attempting to send packets.)
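
For anyone curious what the batching in point 2 looks like in principle, here is a minimal sketch, again in Python and again not Peep's implementation. The function name `batch_lines` and the `max_packet_bytes` default are my own assumptions for illustration. It relies only on the fact that StatsD accepts several newline-separated metric lines in a single packet.

```python
def batch_lines(lines, max_packet_bytes=1432):
    """Hypothetical helper: pack StatsD lines into newline-joined packets.

    max_packet_bytes is an assumed limit; pick one that suits your transport.
    A single line longer than the limit is emitted as its own packet.
    """
    packets = []
    current = b""
    for line in lines:
        encoded = line.encode("utf-8")
        candidate = encoded if not current else current + b"\n" + encoded
        if current and len(candidate) > max_packet_bytes:
            # Adding this line would overflow the packet: flush and restart.
            packets.append(current)
            current = encoded
        else:
            current = candidate
    if current:
        packets.append(current)
    return packets


# Usage: three counter lines packed under an artificially small limit.
packets = batch_lines(["a:1|c", "b:2|c", "c:3|c"], max_packet_bytes=11)
```

A periodic flush would then send each packet once, so the number of socket writes depends on the number of metrics and the flush interval rather than on the request rate.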

Here’s an image showing a drop in CPU use after replacing TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd with Peep:

Here’s another dashboard for the same period of time, showing a slight (but not unwelcome!) drop in memory usage:

Feedback and contributions welcome!
