Peep is a new TelemetryMetrics reporter that supports both StatsD (including DogStatsD) and Prometheus.
While load testing a new WebSocket-based API gateway written in Elixir, I ran into performance issues with TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd. This prompted me to write Peep, which makes different choices about how TelemetryMetrics data is stored and sent.
- Instead of sampling or on-demand aggregation, Peep stores distributions in histograms backed by :ets.update_counter/*, following the approach taken by DDSketch.
- Instead of sending a StatsD packet for each telemetry event, Peep periodically sends StatsD data in a smaller number of larger packets.
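To make the first point concrete, here is a minimal sketch of the DDSketch idea in Python. It is not Peep's implementation: Peep is Elixir and keeps its counters in ETS, while here a plain dict stands in for the ETS table, and the class name `LogHistogram` and its `alpha` parameter are hypothetical. Each value is mapped to a logarithmically sized bucket, so recording an observation is a single counter increment, and quantile estimates stay within a relative error of `alpha` without storing any samples.

```python
import math
from collections import defaultdict


class LogHistogram:
    """Hypothetical DDSketch-style histogram with logarithmic buckets.

    Recording a value is one counter increment (the kind of update that
    :ets.update_counter/3 performs atomically); a dict stands in for ETS.
    """

    def __init__(self, alpha: float = 0.01):
        # alpha is the target relative accuracy of quantile estimates.
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.counts: dict[int, int] = defaultdict(int)
        self.total = 0

    def bucket_index(self, value: float) -> int:
        # Bucket i covers the interval (gamma**(i - 1), gamma**i].
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        return math.ceil(math.log(value) / self.log_gamma)

    def observe(self, value: float) -> None:
        # One increment per event; the raw value itself is discarded.
        self.counts[self.bucket_index(value)] += 1
        self.total += 1

    def quantile(self, q: float) -> float:
        if self.total == 0:
            raise ValueError("empty histogram")
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.total - 1)
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen > rank:
                # Estimate that is within alpha (relative) of any value in the bucket.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise RuntimeError("unreachable: the last bucket always satisfies seen > rank")


hist = LogHistogram(alpha=0.01)
for latency_ms in range(1, 1001):
    hist.observe(latency_ms)
print(len(hist.counts), hist.quantile(0.5), hist.quantile(0.99))
```

A thousand distinct values collapse into a few hundred buckets, and memory grows with the logarithm of the value range rather than with the number of events.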
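The second point can be sketched as a buffer that telemetry handlers only update in memory, plus a periodic flush that packs many newline-separated StatsD lines into each packet. This is an illustration under assumptions, not Peep's code: `BufferedCounters`, `pack_lines` and `max_packet_bytes` are hypothetical names, only counters (`|c`) are shown, and the actual socket send (UDP or a Unix domain socket) is left out.

```python
from collections import defaultdict


def pack_lines(lines: list[bytes], max_packet_bytes: int) -> list[bytes]:
    """Greedily join StatsD lines with newlines, starting a new packet when full."""
    packets: list[bytes] = []
    current = b""
    for line in lines:
        if len(line) > max_packet_bytes:
            raise ValueError("a single line exceeds max_packet_bytes")
        candidate = line if not current else current + b"\n" + line
        if len(candidate) > max_packet_bytes:
            packets.append(current)
            current = line
        else:
            current = candidate
    if current:
        packets.append(current)
    return packets


class BufferedCounters:
    """Hypothetical buffer: accumulates counter increments between flushes."""

    def __init__(self) -> None:
        self.pending: dict[str, int] = defaultdict(int)

    def increment(self, name: str, amount: int = 1) -> None:
        # Called once per telemetry event; nothing is sent here.
        self.pending[name] += amount

    def flush(self, max_packet_bytes: int) -> list[bytes]:
        # Called on a timer: one line per metric, packed into few packets.
        lines = [f"{name}:{value}|c".encode() for name, value in sorted(self.pending.items())]
        self.pending.clear()
        return pack_lines(lines, max_packet_bytes)


buffer = BufferedCounters()
for _ in range(10_000):
    buffer.increment("gateway.requests")
for i in range(50):
    buffer.increment(f"gateway.route_{i}.hits")
packets = buffer.flush(max_packet_bytes=512)
print(len(packets), [len(p) for p in packets])
```

Here 10,050 events become 51 lines spread over a handful of packets, instead of 10,050 separate sends. The 512-byte limit is only for illustration; a real limit depends on the transport.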
This library is currently running in production in a service handling more than 1 million requests per minute. With a moderate number of metrics defined, the service emits StatsD data at a rate of 4 KiB/s with no observed packet drops. That is a meaningful check: we send DogStatsD lines to Datadog agents over Unix domain sockets, where :gen_udp can return :eagain when attempting to send a packet.
Here’s an image showing a drop in CPU use after replacing TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd with Peep:
Here’s another dashboard for the same period of time, showing a slight (but not unwelcome!) drop in memory usage:
Feedback and contributions welcome!