Peep is a new TelemetryMetrics reporter that supports both StatsD (including DogStatsD) and Prometheus.
While load testing a new Websocket-based API gateway written in Elixir, I encountered performance issues with TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd. This prompted me to write Peep, which makes different choices about storing and sending TelemetryMetrics data.
Instead of sampling or on-demand aggregation, Peep uses histograms (backed by :ets.update_counter/*) to store distributions, copying the approach taken by DDSketch.
Instead of sending a StatsD packet for each telemetry event, Peep periodically flushes StatsD data in a small(er) number of large(r) packets.
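To make the histogram idea concrete, here's a minimal sketch of bucketed counting on top of ETS. This is an illustration of the technique, not Peep's actual internals, and the toy log2 bucketing rule here stands in for Peep's log-linear default:

```elixir
# Illustrative only: each observation increments an atomic ETS counter
# keyed by {metric, bucket}, so recording a value is a single
# :ets.update_counter/4 call with no process round-trip.
tid = :ets.new(:histogram_sketch, [:set, :public, {:write_concurrency, true}])

record = fn metric, value ->
  # Toy bucketing rule (log2); Peep's default is log-linear.
  bucket = value |> max(1) |> :math.log2() |> ceil()
  key = {metric, bucket}
  :ets.update_counter(tid, key, 1, {key, 0})
end

record.(:request_duration, 250)
record.(:request_duration, 300)

# Reading the distribution back is a simple table scan:
:ets.tab2list(tid)
```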
This library is currently running in production, in a service handling >1 million requests per minute. With a moderate number of metrics defined, the service emits StatsD data at a rate of 4KiB/s, with no observed packet drops (we use Unix Domain Sockets to send DogStatsD lines to Datadog agents, so it's possible for :gen_udp to return :eagain when attempting to send packets).
Here's an image showing a drop in CPU use after replacing TelemetryMetricsPrometheus.Core and TelemetryMetricsStatsd with Peep:
This version fixes an issue with exposing data for Prometheus. If you use Peep with Prometheus, you should upgrade to this version.
Changes
Fixed an issue with Prometheus exposition where zero-valued bucket time series were not shown.
Added support for custom bucket boundaries. As part of this change, the distribution_bucket_variability option was removed.
Custom bucket boundaries
With Peep 2.0.0, the default log-linear bucketing strategy becomes an implementation of the new Peep.Buckets behaviour.
You can use the Peep.Buckets.Custom module to define your own bucket boundaries. This compiles to efficient pattern matching with function heads, which ought to scale better than traversing a list.
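As a sketch (assuming the option names from the Peep docs: `use Peep.Buckets.Custom` with a :buckets list, attached to a distribution via the :peep_bucket_calculator reporter option):

```elixir
defmodule MyApp.Buckets do
  # Explicit bucket upper bounds, in milliseconds; these compile
  # down to function heads rather than a list traversal.
  use Peep.Buckets.Custom,
    buckets: [10, 50, 100, 250, 500, 1_000]
end

# Hypothetical metric definition using the module above:
Telemetry.Metrics.distribution("http.request.duration",
  unit: {:native, :millisecond},
  reporter_options: [peep_bucket_calculator: MyApp.Buckets]
)
```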
Thank you for this project. It allowed me to give Supavisor a ~30x improvement in latency (as measured by pgbench) over using telemetry_metrics_prometheus_core. I have also prepared a PR for prom_ex so that it can use Peep as a metrics store.
I'm curious how much impact it would have in my application, but I don't think I can afford to test it right now, as it would be a pretty big change (we have a lot of Telemetry.Metrics.summary/2 calls, which Peep doesn't support).
Did you measure the impact of Telemetry.Metrics beforehand, with something like fprof? If so, could you please share? It would help me a lot.
This version introduces a slight change in how Peep is configured (replacing keyword lists with maps in the global_tags option) that is not backwards compatible. Upgrading from v2.x.y will require some small changes.
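In practice, the upgrade amounts to swapping the keyword list for a map. A rough sketch (child spec abbreviated, names are placeholders):

```elixir
# Before (v2.x): global_tags as a keyword list
{Peep, name: :my_peep, metrics: metrics, global_tags: [region: "us-east-1"]}

# After (v3.0): global_tags as a map
{Peep, name: :my_peep, metrics: metrics, global_tags: %{region: "us-east-1"}}
```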
Thanks to another contribution by @hauleth, it is now possible to override the type of a "sum" or "last value" metric in the Prometheus exposition.
For example, if you want to track socket statistics, which are often pre-summed, you could store the data in Peep with last_value/2, but report it as a counter-type metric in the Prometheus output.
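A sketch of what that might look like; the :prometheus_type reporter option name is my best reading of the change, so check the Peep docs for the exact key:

```elixir
# Stored as a last_value, but exposed to Prometheus as a counter.
# The metric name is a placeholder; :prometheus_type is assumed here.
Telemetry.Metrics.last_value("vm.socket.send_oct",
  reporter_options: [prometheus_type: :counter]
)
```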
I haven't made many announcements here in a while, but I've published a few new Peep versions. Thank you to @aloukissas and @mjm for your contributions!
After encountering an issue where Peep received unexpected messages when sending StatsD data via Unix Domain Sockets, @mjm changed Peep processes to ignore unexpected messages and to ignore the shutdown reason when terminating.
Peep v3.3.0
Upon request by my employer, I introduced a new storage engine for Peep metrics that trades increased memory usage for reduced lock contention: :striped. Rather than storing all metrics in a single ETS table, :striped uses one ETS table per scheduler thread.
I don't exactly recommend that users switch to this storage method unless they are noticing lock contention, which may happen when handling many thousands of telemetry executions.
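Opting in would look something like this, assuming the option is called :storage (as in the changelog); everything else about the child spec stays the same:

```elixir
# Hedged sketch: one ETS table per scheduler thread via :striped.
children = [
  {Peep,
   name: :my_peep,
   metrics: MyApp.Metrics.metrics(),
   storage: :striped}
]

Supervisor.start_link(children, strategy: :one_for_one)
```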
Here's some :lcnt output from a bidder service for RTB ads:
Sorry if this is the wrong place to put this, but I feel like I'm doing something dumb when setting this up.
Whenever I have the plug in my endpoint.ex file for a Phoenix project before my router, only the metrics route matches and all the other routes show 404. If I put it after my router, all my routes show but I get a 404 for /metrics. I feel like I have the worker and everything else set up fine, but I'm missing something when setting up the plug.
Hey! Not at all the wrong place to post. You found a bug in Peep.
When adding Peep.Plug to a Phoenix project, I find myself using the following:
forward("/metrics", to: Peep.Plug, worker: my_peep_worker)
Note that, for the time being, you may need to specify the path twice if you want to use a path other than "/metrics":
forward("/my-metrics", to: Peep.Plug, path: "/my-metrics", worker: my_peep_worker)
That should address your immediate issue.
I'll improve the documentation in Peep.Plug to reflect this, and I might change some of the code in there, such as not responding with 404 when the URL path does not match the metrics endpoint. While that code is easier to test, and makes sense when serving the metrics endpoint on a different port, the default behavior is confusing.