I’m thinking about rewriting the metrics subsystem in my project. I want to do it in a unified manner so it can easily be copied to other projects. I’m looking at telemetry and inspecting its reporters. We use InfluxDB, so I have to write a reporter because there isn’t one.
The reporter concept is probably the most complex part of telemetry. I’m comparing the StatsD and Prometheus reporters. The Prometheus one does intermediate aggregation by itself, whereas the StatsD reporter doesn’t. With InfluxDB, we don’t even have different data types as we do with StatsD; all aggregation is done afterwards. Intermediate aggregation looks very efficient because it happens inside the Erlang VM.
So I want to create a summing counter that flushes data to InfluxDB on a regular basis. What is the right way to do this with telemetry? Should I write a reporter with a built-in aggregation feature, or write an external summing counter, flush its values using Telemetry.Poller, and then write a telemetry reporter that writes data to InfluxDB without aggregation? In the second case, the new reporter would treat all types of telemetry metrics mostly the same. In the first case, I need to specify a time interval when starting the reporter.
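For context, the summing-counter part could be sketched roughly like this. Everything here is hypothetical (module name, keys): in a real setup `add/3` would be called from a `:telemetry` handler and `flush/1` from Telemetry.Poller or a timer; this sketch only shows the ETS-based accumulation itself.

```elixir
# A minimal sketch of a summing counter kept in ETS (all names are
# illustrative, not from any existing library).
defmodule SummingCounter do
  # Create a public named ETS table; write_concurrency helps with hot counters.
  def start(table \\ :summing_counter) do
    :ets.new(table, [:named_table, :public, :set, write_concurrency: true])
    table
  end

  # Atomically add `value` to the running sum for `key`,
  # initializing the row to {key, 0} if it doesn't exist yet.
  def add(table, key, value) do
    :ets.update_counter(table, key, {2, value}, {key, 0})
  end

  # Read all sums and reset the table; a periodic flush would send this
  # map to InfluxDB. (A production version would need to consider the
  # race between reading and clearing.)
  def flush(table) do
    sums = :ets.tab2list(table)
    :ets.delete_all_objects(table)
    Map.new(sums)
  end
end
```

The interval question then reduces to who calls `flush/1`: the reporter’s own timer (variant one) or Telemetry.Poller (variant two).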
I guess the solution for doing it within the telemetry ecosystem would be to have the telemetry reporter do the aggregation/sampling. AFAIK that’s the place meant to handle conversion between the reported metrics and what’s actually sent out elsewhere. This can be as simple as just forwarding data, or quite complex, with pre-aggregation or sampling. In the end it depends on how many moving parts one likes to have inside and/or outside of telemetry.
Yes, the proper place would be a reporter if one wants to integrate with Telemetry.Metrics, which provides a nice abstraction over how events should be aggregated.
The InfluxDB reporter mentioned above takes a different approach where all events are pushed directly to InfluxDB, something that @rawkode suggested. The benefit of doing that instead of pre-aggregating is that we don’t need to know in advance which aggregations we’re going to run in order to analyze the data. As always, there are tradeoffs: pushing every event might consume a lot of bandwidth, but aggregating in-process may consume a considerable amount of memory.
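To make “pushing events directly” concrete: a reporter in this style essentially just encodes each telemetry event as an InfluxDB line-protocol point and sends it. A minimal sketch of the encoding step (the module is hypothetical, and it omits escaping, type suffixes, and timestamps):

```elixir
# Sketch: turn a telemetry event into an InfluxDB line-protocol string,
# with no pre-aggregation. Names and structure are illustrative only.
defmodule LineProtocol do
  # event_name is a list of atoms like [:http, :request];
  # measurements become InfluxDB fields, tags become InfluxDB tags.
  def encode(event_name, measurements, tags) do
    measurement = Enum.join(event_name, ".")
    tag_str = Enum.map_join(tags, "", fn {k, v} -> ",#{k}=#{v}" end)
    field_str = Enum.map_join(measurements, ",", fn {k, v} -> "#{k}=#{v}" end)
    "#{measurement}#{tag_str} #{field_str}"
  end
end
```

A `:telemetry.attach/4` handler would call something like this for every event, which is exactly where the bandwidth tradeoff mentioned above comes from.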
As the author of the mentioned library, I can confirm what the others have said:
The InfluxDB reporter simply pushes the events, and it’s up to the InfluxDB user what kind of processing they’d like to apply on top of them. Sampling was actually planned as the next step in the library’s improvement.
If you have any questions or suggestions for the implementation, I’d be happy to help.
Is InfluxData involved at all with https://opentelemetry.io/? I believe the first draft of the metrics specification is being finished up now.
If you are involved in OpenTelemetry or plan to support it then I think the best way for InfluxData to contribute to the BEAM ecosystem is through the Erlang/Elixir libraries, https://github.com/open-telemetry/opentelemetry-erlang
I should also mention, for those looking for how to instrument and report their metrics: the idea is that you can use OpenTelemetry for recording metrics in your application and reporting them to the OpenTelemetry Collector, https://github.com/open-telemetry/opentelemetry-collector, which will then report to Influx. It should also be able to receive from Influx-instrumented code, so if you have projects in other languages already instrumented with some Influx library, they can report to the same agent/collector.
Thank you for describing this reporting scheme. This was the first time I’d heard about OpenTelemetry. The whole infrastructure looks very interesting, but maybe a bit far from my current needs. I will keep an eye on it, though.
OpenTelemetry is actually a pretty big project backed by the CNCF (also known for backing k8s, Prometheus, Fluentd, and a lot of other projects), so I would say it is worth trying (soon, as it is still in progress).
Yes, these tradeoffs are exactly the question. In any case, we shouldn’t send raw events to InfluxDB directly. Using Telegraf, we can send them over the loopback interface via UDP, so bandwidth isn’t much of a concern. It is then up to Telegraf to aggregate the metrics and resend them to the InfluxDB server.
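For reference, the Telegraf side of that setup is just a StatsD input listening on UDP plus an InfluxDB output. A minimal `telegraf.conf` sketch (addresses, interval, and database name are placeholders):

```toml
[agent]
  interval = "10s"                   # Telegraf aggregates/flushes on this cadence

[[inputs.statsd]]
  protocol = "udp"                   # receive over the loopback interface via UDP
  service_address = ":8125"

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telemetry"             # placeholder database name
```

With this in place, the application only pays the cost of local UDP sends; aggregation and delivery to InfluxDB happen in Telegraf.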
Even in this case, collecting metrics with something like :ets.update_counter inside the reporter should be more efficient.
So we have two variants, each with its pros and cons.
The current solution I tried is to use StatsD through Telegraf. That works, kinda… You end up creating a measurement for everything instead of a combination of measurements with fields, tags, etc. This reduces the ability to query that data by a big margin.
If there is a plan to release a native reporter in the foreseeable future, that would be awesome. Otherwise we need to rethink our metrics storage.
Are you planning on releasing an official telemetry reporter for InfluxDB?
Wow, I wasn’t aware of the templates, thanks! I’ve managed to make it behave now, but it would still be nice to have a native reporter. Then there’s no need to maintain all the possible templates, and you can use it as-is without wondering whether it’s going to match the right templating rule in Telegraf.
IIRC the StatsD exporter for Telemetry supports DataDog-like tags, so with proper configuration Telegraf becomes just a translator from one syntax to the other.
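If that’s the route, the relevant knobs (to my knowledge) are the Telegraf StatsD input’s DataDog extensions and its name templates; a hedged fragment (the template pattern is only an example):

```toml
[[inputs.statsd]]
  service_address = ":8125"
  datadog_extensions = true   # parse DataDog-style "|#key:value" tags into InfluxDB tags

  # Fallback templates for plain StatsD names,
  # e.g. "phoenix.endpoint.stop.duration" -> measurement + field
  templates = [
    "*.*.* measurement.measurement.field"
  ]
```

With tags carried in the DataDog extension, most of the template maintenance mentioned above should become unnecessary.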
There is the EEF Observability WG, which is working on monitoring facilities in Erlang (and, naturally, Elixir). The current consensus is to use telemetry as a backend-agnostic event dispatcher. On top of that you can use any metrics gatherer you like, for example the telemetry_influxdb mentioned above. If you want a more “holistic” solution for monitoring your applications, you can check out the opentelemetry application, which will provide metrics and traces (in the future maybe even logs) together with tooling to dispatch that data to various storage and processing engines.