What are the best practices for persisting telemetry data?

trisolaran · August 5, 2021, 8:28pm

I have a cluster with a number of Phoenix applications. I would like to have a centralized repository where my applications can send telemetry events and where the telemetry data can be persisted. Also, I’d like to have a tool with which I can easily plot and inspect metrics derived from this data (e.g. number of requests per second, duration of each request, the typical stuff).

In some of my past (non-Elixir) projects applications were sending measurements to a Prometheus server and the metrics were visualized using Grafana.

My question is: what are the standard tools for achieving the same result in Elixirland quickly and writing as little custom code as possible?

stefanchrobot · August 5, 2021, 8:44pm

The whole idea of telemetry is that you can plug in any data sink. Have a look at something like telemetry_metrics_prometheus. Never used it, but seems you should be able to pick it up pretty fast.
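For readers who want a picture of what that wiring might look like, here is a minimal sketch, assuming the telemetry_metrics and telemetry_metrics_prometheus packages are dependencies of the project. The module name MyApp.Telemetry, the metric names, and the bucket boundaries are made up for illustration; the event is the [:phoenix, :endpoint, :stop] event that Phoenix emits once per request.

```elixir
defmodule MyApp.Telemetry do
  # Hypothetical module name; start it from the application's supervision tree.
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    children = [
      # Starts the Prometheus reporter together with a small HTTP server
      # that serves /metrics (port 9568 unless configured otherwise).
      {TelemetryMetricsPrometheus, [metrics: metrics()]}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  # Metric names and buckets below are illustrative assumptions.
  defp metrics do
    [
      # Incremented once for every finished request.
      counter("http.server.requests.count",
        event_name: [:phoenix, :endpoint, :stop]
      ),
      # Histogram of request durations, converted to milliseconds.
      distribution("http.server.request.duration",
        event_name: [:phoenix, :endpoint, :stop],
        measurement: :duration,
        unit: {:native, :millisecond},
        reporter_options: [buckets: [10, 50, 100, 250, 500, 1_000]]
      )
    ]
  end
end
```

Prometheus would then be pointed at each application instance's /metrics endpoint as a scrape target.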

wolf4earth · August 5, 2021, 8:45pm

I haven’t used any of this in action, but it might be worth looking into the ecosystem around OpenTelemetry, which includes collectors that can publish to a variety of backends, among them Prometheus.

I know that this isn’t exactly what you were looking for but I felt it could be helpful to point it out, if not to you then maybe to others.

tejpochiraju · August 6, 2021, 12:29am

I think PromEx is a good choice.

trisolaran · August 6, 2021, 8:03am

Thanks a lot @stefanchrobot, @wolf4earth and @tejpochiraju for your answers! They’ve all been useful to me.

Apparently, there is no “Elixir replacement” for Prometheus, which is what I initially suspected. And it makes sense, no point in reinventing the wheel.

telemetry_metrics_prometheus looks like a super quick way to expose a /metrics endpoint for Prometheus to scrape. PromEx does pretty much the same thing AFAICS, but it adds a custom Grafana dashboard that you can import. Sweet. Thanks for the pointers, now I know what I need to do.

tangui · August 6, 2021, 9:08am

Another option, especially if you have many Phoenix apps or if you dynamically launch them, is to write to a central metrics repository such as statsd_exporter and then have Prometheus scrape it. I remember using statix on the Phoenix side to send metrics.

Basically you have this:

+-------------------+        +---------------------+                        +--------------+
|  Phoenix + statix |------->|   statsd_exporter   |<---(scrape /metrics)---|  Prometheus  |
+-------------------+        +---------------------+                        +--------------+

with as many Phoenix instances as you need.
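The Phoenix side of that diagram could be sketched roughly as follows, assuming the statix and telemetry packages are dependencies. MyApp.Statix, MyApp.StatsdReporter, the metric names, and the "statsd-exporter" hostname are hypothetical; 9125 is the UDP port statsd_exporter listens on by default.

```elixir
defmodule MyApp.Statix do
  # `use Statix` generates connect/0, increment/1, timing/2, etc. on this module.
  use Statix
end

defmodule MyApp.StatsdReporter do
  # Hypothetical helper that forwards Phoenix request events to StatsD.

  def setup do
    # Opens the UDP socket. Host and port are read from the :statix config,
    # for example (hostname is an assumption):
    #   config :statix, host: "statsd-exporter", port: 9125
    MyApp.Statix.connect()

    :telemetry.attach(
      "statsd-phoenix-requests",
      [:phoenix, :endpoint, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Called synchronously by :telemetry for every finished request.
  def handle_event([:phoenix, :endpoint, :stop], %{duration: duration}, _metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    MyApp.Statix.increment("phoenix.requests")
    MyApp.Statix.timing("phoenix.request.duration", ms)
  end
end
```

MyApp.StatsdReporter.setup/0 would be called once while the application starts. Prometheus then only needs a single scrape target, the exporter, no matter how many Phoenix instances come and go.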

stefanchrobot · August 6, 2021, 9:13am

What’s the difference between Telemetry and OpenTelemetry? Should the Erlang ecosystem adopt OpenTelemetry instead?

wolf4earth · August 6, 2021, 9:53am

My understanding was the Erlang OpenTelemetry bindings can be used to export Telemetry events but maybe I’m wrong.

LostKobrakai · August 6, 2021, 10:35am

Integration and adoption is already happening:

But telemetry can also be a low-level detail for fetching the data needed for OpenTelemetry, as far as I understand it. They’re not competing.

trisolaran · August 6, 2021, 12:48pm

Interesting. I believe this would be overkill for my current setup, but it’s something to keep in mind.

tristan · August 8, 2021, 6:06pm

Note that OpenTelemetry Metrics is still very early, so if you are mainly looking at exporting metrics like with Prometheus you will want to use something else, unless you are ok with lots of breakage and having to dig into the code.

And yes, telemetry the library has a much narrower purpose; it can be integrated with OpenTelemetry through handlers.
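To make the handler mechanism concrete, here is a small standalone script using only the telemetry library. The event name and the Demo.Sink module are invented for the example, and the "sink" simply forwards events to a process; it stands in for whatever a real integration (OpenTelemetry, a metrics reporter, a logger) would do with the event.

```elixir
# Standalone script; Mix.install fetches the telemetry package on first run.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule Demo.Sink do
  # Hypothetical sink: forwards every event to the pid given in the handler config.
  def handle_event(event, measurements, metadata, %{pid: pid}) do
    send(pid, {:telemetry_event, event, measurements, metadata})
  end
end

# Attach the handler to one (made-up) event name.
:ok =
  :telemetry.attach(
    "demo-sink",
    [:demo, :request, :stop],
    &Demo.Sink.handle_event/4,
    %{pid: self()}
  )

# Instrumented code only emits the event; it does not know who is listening.
:telemetry.execute([:demo, :request, :stop], %{duration: 42}, %{route: "/health"})

receive do
  {:telemetry_event, event, measurements, metadata} ->
    IO.inspect({event, measurements, metadata}, label: "received")
after
  1_000 -> IO.puts("no event received")
end
```

The library being instrumented and the backend receiving the data never reference each other; the handler is the only piece that knows about both.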

akoutmos · August 9, 2021, 8:11pm

Just want to clarify a few things regarding PromEx and telemetry_metrics_prometheus. PromEx is very much built on top of a bunch of the beam-telemetry GitHub organization projects and does not intend to “compete” with any of them. That includes telemetry_metrics_prometheus, telemetry_poller, telemetry_metrics, and of course telemetry itself. telemetry_metrics_prometheus provides the lower level functionality to declaratively generate Prometheus formatted metrics from telemetry events, without providing any opinions as to what these metrics look like for each of the telemetry compatible libraries.

PromEx effectively bundles all of these libraries together and provides a turnkey metrics solution with my opinions as to what telemetry events are captured along with what tags for each of those metrics (primarily to normalize high cardinality fields and to provide reasonable data point enrichment without blowing up Prometheus with too many data points). PromEx currently supports quite a few Elixir libraries like Phoenix, Ecto, Absinthe and a few more and also provides Grafana dashboards for each of those libraries.
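As a rough illustration of that bundling, a PromEx setup for a Phoenix and Ecto application might look like the sketch below. It assumes prom_ex is a dependency and that the application is called :my_app with MyAppWeb.Router and MyAppWeb.Endpoint modules; all of those names, and the datasource id, are placeholders.

```elixir
defmodule MyApp.PromEx do
  # Placeholder app and module names; adjust to the real application.
  use PromEx, otp_app: :my_app

  @impl true
  def plugins do
    [
      # Each plugin subscribes to a library's telemetry events and
      # defines the metrics (and tags) derived from them.
      PromEx.Plugins.Application,
      PromEx.Plugins.Beam,
      {PromEx.Plugins.Phoenix, router: MyAppWeb.Router, endpoint: MyAppWeb.Endpoint},
      PromEx.Plugins.Ecto
    ]
  end

  @impl true
  def dashboard_assigns do
    # Name of the Prometheus datasource configured in Grafana (assumed here).
    [datasource_id: "prometheus"]
  end
end

# In the application's supervision tree (lib/my_app/application.ex):
#   children = [MyApp.PromEx, MyApp.Repo, MyAppWeb.Endpoint]
#
# In the Phoenix endpoint, to serve /metrics:
#   plug PromEx.Plug, prom_ex_module: MyApp.PromEx
```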

If you do decide to give PromEx a test drive, I would be interested in getting feedback :).

trisolaran · August 10, 2021, 7:41pm

Thanks @akoutmos. I now better understand the difference between PromEx and telemetry_metrics_prometheus.
When I add monitoring to my services, I’m sure gonna give PromEx a try and give you my feedback.

trisolaran · June 18, 2022, 3:30pm

Hi @akoutmos! Here I am after having added PromEx to my application in a Kubernetes cluster, and I have some feedback. My application uses Ecto, Oban, Phoenix LiveView, and Broadway.

First of all: congrats on pulling off such an ambitious project! It really feels like a lot of work, especially writing the plugins for all those libraries. What I absolutely love about PromEx is that it can automatically upload the dashboards to Grafana, this is really a powerful feature. You deploy your application to a new environment and bam, the dashboards are there! Same if PromEx dashboards are updated. Great idea.

Because the automatic provisioning of the dashboards is IMO such an awesome feature, I think it should be mentioned more explicitly in the docs. Initially I was thinking “how am I gonna get these amazing dashboards into Grafana?”. After looking at the config values and through other hints in the docs, I eventually realized that they can be automatically uploaded at application startup. In my opinion this should be mentioned explicitly in the README (PromEx v1.7.1).

I wrote a PromEx plugin to expose my application’s specific metrics, and then I wanted to create a dashboard for it and upload it automatically just like PromEx’s default dashboards. The Plugin guide doesn’t mention a way to do that, so I had to figure it out by myself. I ended up adding {:my_app, "path_to_my_dashboard"} to the list returned by the dashboards/0 callback of my PromEx module. My opinion is that this callback should be part of the PromEx.Plugin behaviour, so that each plugin can expose its own dashboard(s) and tell PromEx where they are, without listing all of them in the main PromEx module.

More importantly, I think it would be great to provide a workflow to easily write dashboard templates for a plugin. For my plugin, I put together the dashboard on Grafana, then I downloaded the JSON definition and I manually edited it to create a “json.eex” template. It would be great if we could find a way to automate this process. Something like:
- Put together the dashboard on Grafana
- Run a mix task that fetches the dashboard from Grafana via the API and turns it into a “PromEx” template
- Run the same task again after I update a dashboard on Grafana

It’s not very easy to get the Broadway plugin to work with some real-world producers. The documentation suggests overwriting the message’s acknowledger through a transformation. While this allows PromEx to see the messages, it also removes the original message’s acknowledger, which is needed by some producers (for example AWS-SQS) to receive confirmation that the message has been handled. The issue is explained here, towards the end, and the solution proposed by @dsschneidermann finds me in agreement. I’d love to hear your input and I’d be willing to help with a PR.

Final minor nitpick: the dashboards are using the old Graph panel instead of the new Time series panel. Maybe it’s worth migrating them.
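For anyone else hunting for the same thing, the setup described in the first two points can be sketched like this. It is an approximation under assumptions: the environment variable names and the default host are invented, :my_app and MyApp.PromEx are placeholder names, and "path_to_my_dashboard" is left as a placeholder for the real file path (which, as far as I can tell, PromEx resolves relative to the application's priv directory).

```elixir
# config/runtime.exs
import Config

config :my_app, MyApp.PromEx,
  grafana: [
    # Placeholder values; the env var names are made up for this example.
    host: System.get_env("GRAFANA_HOST", "http://localhost:3000"),
    auth_token: System.get_env("GRAFANA_TOKEN", ""),
    # Upload every dashboard listed in dashboards/0 when the app boots.
    upload_dashboards_on_start: true
  ]

# lib/my_app/prom_ex.ex (plugins/0 omitted for brevity)
defmodule MyApp.PromEx do
  use PromEx, otp_app: :my_app

  @impl true
  def dashboards do
    [
      # Dashboards that ship with PromEx.
      {:prom_ex, "application.json"},
      {:prom_ex, "ecto.json"},
      # Custom dashboard shipped by the application itself.
      {:my_app, "path_to_my_dashboard"}
    ]
  end
end
```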

I hope you found at least some of this feedback helpful. Let me know what you think and how I can provide further help. Thanks again for creating PromEx!

akoutmos · June 19, 2022, 7:21pm

Thanks! It was and remains a lot of work. Glad that it is making things easier for you from a metrics standpoint.

That is a good point. I can probably make this kind of stuff more apparent in the README. I need to work on my copywriting and marketing skills lol. I’ll try and update the docs this week. I will also include my Code BEAM talk in the doc: https://www.youtube.com/watch?v=0SkVsUdUutE

I have been meaning to write some more docs/blog posts around this, but like many things limited time is the killer. More on this at the end of the post.

100% agree with this one. I feel this pain every time I create 1st party dashboards for PromEx. Grafana has also released a lot of new tools and updated Grafana quite a bit since this project started. There is a lot that can be done here now that things have evolved.

Unfortunately I don’t have any production applications where I use Broadway at the moment (I used to), so it is tough for me to feel those production pains. Hopefully people that are using Broadway+PromEx in production can contribute back to the project and help iron out some of these pain points.

This is on the todo list especially now with Grafana 9 being released.

Thanks for the feedback @trisolaran. I genuinely appreciate all the feedback :). I will also add some additional commentary here to provide some context as to when some of these issues/features may be addressed.

Recently, a lot of my spare time outside of FT work, consulting and running a business has been spent on two projects: Elixir Patterns and MjmlEEx. MjmlEEx came about because of some business needs that I had while bootstrapping my business https://eaglemms.com/ (funny enough, PromEx is also open sourced work from the same bootstrapped business haha). MjmlEEx is luckily at a point where it is stable and has all the features that I wanted out of the library, so I don’t foresee a lot of active development there features wise. I do plan however on creating a Mjml EEx Pro offering that will provide some pre-created email templates as well as some other goodies.

So why am I bringing this up, you may be wondering? To be completely transparent, my hope is that I can follow a similar path with PromEx and create a PromEx Pro offering so that I can buy some of my time back from other ventures and instead devote more time to open source work. There are plenty of stories all over the internet of open source maintainers burning out as a result of not finding a good balance between open source and paying the bills, and I really hope that I never have to write a blog post like that. Hopefully by following in the footsteps of people like Adam Wathan (Tailwind and TailwindUI), Caleb Porzio (great story by him on funding open source) and our very own Parker Selbert (creator of Oban and Oban Pro) I can find a way to devote more of my time to open source initiatives. This is not to say that PromEx is a dead project and that I won’t address the items that you listed (far from it). But rather to say that my development on the project varies depending on external circumstances and funding the project may help increase & normalize the rate at which I can deliver features in the PromEx project.

In addition to those open source balance kind of concerns, there are also some other factors at play that have contributed to me slowing down slightly on PromEx (but this is changing soon). Specifically, there has been a lot that has changed with Grafana as of late and in very good ways. Grafana has recently released a whole slew of open source tools that I would like to leverage more of in future versions of PromEx like their dashboard linter and Tempo. But I want to be a bit cautious here and not adopt these tools prior to them maturing a little. In addition, there have been a lot of developments on the OpenTelemetry side of things and I want to make sure that I can properly incorporate that work into PromEx.

To perhaps give people a sneak peek as to what I have planned for PromEx in the near future: I am experimenting with getting PromEx to the point where it incorporates both metrics and traces into a single library so that with the same development experience that you have today with PromEx (i.e. creating a single module and adding 1 thing to your supervision tree) you can have metrics and traces as well as exemplars so that you can correlate metrics back to traces (Grafana blog post of metrics/trace correlation).

All this to say that I really appreciate your feedback and will definitely open up some GitHub issues from what you shared to make sure that I don’t forget some of the points that you brought up and that I will hopefully be addressing many of these in the coming months!

trisolaran · June 20, 2022, 12:50pm

Thanks @akoutmos for the nice and extensive response! Your plans make a lot of sense, a PromEx Pro (but isn’t that too many “Pro” in the name? Anyway, marketing issue) sounds like a promising idea, I hope you can soon find the time to work on it.

Regarding Broadway+PromEx, I’m one of those people who are using both in production (well, not yet, strictly speaking, but it’s currently running in a stage environment very similar to our prod environment) so I’d be happy to contribute. Just chime in on that GitHub issue whenever you find the time. It’s not blocking me in any way, but since a solution has been found, I’d be happy to share it with a PR.