We’ve been using AppSignal for a while, but we’re now looking for an alternative. I get the impression that rolling something custom is getting easier now with Telemetry, but I’m looking for a (mostly) plug-and-play solution that captures exceptions and gathers performance data. We can allocate time to build something on our own, but I would leave that as a last resort.
I know of a few services myself, but I’m curious what people would recommend.
I have been using Sentry and Honeybadger with varying degrees of success. They are okay, not amazing, but they get the error reporting job done. I’m not sure they can even report performance metrics.
NewRelic seems to be the norm for many. It’s very feature-rich.
If you’re on Heroku, their management tools are quite nice.
I will simply say that we’re not happy with the quality of the product and particularly with their customer support; that said, the service is very cheap (probably the cheapest). If you need more details, send me a private message and I can share my personal experience.
I usually end up hosting my own instances of Prometheus and Grafana and then use https://github.com/deadtrickster/prometheus.ex along with all the relevant collectors for out-of-the-box Phoenix+Ecto metrics. I then add my own metrics where appropriate to keep track of business numbers and whatnot.
From a logging perspective, we’re pretty much on par with the typical solutions out there. From the metrics side, you just log structured data and create dashboards from that.
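To make "log structured data" concrete, here is a minimal sketch in Python (chosen only as a neutral illustration, it is not tied to any product mentioned in this thread). It assumes one JSON object per log line, and the names `JsonFormatter`, `build_logger`, `log_request` and the `metrics` field are hypothetical. A log backend that parses JSON could then chart a field such as `duration_ms`.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "timestamp": record.created,
        }
        # Anything passed as extra={"metrics": {...}} becomes an attribute
        # on the record; merge those fields into the JSON line.
        payload.update(getattr(record, "metrics", {}))
        return json.dumps(payload)


def build_logger(stream):
    """Hypothetical helper: a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("structured_example")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def log_request(logger, path, status, duration_ms):
    """Log one request with fields a dashboard could aggregate."""
    fields = {"path": path, "status": status, "duration_ms": duration_ms}
    logger.info("request completed", extra={"metrics": fields})


if __name__ == "__main__":
    out = io.StringIO()
    log_request(build_logger(out), "/users", 200, 12.5)
    print(out.getvalue(), end="")
```

The point is only that every line carries named, typed fields, so the "metrics" are whatever you can aggregate from those fields later.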
I’d love to give you a rundown. If you’re interested, perhaps you guys can play a role in shaping the future of the product. I’m working closely with engaged early adopters.
Hey @svilen, did you find out anything interesting? I am quite interested in your findings and decision, as we’re also trying to set up metrics collection & visualisation as well as exception tracking.
To be honest, I am very confused by the whole logger, telemetry, opentelemetry, prometheus etc. topic, so I am looking for any useful information.
logger is a new Erlang module in the kernel application that is meant as a general-purpose logging frontend. Right now (Elixir 1.9.x) its messages are intercepted by Elixir’s Logger application and displayed alongside Logger’s own messages. In the next release (Elixir 1.10) this will change slightly, as the Logger module will become a wrapper over Erlang’s logger and the two will be unified (with a compatibility layer for legacy code).
telemetry is a simple event dispatcher. Think of it as a message bus that is meant to dispatch events about applications. In theory we could use logger for that, but telemetry is a much simpler solution (no levels, no formatters, no filters, etc., just messages and handlers).
OpenTelemetry is a CNCF project that is meant to provide a cross-technology solution for monitoring (metrics and traces, no logs). The idea is to make it a backend for Erlang’s telemetry, so you will be able to introduce it into your project seamlessly and most things will work almost out of the box.
Prometheus is another CNCF project; it is meant to be a time-series database (TSDB) and query language for metrics.
So in the end:
If you are writing a library, logger and telemetry are your friends.
If you are building an application, OpenTelemetry is something you should look into.
Prometheus is for operations. There are Prometheus backends for telemetry, and I am pretty sure OpenTelemetry will get such a feature as well, so in the end OpenTelemetry will be all you need. However, OpenTelemetry for Erlang is currently still in a pre-alpha phase, so if you want something you can use here and now, check out OpenCensus (the predecessor project; the people who worked on it are now working on OpenTelemetry).
About error tracking: check out Sentry. It has official support for Elixir and is pretty decent software. I have used it in the past and it was bliss.
About visualisation: Grafana is the best, unless you want a unified hosted solution, in which case most of the above can be achieved with DataDog.
I’m not sure I understand this yet. So is telemetry sort of like how a lot of people use Redis, RabbitMQ, or Kafka? Or Java’s JMS implementations? Basically a stream of events, sort of like a GenServer’s mailbox?
The easiest way I can explain telemetry is that it is a function dispatcher. Libraries emit events with a given name (a list of atoms), measurements, and metadata. Functions that have subscribed to that name are then invoked. The dispatcher also takes care of things like error handling: a function that fails is detached, so a bad function won’t be invoked again and again.
The functions can do anything you want. Many people do things like emit to StatsD or Prometheus at that point. It all happens inside the VM though, with no external broker involved. The functions are also called inline, in the process that emitted the event, so it’s not async by default either.
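If it helps, the "function dispatcher" idea can be modelled in a few lines. The sketch below is plain Python and is not the telemetry API (in the real library the entry points are `:telemetry.attach/4` and `:telemetry.execute/3`); the `Dispatcher` class and its method names are made up for illustration. It shows the three properties described above: handlers are looked up by event name, they run inline in the caller, and a handler that raises is detached.

```python
class Dispatcher:
    """Toy model of an inline event dispatcher (not the real telemetry API)."""

    def __init__(self):
        # Maps event name (tuple of strings) -> {handler_id: function}.
        self._handlers = {}

    def attach(self, handler_id, event_name, func):
        """Subscribe `func` to events with exactly this name."""
        self._handlers.setdefault(tuple(event_name), {})[handler_id] = func

    def handler_ids(self, event_name):
        """Return the ids of handlers currently attached to the event."""
        return sorted(self._handlers.get(tuple(event_name), {}))

    def execute(self, event_name, measurements, metadata):
        """Call every subscribed handler inline, in the caller's context."""
        name = tuple(event_name)
        handlers = self._handlers.get(name, {})
        # Iterate over a copy so a failing handler can be removed mid-loop.
        for handler_id, func in list(handlers.items()):
            try:
                func(name, measurements, metadata)
            except Exception:
                # A handler that raises is detached, so it is not invoked
                # again and again on later events.
                del handlers[handler_id]


if __name__ == "__main__":
    dispatcher = Dispatcher()
    event = ("web", "request", "stop")  # hypothetical event name

    def report(name, measurements, metadata):
        # This is where you would emit to StatsD, Prometheus, etc.
        print(name, measurements["duration_ms"], metadata["route"])

    def broken(name, measurements, metadata):
        raise RuntimeError("boom")

    dispatcher.attach("report", event, report)
    dispatcher.attach("broken", event, broken)
    dispatcher.execute(event, {"duration_ms": 12}, {"route": "/users"})
    dispatcher.execute(event, {"duration_ms": 7}, {"route": "/posts"})
```

There is no queue or mailbox anywhere: `execute` does not return until every handler has run, which is why a slow handler slows down the code that emitted the event.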