What do you use to monitor your Elixir app?

I'm close to deploying my app :smiley:

I imagine a real-time view of things like the number of processes, exceptions, and whatnot would be really helpful.

I know there are some extremely bloated tools for Ruby, and I'd rather avoid stuff like that.

So what are your 'must-have' monitoring / performance tuning / metrics libraries?

I've been using AppSignal https://appsignal.com/

The product itself is excellent for both exception and performance monitoring, and the custom metrics are handy too. It's also well priced: switching over from New Relic saved us a lot of money.

I've pestered the AppSignal team plenty of times for help (or to obnoxiously request features!), and they've always been extremely helpful and responsive. Very happy with the service.

It currently has a bug where, with Cowboy 2, it reports 404s as errors, but that should be fixed soon.

I use Prometheus for metrics, and an "in-house" contraption on top of tantivy for logs/exceptions.

@idi527 Any plans to open-source the tantivy contraption?

Currently we use a combination of StatsD for raw metrics and open tracing through Spandex: https://github.com/spandex-project. All of this gets sent over to Datadog. In the past I've used Prometheus and Grafana, and both are good. I actually prefer Prometheus to StatsD, but both are fine. I don't think I could live without open tracing at this point. It's probably not necessary if you're just running one or two services, but if you end up with a lot of different services, it's invaluable for observability.
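
For readers new to StatsD: it is a simple push protocol in which each metric is a single plain-text line sent over UDP (port 8125 by default). The following Python sketch only illustrates that wire format; it is not how statix or Spandex work internally, and the function names are hypothetical. The optional `|#key:value` tag suffix is the DogStatsD extension understood by Datadog's agent, which a plain StatsD server may not accept.

```python
import socket
from typing import Dict, Optional


def format_statsd(name: str, value: float, metric_type: str,
                  tags: Optional[Dict[str, str]] = None) -> str:
    """Build one StatsD line, e.g. 'app.requests:1|c'.

    metric_type: 'c' (counter), 'g' (gauge) or 'ms' (timer).
    tags: optional dict rendered with the DogStatsD '|#key:value' extension.
    """
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the output is deterministic.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send: no reply is expected, and nothing fails
    visibly if no collector is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

In an Elixir app, a client such as statix builds and sends these lines for you. Prometheus takes the opposite approach: it periodically scrapes an HTTP endpoint that the app exposes rather than receiving pushed packets, which is one reason people end up preferring one model over the other.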

Off-the-shelf APM tools are at their strongest when you only need to monitor/observe one application or service. As systems grow, I tend to want more control over my monitoring and alerting rules, and a lot of off-the-shelf APM tools don't give you that sort of power. In the end, I prefer tools like Datadog or Grafana.

It would need some cleaning up, but yes, I can try. It is somewhat similar to https://github.com/KodrAus/tantivy-log

Thank you so much for tantivy!

We're using a combination of the following for our Elixir services:

  • StatsD via the statix library for stats and metrics
  • Bugsnag via the Bugsnag Elixir library for capturing exceptions
  • Sending logs to an ELK stack via logstash-json. We've started using ElastAlert to trigger alerts on specific errors in our Kibana logs. It's nice to be able to configure alerts on existing logs without touching a line of code.

We also built an in-house 'synthetic monitoring' service that smoke-tests our API every 5 minutes. It raises alerts in OpsGenie through HTTP calls if any of our endpoints return unexpected responses.
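
A minimal sketch of such a smoke test, assuming a hypothetical list of endpoints and the status codes each should return. It uses only Python's standard library. `Check`, `run_checks`, `monitor_forever` and the `alert` callback are illustrative names, and delivering the alert (to OpsGenie or anything else) is left as a stub rather than guessing at that service's API.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Check:
    url: str
    expected_status: int = 200


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request (real network call)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is still available.
        return err.code


def run_checks(checks: Iterable[Check],
               fetch: Callable[[str], int] = fetch_status) -> List[str]:
    """Run every check once and return a human-readable list of failures."""
    failures = []
    for check in checks:
        try:
            status = fetch(check.url)
        except Exception as exc:  # timeouts, DNS errors, refused connections
            failures.append(f"{check.url}: request failed ({exc})")
            continue
        if status != check.expected_status:
            failures.append(
                f"{check.url}: expected {check.expected_status}, got {status}")
    return failures


def monitor_forever(checks: List[Check],
                    alert: Callable[[List[str]], None],
                    interval_seconds: float = 300) -> None:
    """Re-run the checks every interval (300 s = 5 minutes) and call the
    hypothetical `alert` callback whenever something fails."""
    while True:
        failures = run_checks(checks)
        if failures:
            alert(failures)
        time.sleep(interval_seconds)
```

Passing `fetch` in as a parameter keeps the checking logic testable without network access. In production, a cron job or scheduler could replace the `while True` loop.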

These are all good suggestions. Additionally, we use:

I'd use (or pick) something to store metrics in that can provide visualizations. Start small and pump some general data into that store, then add to it over time. We currently use Telegraf for that.

I have been using Statix with Elixometer, all of which gets sent to Datadog (via their agent/DogStatsD), where we have our monitoring dashboards.

Also Honeybadger for error monitoring, and VictorOps for alerting.

However, just yesterday, I replaced Elixometer with vmstats in one of our applications (following a strategy similar to the one laid out in this blog post). I think I will be using vmstats going forward: it was much easier to set up than Elixometer and requires much less configuration.

Over at New Relic, we use the open-source agent we built. It comes with Plug transaction tracing, distributed tracing for microservices, errors, BEAM stats, function tracing, custom attributes, alerting, etc.

A few framework integrations are in progress: Phoenix and Absinthe. Long-term, we want to align with the Elixir Telemetry project so we don't need a bunch of vendor-specific instrumentation packages in the ecosystem.