Surprisingly, this is my first topic despite lurking for a very long time.
I find myself wanting to add tracing and some form of Application Performance Monitoring (APM) service. The complication is that we are a healthcare company subject to regulation (the HIPAA Act) that limits which services we can use. For example, New Relic is not really an option. DataDog is a “maybe”, but the pricing tier at which they’ll sign the contracts the regulation requires is a bit expensive.
We already have a contract with Sentry and use them for their originally advertised purpose of error monitoring, but they’ve “recently” added general Application Monitoring with tracing support that seems to follow the OpenTelemetry spec. The “official” Sentry library for Elixir does not support this or the “new” way Sentry collects data, and doesn’t seem to be seeing much activity.
We also use GCP (Google Cloud) and have the contracts needed for them. They have a tracing product with OpenTelemetry support called Cloud Trace (formerly Stackdriver Trace).
In both cases there aren’t really any Spandex adapters; the only one I’ve seen is for DataDog, although it looks like there are some auto-generated GCP clients for Cloud Trace in their elixir-google-api library.
Has anyone seen Spandex adapters for either of these? Or does anyone have suggestions for getting an APM-like experience? I know about prom_ex since I used to work with akoutmos, but we’re a pretty small team and I’d like to try a paid service before spinning up Prometheus and Grafana for this. I’m also not opposed to writing the adapters; it’s just a harder sell, and I’m not sure yet what Spandex gives you in terms of process tracking versus what I’d need to build myself.
I am currently working on incorporating GrafanaAgent into PromEx, which opens the door to leveraging GrafanaCloud, for example, to host both Grafana and Prometheus. No need to have Prometheus poll metrics over the public internet :). GrafanaAgent pushes the Prometheus metrics via remote_write. I’m currently using the experimental version at work with GrafanaCloud and it is working beautifully. It doesn’t immediately address your tracing dilemma… but it’s something, hah.
OK, after reading up more, I think I may have answered my own question:
:telemetry is not built with OpenTelemetry in mind. It’s just a pub-sub-like mechanism for emitting events to attached handlers.
Spandex looks roughly similar to the OpenTelemetry spec but is not explicitly built for it, does not consume :telemetry events, and at the moment only has an adapter for DataDog.
:opentelemetry and :opentelemetry_api are built by the OpenTelemetry community with the spec in mind, and tracing is in beta, but it looks like not all of the OpenTelemetry instrumentation libraries use :telemetry; only the Ecto and Phoenix ones do.
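To make the pub-sub point concrete, here is a minimal sketch of :telemetry used on its own, with no OpenTelemetry involved. It assumes Elixir 1.12+ (for Mix.install/1) and the telemetry 1.x package; the event name [:my_app, :request, :stop], the handler module, and the handler id are hypothetical names chosen for illustration.

```elixir
# Standalone script: run with `elixir telemetry_demo.exs`.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule TelemetryDemo.Handler do
  # Called synchronously, in the emitting process, for every matching event.
  def handle_event(event_name, measurements, metadata, config) do
    send(config.reply_to, {:telemetry_event, event_name, measurements, metadata})
  end
end

# Subscribe: a unique handler id, the event name (a list of atoms),
# the handler function, and a config term passed back to the handler.
:ok =
  :telemetry.attach(
    "telemetry-demo-handler",
    [:my_app, :request, :stop],
    &TelemetryDemo.Handler.handle_event/4,
    %{reply_to: self()}
  )

# Publish: the emitter knows nothing about who (if anyone) is listening.
:ok = :telemetry.execute([:my_app, :request, :stop], %{duration: 42}, %{path: "/health"})

# The handler already ran in this process, so the message is in our mailbox.
received =
  receive do
    {:telemetry_event, event, measurements, metadata} -> {event, measurements, metadata}
  after
    0 -> :no_event
  end

IO.inspect(received, label: "received")
```

Handlers run inside the process that calls :telemetry.execute/3, which is why the message lands in the script’s own mailbox. Attaching a named module function rather than an anonymous function follows :telemetry’s own guidance, since it logs a performance warning for local functions.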
So I guess a more accurate question would be:
If I wanted to do the tracing part of an APM with Google Cloud or Sentry, both of which supposedly support OpenTelemetry, would Spandex or opentelemetry be the right way to go?
opentelemetry doesn’t have to use :telemetry. As you said, :telemetry is only pub-sub, and anyone can hook into it, including OpenTelemetry.
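As a rough illustration of OpenTelemetry hooking into :telemetry, the Phoenix and Ecto instrumentation libraries are typically wired up with a couple of setup calls at application start; each attaches :telemetry handlers that turn the existing Phoenix and Ecto events into spans. This is a sketch assuming the opentelemetry_phoenix and opentelemetry_ecto packages; MyApp, MyApp.Repo, and MyAppWeb.Endpoint are hypothetical, and the exact setup options have changed between releases, so check the docs for the versions you install.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Attaches :telemetry handlers to Phoenix's events and
    # turns them into OpenTelemetry spans.
    OpentelemetryPhoenix.setup()

    # The prefix must match the Repo's :telemetry_prefix, which Ecto derives
    # from the module name by default (MyApp.Repo -> [:my_app, :repo]).
    OpentelemetryEcto.setup([:my_app, :repo])

    children = [
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```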
I just saw this post, and though you may have made a decision already, I wanted to mention that Splunk APM (see “Introducing Splunk APM | Splunk”) may be an option for your particular case, especially given your HIPAA requirements.
Full disclosure: I work for Splunk on OpenTelemetry.
If anyone comes across this later on, open_telemetry is the route we’ll likely go. New Relic now offers HIPAA-compliant services, but once we actually got someone knowledgeable on the line, their minimum spend seemed to be US$25k/yr.
Cloud Trace should work; you’d just need to either run the OpenTelemetry Collector yourself, or figure out whether the Stackdriver container that runs on the Container-Optimized VMs is actually just a collector you could use instead.
As far as I can tell, Google does not provide an OpenTelemetry Protocol (OTLP) endpoint, but the OpenTelemetry Collector with its Google Cloud exporter should work with Cloud Trace. I haven’t tried it, though; so far I’ve only gotten as far as sending my OpenTelemetry traces to stdout.
Another option is Grafana Tempo, which would not require the collector at all, since it has an OTLP endpoint. You’d need to self-host it if you have compliance needs, though. It’s new enough that other cloud providers don’t offer it, so you’d need to check whether Grafana Labs would offer HIPAA compliance for its hosted version.
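For either route, the Elixir side looks much the same: the SDK exports spans over OTLP, and only the endpoint changes (a self-run Collector that forwards to Cloud Trace, or Tempo directly). A minimal sketch, assuming the opentelemetry 1.x SDK plus the opentelemetry_exporter package; the localhost endpoint is a placeholder.

```elixir
# config/runtime.exs
import Config

# Batch finished spans and hand them to the OTLP exporter.
config :opentelemetry,
  span_processor: :batch,
  traces_exporter: :otlp

# Placeholder endpoint: a local Collector (forwarding to Cloud Trace)
# or a self-hosted Tempo instance. 4318 is the usual OTLP/HTTP port;
# OTLP over gRPC conventionally uses 4317 instead.
config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  otlp_endpoint: "http://localhost:4318"
```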
We used Google Cloud Trace. It’s fine, but it’s not an APM, and we didn’t feel like running Prometheus.
We were interested in using Honeycomb, but the HIPAA pricing was originally very high. We checked back in with them a while later and they had revised their pricing to sane levels for our volume of data.
We’ve been happily using Honeycomb for the last 5 months or so and no longer need the OpenTelemetry Collector like we did with Cloud Trace.
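Skipping the collector works because Honeycomb accepts OTLP directly, so the exporter can point straight at it. A sketch assuming the opentelemetry_exporter package, with the SDK already set to export over OTLP; HONEYCOMB_API_KEY is a hypothetical environment variable name, and older Honeycomb accounts may also need an x-honeycomb-dataset header, so check their current docs.

```elixir
# config/runtime.exs
import Config

config :opentelemetry_exporter,
  otlp_protocol: :http_protobuf,
  otlp_endpoint: "https://api.honeycomb.io:443",
  # Honeycomb authenticates OTLP requests with an API key header.
  otlp_headers: [{"x-honeycomb-team", System.fetch_env!("HONEYCOMB_API_KEY")}]
```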