Distributed tracing in Elixir

With the proliferation of microservice architectures, distributed tracing has become increasingly important. Because the landscape is still quite new, there are multiple open standards (OpenTracing, OpenCensus), multiple open-source implementations (Zipkin, Jaeger), and commercial offerings (Datadog, New Relic, Elasticsearch, etc.). There are also multiple client libraries available for Elixir, such as spandex, ex_ray, and opencensus.

The problem is that, as a user, I am stuck in analysis paralysis. I have been postponing adoption because I am afraid of making the wrong bet. With the current setup, switching from one implementation to another would be time-consuming (how much depends on the size of the codebase).

The BEAM Telemetry project aims to solve the same issue in the metrics landscape. If the majority of the community adopts it, then switching from, say, Graphite to Prometheus would be quite straightforward.

Is there any similar effort for distributed tracing?

Note that OpenCensus is the only one of these that actually specifies what has to be defined to really have distributed tracing. OpenTracing leaves almost everything up to the implementation, which does not produce compatible libraries. OpenCensus provides complete specs and implementations for distributed tracing across multiple systems. This also involves some W3C standards: https://www.w3.org/TR/trace-context/ and https://w3c.github.io/correlation-context/
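
To give a sense of what the Trace Context document linked above pins down, here is a minimal sketch in plain Elixir of parsing and propagating a version-00 `traceparent` header value. The module and function names are hypothetical and are not part of OpenCensus or any other library; only the header layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) is taken from the W3C document, and later header versions are not handled.

```elixir
defmodule TraceParentSketch do
  # Hypothetical helper for illustration; not an OpenCensus API.
  # Handles only version "00" of the W3C traceparent header.

  # Parses "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
  def parse(header) when is_binary(header) do
    pattern = ~r/\A00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/

    case Regex.run(pattern, header, capture: :all_but_first) do
      [trace_id, parent_id, flags] ->
        if all_zero?(trace_id) or all_zero?(parent_id) do
          # All-zero ids are invalid according to the spec.
          {:error, :invalid_traceparent}
        else
          {:ok, %{trace_id: trace_id, parent_id: parent_id, sampled: sampled?(flags)}}
        end

      nil ->
        {:error, :invalid_traceparent}
    end
  end

  # Builds the header for an outgoing call: same trace-id, fresh span id.
  def child_header(%{trace_id: trace_id, sampled: sampled}) do
    span_id = Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
    flags = if sampled, do: "01", else: "00"
    {span_id, "00-" <> trace_id <> "-" <> span_id <> "-" <> flags}
  end

  defp all_zero?(hex), do: hex == String.duplicate("0", byte_size(hex))

  # The lowest bit of the flags byte is the "sampled" flag.
  defp sampled?(flags), do: rem(String.to_integer(flags, 16), 2) == 1
end

# Usage: continue an incoming trace on an outgoing request.
{:ok, ctx} = TraceParentSketch.parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
{_span_id, outgoing} = TraceParentSketch.child_header(ctx)
IO.puts(outgoing)
```

Because every service reads and writes the same header, services instrumented with different libraries can still end up in one trace, which is the compatibility point made above.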

OpenCensus and OpenTracing will ultimately be merged, and the result will live under the CNCF. So choosing OpenCensus is a safe bet.

Also note that we who work on OpenCensus have been discussing with those who work on Spandex to combine efforts.

I hope the OpenCensus library can become the “telemetry library” of distributed tracing on the BEAM :) We’ll also be integrating with telemetry for stats reporting.

This is really cool!

Originally I thought the question was about Trace Tool Builder. :)

Could you explain how the integration will work with other libraries like Ecto, Redix, hackney, etc.?

We have started to enable distributed tracing for our Ruby applications (using New Relic). It’s mostly plug and play so far: we just have to enable the config, and all the common things are already instrumented properly. I understand this is mostly because of the way Ruby works, where it’s easy to monkey-patch existing libraries.

The intent is to be mostly plug and play, but libraries are only instrumented when someone who needs it decides to do the work and release it.

So right now Plug and Tesla can have tracing enabled fairly simply by adding middleware/plugs: https://github.com/opencensus-beam/

Instrumenting hackney isn’t really possible without internal changes to hackney itself (it has no middleware support). I’ve been talking to Benoit about it, but we haven’t moved forward with any change yet.
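
To illustrate why a middleware hook matters here, below is a rough, dependency-free sketch of the general idea: a wrapper that times a handler and reports a span-like map. Everything in it (`MiddlewareSketch`, `traced/2`, the shape of the request and span maps) is hypothetical and is not the Plug, Tesla, or OpenCensus API; it only shows that tracing can be bolted on from the outside when a library lets you wrap its calls.

```elixir
defmodule MiddlewareSketch do
  # Hypothetical names throughout; not Plug, Tesla, or OpenCensus APIs.

  # Wraps `next` (a function from request to response) so that every call
  # is timed and a span-like map is passed to `report`, even if `next` raises.
  def traced(next, report) when is_function(next, 1) and is_function(report, 1) do
    fn request ->
      started = System.monotonic_time()

      try do
        next.(request)
      after
        elapsed = System.monotonic_time() - started

        report.(%{
          name: request.path,
          duration_us: System.convert_time_unit(elapsed, :native, :microsecond)
        })
      end
    end
  end
end

# Usage: the handler itself knows nothing about tracing.
handler = fn %{path: path} -> {200, "hello from " <> path} end
traced_handler = MiddlewareSketch.traced(handler, &IO.inspect/1)
traced_handler.(%{path: "/users"})
```

A library with no such extension point gives the wrapper nowhere to attach, which is why instrumenting it needs changes inside the library itself.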

We (well, mainly Benoit ofc) are integrating OpenCensus and Barrel though.

I’ll echo what @tristan said, that we’re working on how the Spandex team can integrate and/or combine with the OpenCensus-BEAM project, because I think that’s the spec that makes the most long-term sense. I’d say that if you’re looking to add something to an app today, maybe try adding OpenCensus and file issues for any trouble you have (both documentation and technical issues), so we can figure out where the gaps are.

I still haven’t taken the time to figure out what that experience is like myself from an Elixir app, but if you decide that you’d rather use Spandex, we have a fully worked example that you can use to see what’s required to integrate Spandex with the commonly used frameworks (Plug, Phoenix, Ecto, Datadog). Check out the links to the diffs in the README to see the exact changes you’d need to make in your app. Spandex only supports Datadog though, so that might be a deal-breaker for many environments. It wouldn’t be hard to add a new back-end, but no one has contributed one yet. We currently use Spandex at Bleacher Report, and it interoperates seamlessly with Datadog’s official clients for Ruby, NodeJS, .NET, etc.
