Distributed tracing to Datadog with Spandex

Raising the question here to gauge how others are doing this. We are running a large Elixir application which spawns a GenServer for each customer that visits the website. When a request to that server’s public API comes in through LiveView, it dispatches a GenServer.cast/2 for the action that was requested. If anything changes, we then broadcast a message to the view using PubSub. Fairly standard stuff.

We recently installed Datadog APM in our Docker containers, and when inspecting the traces we see spans all the way up to GenServer.cast/2. Immediately after GenServer.cast/2 there are no more spans.

I’m wondering whether we are missing a step because we use Horde under the hood to distribute these servers throughout our cluster. I’ve passed the trace and span identifiers using the %SpanContext{} example in the Spandex Datadog repository, but I still can’t get the trace to show up. Here is the rough event flow:

LiveView handle_event → GenServer public API → GenServer private API

I can’t seem to find any literature online on this. Has anybody set up distributed tracing with Spandex to Datadog successfully? Looking for some ideas!

:pray:


If anybody runs into this in the future: I finally found the root cause! I had not fully understood how Spandex works. The GenServer process never fully closed a different trace, so my private API call was on a different trace than my public API call.
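
For anyone who wants something concrete, here is a minimal sketch of the pattern this points to, not the exact code from our application. It assumes a tracer module named MyApp.Tracer built with `use Spandex.Tracer` and configured elsewhere with the Datadog adapter; MyApp.CustomerServer, perform/2 and do_perform/2 are hypothetical names. The caller captures its span context, the GenServer continues that trace, and the `after` clause always finishes it so no trace is left open in the long-lived process.

```elixir
defmodule MyApp.Tracer do
  # Assumption: configured elsewhere (config.exs) with the Datadog adapter.
  use Spandex.Tracer, otp_app: :my_app
end

defmodule MyApp.CustomerServer do
  use GenServer
  alias MyApp.Tracer

  def start_link(opts), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Public API: runs in the caller, e.g. the LiveView process.
  def perform(server, action) do
    # Capture the caller's trace as a %Spandex.SpanContext{} so it can
    # travel inside the message; nil when no trace is active.
    ctx =
      case Tracer.current_context() do
        {:ok, span_context} -> span_context
        {:error, _reason} -> nil
      end

    GenServer.cast(server, {:perform, action, ctx})
  end

  @impl true
  def init(state), do: {:ok, state}

  # Private API: runs inside the GenServer process.
  @impl true
  def handle_cast({:perform, action, nil}, state) do
    # No caller trace to continue, so just do the work.
    {:noreply, do_perform(action, state)}
  end

  def handle_cast({:perform, action, ctx}, state) do
    # Continue the caller's trace in this process.
    _ = Tracer.continue_trace("customer_server.perform", ctx)

    try do
      {:noreply, do_perform(action, state)}
    after
      # Always close the trace, even if do_perform/2 raises, so the
      # next message does not inherit a stale one.
      _ = Tracer.finish_trace()
    end
  end

  # Hypothetical placeholder for the real work.
  defp do_perform(action, state), do: Map.put(state, :last_action, action)
end
```

Because the context is a plain struct inside the message, this should not care which node Horde placed the server on.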

I was able to diagnose that by logging the trace_id on both sides of the GenServer.call/3 and making tweaks until the two matched. Works like a charm now!
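
The logging check can be sketched roughly as below. TraceDebug and log_trace_id/2 are hypothetical names for illustration; the only assumption about the tracer is that it exposes current_trace_id/0, which modules built with `use Spandex.Tracer` do.

```elixir
defmodule TraceDebug do
  require Logger

  # Logs the trace id active in the *calling* process and returns it.
  # `tracer` is any module exposing current_trace_id/0.
  def log_trace_id(tracer, label) do
    trace_id = tracer.current_trace_id()
    Logger.debug("#{label} trace_id=#{inspect(trace_id)}")
    trace_id
  end
end
```

Call it once in the public API function (caller side) and once in the GenServer callback (server side), for example `TraceDebug.log_trace_id(MyApp.Tracer, "public api")`. If the two logged ids differ, or the server side logs an id before it has continued the caller's trace, the process is most likely still holding an older trace.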
