Hi friends!
My company is using Datadog APM to profile traces for our Phoenix app, and we’re using a combo of spandex, spandex_phoenix, spandex_datadog, and spandex_ecto to send traces. They look great!
The trouble is that we lose spans on our traces when we use Task.async
and Task.await
for multiple parallel DB calls. In code, it initially looked something like this:
def function_making_db_calls() do
synchronous_function_calling_db()
task1 = Task.async(fn -> my_function_calling_the_db() end)
task2 = Task.async(fn -> other_function_calling_the_db() end)
# other logic
%MyStruct{
data1: Task.await(task1),
data2: Task.await(task2)
}
end
Here, the first DB call (outside the Task.async/await) traces correctly, and we can see how long it takes in the context of the toplevel request. But the ones in the other processes don’t trace correctly.
Now! I saw this tip from the Spandex README:
Asynchronous Processes
The current
trace_id
andspan_id
can be retrieved and later used (for example, from another process) as follows:trace_id = Tracer.current_trace_id() span_id = Tracer.current_span_id() span_context = %SpanContext{trace_id: trace_id, parent_id: span_id} Tracer.continue_trace("new_trace", span_context)
New spans can then be logged from there and sent in a separate batch.
So I did something like this, but to no avail:
defp trace_aware_async(span_label, func) do
trace_id = Tracer.current_trace_id()
span_id = Tracer.current_span_id()
span_context = %SpanContext{trace_id: trace_id, parent_id: span_id}
Task.async(fn ->
Tracer.continue_trace(span_label, span_context)
func.()
end)
end
def function_making_db_calls() do
synchronous_function_calling_db()
task1 = trace_aware_async("task1_span", fn -> my_function_calling_the_db() end)
task2 = trace_aware_async("task2_span", fn -> other_function_calling_the_db() end)
# other logic
%MyStruct{
data1: Task.await(task1),
data2: Task.await(task2)
}
end
But still no dice
I started digging into the sources and may continue to do this, and this person seems to have printed logs to find/correct discrepancies in the trace_id
, but I’d rather not trouble my teammates with multiple PR reviews and monitor so many deploys. Has anyone else had experience with this, or sees what I’m doing wrong?
Thanks so much!