OpenTelemetry and timeouts

Graborg · March 20, 2024, 1:33pm

Hi!

I’ve been digging into OpenTelemetry in the context of Phoenix lately. One problem I’ve found is that if a client closes the connection early (e.g. because they call us with a timeout shorter than the time we need to deliver a response), the process handling the request is terminated, so we never end the child span that is open at that point in the code. We thus only get the root span but not the child span.

More concretely, this happens, for example, in the opentelemetry_finch library. If we make an HTTP call as part of a request, and that HTTP call takes longer than the calling client is willing to wait, opentelemetry_finch never reports its span.

I’m not sure whether this is the intended behavior, since OpenTelemetry is still at an early stage, but it seems suboptimal because the most important traces for us are the ones that take so long they time out.

The question is where this problem should be fixed. Should spans be created in their own processes, so that those processes still get to end the span when the request process is killed?
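
To make the idea concrete, here is a rough sketch of what I have in mind, using only plain Elixir processes. SpanGuard and its functions are names I made up for illustration; they are not part of OpenTelemetry or opentelemetry_finch. A separate, unlinked process monitors the request process and runs a callback if that process dies before the span is ended normally. In real code the callback would presumably end the span (something like OpenTelemetry.Span.end_span/1 with a captured span context, though I have not verified that this works across processes in every case).

```elixir
defmodule SpanGuard do
  # Hypothetical helper, not part of any OpenTelemetry library.

  # Spawns an unlinked watcher process, so it survives when `owner` is killed.
  # `on_down` is called with the exit reason if `owner` dies before release/1.
  def watch(owner, on_down) when is_pid(owner) and is_function(on_down, 1) do
    spawn(fn ->
      ref = Process.monitor(owner)

      receive do
        # The owner died while the span was still open: run the cleanup.
        {:DOWN, ^ref, :process, _pid, reason} -> on_down.(reason)
        # The owner ended the span itself: stop watching.
        :release -> Process.demonitor(ref, [:flush])
      end
    end)
  end

  # Tells the watcher that the span was ended normally.
  def release(guard) when is_pid(guard), do: send(guard, :release)
end

# Usage: simulate a request process that is killed during a slow call.
test_pid = self()

owner =
  spawn(fn ->
    # Stand-in for "end the span"; here we only report back to the caller.
    SpanGuard.watch(self(), fn reason -> send(test_pid, {:span_ended, reason}) end)
    send(test_pid, :watching)
    # Stand-in for an HTTP call that takes too long.
    Process.sleep(:infinity)
  end)

# Wait until the guard is in place, then kill the "request" process,
# as the web server would do when the client disconnects.
receive do
  :watching -> :ok
end

Process.exit(owner, :kill)

result =
  receive do
    {:span_ended, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "span ended because")
```

The watcher is deliberately not linked to the request process, otherwise it would go down together with it. The cost is one extra process per guarded span, which may or may not be acceptable.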

Or is there another way to solve this? I usually don’t work at this level of abstraction, but I’m curious to learn more.

Thanks