In my library https://github.com/zachdaniel/spandex, I aggregate latency data in the form of spans using the process dictionary. As the process works, it could generate hundreds or even thousands of spans, depending on usage. Currently it just keeps track of them in a list and sends them when the entire trace/span is complete.
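For concreteness, here's a minimal sketch (not Spandex's actual internals) of what I mean by accumulating spans in the process dictionary, assuming each span is a small map:

```elixir
# Hypothetical module illustrating the accumulation pattern described above.
defmodule SpanBuffer do
  @key :spans

  # Prepend a completed span map to the list stored in the process dictionary.
  def push(span) when is_map(span) do
    Process.put(@key, [span | Process.get(@key, [])])
    :ok
  end

  # Read back everything accumulated so far.
  def all, do: Process.get(@key, [])
end

SpanBuffer.push(%{name: "db.query", duration_us: 1200})
SpanBuffer.push(%{name: "render", duration_us: 340})
SpanBuffer.all()
```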
I’d like to get an intuition for how these spans (relatively small maps, with an occasional large string value) affect the memory footprint of a process. Additionally, what kind of tradeoff should I expect if I were to look at periodically shipping spans in the middle of the process?

The naive approach would be to set a threshold, say 20 completed spans, that warrants sending. Then, whenever I have 20 or more completed spans, I ship the completed ones. This lowers the memory footprint within the process, but periodically shipping the data requires me to hand it off to another process. That's because I can’t block the current process waiting on those spans to send (it would be a horrible tracing library if the trace blocked to make network requests in the middle, I imagine), so I have to start another process to do it.

Additionally, if sending periodically is the sensible solution, is `spawn(fn -> do_work() end)` enough?
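To make the threshold idea concrete, here's a sketch of what I'm picturing. `Sender.ship/1` is a hypothetical stand-in for the actual network export call, and the threshold of 20 is arbitrary:

```elixir
# Hedged sketch of the threshold approach: once enough spans are complete,
# hand them to a short-lived process so the traced process never blocks on I/O.
defmodule Shipper do
  @threshold 20

  # Returns the (possibly emptied) list the caller should keep holding.
  def maybe_ship(completed_spans) when length(completed_spans) >= @threshold do
    # Task.start/1 is the supervision-friendly sibling of spawn/1;
    # a bare spawn(fn -> ... end) would also fire-and-forget, but the
    # new process is unlinked and unmonitored, so failures go unnoticed.
    Task.start(fn -> Sender.ship(completed_spans) end)
    []
  end

  def maybe_ship(completed_spans), do: completed_spans
end
```

Note the copying cost: the closure captures `completed_spans`, and message passing between processes copies the data, so shipping trades the long-lived heap growth in the traced process for a one-time copy into the sender process.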