Cost of stack traces for logging?

halostatue · July 5, 2022, 12:49am

We were having a discussion at work about logging:

I was thinking about the pros and cons of adding stack traces to log metadata last week, and the difference between the messages we log for helping operate the system and the messages which get presented to the real end user (see kevlinhenney on twitter for endless examples of diagnostics ending up in user interfaces). I was wondering about having a function to automatically bundle stacktrace information into a log message so we could do something like:
# stripped down code to handle a failure
# e.g. {:error, posix} = File.open(filename)
message = "Failed to open #{filename}: #{:file.format_error(posix)}"
OurLogger.error(message)
{:error, message}
And OurLogger could massage and add stacktrace: Process.info(self(), :current_stacktrace) to the metadata. The log formatter could then display stack traces for error and higher? Then the application sending a result to the eventual human would have to make sure it presented an appropriate message for the user.
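
A minimal sketch of what such a wrapper could look like, assuming OurLogger is a thin module around Elixir's Logger. The :stacktrace metadata key and the current_stacktrace/0 helper are assumptions made for illustration, not an established convention.

```elixir
defmodule OurLogger do
  # Hypothetical wrapper around Logger; the module name comes from the example above.
  require Logger

  # Logs at :error level and attaches the caller's current stack trace
  # under the (assumed) :stacktrace metadata key.
  def error(message, metadata \\ []) do
    Logger.error(message, Keyword.put(metadata, :stacktrace, current_stacktrace()))
  end

  # Process.info/2 returns {:current_stacktrace, frames}; each frame is a
  # {module, function, arity, location} tuple.
  def current_stacktrace do
    {:current_stacktrace, frames} = Process.info(self(), :current_stacktrace)
    frames
  end
end
```

A formatter that wants to print the frames could pass them to Exception.format_stacktrace/1. The topmost frames will likely belong to the capture itself and to the wrapper, so a real implementation would probably want to drop them.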

This sounds…possible, but it feels like it would be very expensive to achieve. Obviously, we’d never implement this without benchmarking getting a stacktrace, but what I wonder is whether the cost is that high?
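
As a rough starting point for that benchmark, here is a sketch that times the capture with :timer.tc/1 against a trivial baseline. The StacktraceCost module and its field names are hypothetical, and the numbers will depend on stack depth and the runtime, so this is only a first approximation.

```elixir
defmodule StacktraceCost do
  # Hypothetical helper: compares a loop that captures the current stack
  # trace with the same loop doing only a trivial call.
  def measure(iterations \\ 100_000) do
    {trace_us, :ok} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> Process.info(self(), :current_stacktrace) end)
      end)

    {baseline_us, :ok} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> self() end)
      end)

    # :timer.tc/1 reports microseconds; divide to get a per-call average.
    %{
      trace_us_per_call: trace_us / iterations,
      baseline_us_per_call: baseline_us / iterations
    }
  end
end
```

Calling StacktraceCost.measure() from a deeply nested call site, rather than from the shell, should give a more realistic picture. A dedicated tool such as Benchee would be a better fit for anything beyond a quick check.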

If it’s low, would there be a reason to add this to Elixir’s logger by default?

NobbZ · July 5, 2022, 6:36am

As implicit stack traces have been removed in favor of explicit stacktraces due to their cost, it is safe to assume that it will be far too expensive to always log them.
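
For context, the explicit form is presumably the `__STACKTRACE__` special form, which Elixir 1.7 introduced while deprecating System.stacktrace/0. A sketch of logging a trace only where an exception is actually rescued (the ExplicitTrace module is hypothetical):

```elixir
defmodule ExplicitTrace do
  # Hypothetical example module, not taken from the thread.
  require Logger

  def read(path) do
    {:ok, File.read!(path)}
  rescue
    error in File.Error ->
      # __STACKTRACE__ is only available inside rescue/catch clauses, so the
      # trace is materialized only on this failure path.
      Logger.error(Exception.format(:error, error, __STACKTRACE__))
      {:error, error.reason}
  end
end
```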

For errors and above though, I think it totally depends on the error. Are you really able to learn anything new from an attached stacktrace, or is the file/function metadata already enough, since there are basically only two or fewer paths through which this function will ever be called?
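
The file/function metadata mentioned here is attached by Logger at each call site and only needs to be enabled in the formatter. A configuration sketch, assuming the console backend of Elixir releases from around the time of this thread (newer releases configure the same options under :default_formatter):

```elixir
# config/config.exs
import Config

# Print the calling module/function/arity plus file and line with every
# message. The format string here is only an example.
config :logger, :console,
  format: "$time [$level] $metadata$message\n",
  metadata: [:mfa, :file, :line]
```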

When you log an error or above, why do you prefer to log the error, rather than crash? Error for me means “this is a condition the process cannot recover from, it will not proceed”, while “above error” for me even means “this application is unable to recover from the condition, it will shut down”.

hauleth · July 5, 2022, 9:09am

If you log anything above error and you do not crash at least the process, then I think you are using the logging system incorrectly.

To expand a little on the levels error and above:

error - application errors, recoverable, but in higher volumes may require further introspection
critical - the system has met an unrecoverable error that still does not bring the whole application down, may require more immediate action from the operator
alert - do something with that error right here, right now
emergency (also known as panic) - all hands on deck, we have been thrown off course just a tad, asteroids are smashing at the hull, we are going into the Sun, and we are also out of coffee
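
The ordering of these levels is what a formatter would rely on to show stack traces only for "error and higher", as suggested in the original question. A small sketch using Logger.compare_levels/2 (the LevelPolicy module and its function name are hypothetical):

```elixir
defmodule LevelPolicy do
  # Hypothetical helper: true for :error and every more severe level
  # (:critical, :alert, :emergency), false for anything less severe.
  def include_stacktrace?(level) do
    Logger.compare_levels(level, :error) != :lt
  end
end
```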

halostatue · July 5, 2022, 2:49pm

@NobbZ The example given was illustrative only. There are plenty of cases where something is an error, but potentially recoverable—especially when dealing with external systems (this is mostly what our services do).

In general, I agree with @hauleth’s levels, although to me alert is more right here, right now.

Thanks for the thoughts on this; it confirms the feeling that I had on the cost of stacktrace acquisition.