We’re in the process of adding additional error logging for unhandled exceptions in a Phoenix application. One of the issues we’re experiencing is that because the unhandled exception is crashing the request process, we sometimes lose unflushed Logger messages.
To solve this, I’d like to use Logger.flush/0. However, the docs say:
> This is useful for testing and it should not be called in production code.
What exactly is the downside of calling this in production code?
We really want our logs to be flushed in these error cases, and we’re okay with the process blocking for a bit until that happens. But is there some other downside of this function that would make this undesirable to call in our case?
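To make the intent concrete, here is a hedged sketch of the kind of wrapper we have in mind (the module and function names are hypothetical, not from our actual code): run the work, and on an unhandled exception log it, flush Logger so buffered messages reach the backend, then re-raise so normal crash handling still happens.

```elixir
defmodule MyApp.CrashLogger do
  # Hypothetical sketch: log an unhandled exception and block on
  # Logger.flush/0 before re-raising, so buffered log messages are
  # not lost when the request process dies.
  require Logger

  def with_flush_on_crash(fun) when is_function(fun, 0) do
    fun.()
  rescue
    exception ->
      Logger.error(Exception.format(:error, exception, __STACKTRACE__))
      # Block until pending Logger messages have been written out.
      Logger.flush()
      reraise exception, __STACKTRACE__
  end
end
```

The happy path pays nothing; the flush only happens on the rare crash path.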
I tried crashing a process by raising an exception right after logging, and it works as expected.
I don’t think your test is valid. The docs say that Logger alternates between sync and async modes to remain performant.
There is a very large difference between spinning up one process and logging one message in an IEx shell, and a production Phoenix server handling hundreds to thousands of requests per second, each of which potentially creates dozens of log messages. In our case it seems likely that Logger is running in async mode, while in your IEx session it’s probably running synchronously.
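For reference, in Elixir versions up to 1.14 the sync/async switchover is governed by Logger's application configuration. A sketch (the values shown are the documented defaults as I recall them; verify against your version's Logger docs):

```elixir
# config/config.exs -- Logger mode thresholds in Elixir <= 1.14
config :logger,
  # Switch from async to sync mode once the Logger message queue
  # exceeds this many entries.
  sync_threshold: 20,
  # Start discarding messages once the queue exceeds this length.
  discard_threshold: 500
```

So whether a given log call is sync or async depends on queue depth at that moment, which is exactly why a one-off IEx experiment and a loaded production node can behave differently.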
Logger.flush/0 seems to be a no-op by default
You seem to be making assumptions about which logging backend we’re using. We are not logging only to stdout, as you assumed; we’re logging to a file.
I guess I should specify also that we’re using Elixir 1.14.5 and OTP 24. FWIW, here’s the Logger.flush implementation for Elixir 1.14. I am very skeptical that Logger.flush/0 is a no-op.
What does “may” mean? In what situations? How much and what kind of performance degradation?
If it’s just a bit of slowdown in the process calling it, that’s probably acceptable. I’m only talking about using this in an unhandled exception logger, which is less than 1% of total requests.
In general it is pointless to use in production.
What about in my specific scenario, i.e. ensuring logs are flushed before a process crashes?
If your process crashes, the crash will be logged anyway; Logger.flush won’t make any difference. The exception is when there are too many log entries and the logger discards some, but Logger.flush won’t change that either.
The reason it is not encouraged is that it tells all loggers to stop what they are doing and write to disk. If you call it several times, they will stop and write to disk every single time, even when they believe batching and waiting is the efficient thing to do.
As a generally huge fan of Elixir documentation (if that’s not obvious from my comments here over the years), OP’s sentiment is one I have felt quite a bit. I don’t have any concrete examples on hand, but I have often read similar warnings in the docs and really wanted to know the “why.” Is keeping this kind of thing out of the docs intentional (perhaps to keep them lean), or would PRs generally be accepted to clarify these things?
Of course, we can always improve the docs and PRs are welcome. If this is happening quite a bit, then we are not being told about those cases, because they would certainly have been addressed otherwise.
Awesome, thanks! I honestly wasn’t sure if it was intentional or not, as documentation bloat can be a very real problem and many people are happy to read “don’t do this” and take it at face value. I will try to be more diligent about opening PRs/issues.
Looking at the code, the function will only do something if the handler exposes a filesync/1 callback, which isn’t part of any behaviour to my knowledge. And if it does, the performance implication will depend on the exact handler, which in the majority of cases will be LoggerBackends, but can be something else as well. Perhaps the right place to put the bulk of the documentation is the LoggerBackends docs?
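That check is easy to reproduce against Erlang's `:logger` API. A hedged sketch (this mirrors the behaviour described above; verify against your Elixir version's actual Logger.flush source):

```elixir
# List the installed :logger handlers and report which ones export
# filesync/1 -- i.e. which handlers a flush would actually ask to
# sync to disk. On a node with only the default handler, this list
# may well be empty.
flushable =
  for %{id: id, module: module} <- :logger.get_handler_config(),
      Code.ensure_loaded?(module),
      function_exported?(module, :filesync, 1),
      do: id

IO.inspect(flushable, label: "handlers with filesync/1")
```

If your file backend's handler module doesn't export `filesync/1`, calling Logger.flush/0 would indeed do nothing for it, which would explain the "no-op by default" observation upthread.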