Effect of Logger :max_buffer

Background

I have an application that writes really long log lines. Recently we discovered that if a log line is over 2K characters, it gets split into two log lines instead of one:

Current behavior:

2020-06-23T14:18:06.331027+00:00 machine00[123085]: 14:18:06.328 [info] Received body: {"token":"This is a really l
2020-06-23T14:18:06.333929+00:00 machine00[123085]: ong token as you can see","operator_id":20}

Expected behavior:

2020-06-23T14:18:06.331027+00:00 machine00[123085]: 14:18:06.328 [info] Received body: {"token":"This is a really long token as you can see","operator_id":20}

This is problematic because our support teams can’t make sense of the logs.

Logger

So I did a few tests with Logger: if I ask Logger.info to print information longer than 2K characters, everything works as expected.

A possible cause could be that the machine is so busy printing logs to the console that it has to split them. To that end, I am trying to understand whether this is related to some Logger buffer I am not aware of; :max_buffer comes to mind:

https://hexdocs.pm/logger/Logger.html#module-console-backend
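
For reference, this is roughly where that setting lives; a minimal sketch of the console backend configuration, where the value shown is only an example and not a recommendation:

# config/config.exs
# :max_buffer is the number of log events the console backend buffers
# while it waits for the IO device to confirm the previous writes
config :logger, :console,
  max_buffer: 32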

Questions

  • Imagine a machine that is really busy printing logs (the IO device is busy): what exactly happens to the logs that cannot be buffered? Are they discarded?
  • What else could be causing this? (Let your imagination run free, I personally blame Alien Space Lizards who are here to control us and make 2020 last forever!)

You might want to add how you’re handling/storing logs. Where did you observe the line being broken up?


Yes, but that's affected by yet another setting:

:discard_threshold - if the Logger manager has more than :discard_threshold messages in its queue, Logger will change to discard mode and messages will be discarded directly in the clients. Logger will return to sync mode once the number of messages in the queue is reduced to one below the discard_threshold. Defaults to 500 messages.
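
For completeness, that one is a top-level Logger option; a minimal sketch using the documented default of 500 messages:

# config/config.exs
# Logger switches to discard mode once more than this many messages are queued
config :logger,
  discard_threshold: 500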

The fact that you have the timestamp twice in your log makes me assume that some other service deals with the log lines and maybe wraps them.

I have never seen wrapped logs from Logger so far, only truncated, and then Logger clearly communicates that with a trailing <truncated>.
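
That truncation is controlled by Logger's :truncate option, which is a separate knob from any buffering. A minimal sketch, assuming you want to raise the limit; the 64 KB value is only an illustration:

# config/config.exs
# Maximum message size (in bytes) before Logger truncates the message
config :logger,
  truncate: 64 * 1024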

As systemd with journald is kind of the default these days, please check your /etc/systemd/journald.conf (or equivalent config files) for LineMax:

The maximum line length to permit when converting stream logs into record logs. When a systemd unit’s standard output/error are connected to the journal via a stream socket, the data read is split into individual log records at newline (" \n ", ASCII 10) and NUL characters. If no such delimiter is read for the specified number of bytes a hard log record boundary is artificially inserted, breaking up overly long lines into multiple log records. Selecting overly large values increases the possible memory usage of the Journal daemon for each stream client, as in the worst case the journal daemon needs to buffer the specified number of bytes in memory before it can flush a new log record to disk. Also note that permitting overly large line maximum line lengths affects compatibility with traditional log protocols as log records might not fit anymore into a single AF_UNIX or AF_INET datagram. Takes a size in bytes. If the value is suffixed with K, M, G or T, the specified size is parsed as Kilobytes, Megabytes, Gigabytes, or Terabytes (with the base 1024), respectively. Defaults to 48K, which is relatively large but still small enough so that log records likely fit into network datagrams along with extra room for metadata. Note that values below 79 are not accepted and will be bumped to 79.
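
If your systemd version exposes the option at all, the limit can be raised there; a sketch only, with an arbitrary example value:

# /etc/systemd/journald.conf
[Journal]
LineMax=64K

# then restart journald, e.g.: systemctl restart systemd-journald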


The lines get broken when they reach 2K characters. The way the application does logging is that it prints everything to the console.

Then we have a daemon that picks stuff up from the console and sends it to whoever is listening, in this case rsyslog, which writes the logs into files.

The case for Splunk is different, so I am not sure how the app does it. I don't see any Splunk backends configured in Elixir's configuration file, so I assume they feed from the same daemon, but I may be incorrect.

This behavior (splitting logs at 2K characters) is consistent across both Splunk and the log files.

Yes, but that's affected by yet another setting:

So my conclusion is that, since no logs are being discarded, this is not the cause of the problem.

As systemd with journald is kind of the default these days, please check your /etc/systemd/journald.conf (or equivalent config files) for LineMax:

It sure is worth taking a look. Thanks for the tip!

TL;DR

This is here in case someone else runs into this issue in the future. We've finally figured it out.
The issue is in the systemd/journald version shipped with our CentOS release.
Elixir and Logger work as expected.

What is not the issue?

First of all, I would like to discuss what is NOT the issue.
The issue is not related to Elixir or to Logger. To be clear, Logger will print single-line logs in a single line. I have done tests with lines up to 34K characters and everything stayed on the same line.
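
For anyone who wants to reproduce the test, something along these lines is enough; a sketch where the length and content of the message are arbitrary:

# Run in iex -S mix (or any node with Logger started); the whole message
# comes out of Logger as a single line
require Logger
long_token = String.duplicate("a", 34_000)
Logger.info("Received body: " <> long_token)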

Where is the issue?

The issue here is syslog, namely our version of syslog on CentOS 7, which is the distro we are using.

You see, our systemd-journald is built with a hardcoded line length limit of 2048 characters per line; anything longer than that gets split into multiple log records. This is fixed on CentOS 8, where it works as expected.

Special thanks to @NobbZ for pointing in the direction that eventually led us to the discovery!


You can also log directly to the journal, which does not impose such a limit.
