Logger goes silent - crashing on our server (AWS ECS, writing to CloudWatch) without any error messages

This cropped up starting a couple of weeks ago. After some non-deterministic period of time, Logger has been crashing on our server (AWS ECS, writing to CloudWatch).

I actually am not sure that it’s crashing, because I’m not getting any error message, but it is at least not logging anything. We do have a couple of scattered IO.inspect(s) that are still writing occasional data.

Any ideas on how I might investigate this? I’m a little stumped.

  • No errors in the logs (because, no Logger?)
  • No pattern in the logs preceding the silence
  • No patterns in time-of-day
  • No patterns in time-until-silence (10d, ~2d, ~1d)
  • Instance is running at moderate load (~45% memory, 5-60% CPU)

Any ideas would be greatly appreciated.

Can you share:

  • Elixir version
  • Logger configuration
  • Whether you use any external log aggregator
  • How you run the deployed application

Elixir: 1.10.4
Erlang: 23.0.4
(from bitwalker/alpine-elixir-phoenix:latest)

config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id, :user_id, :method, :path]

The application is deployed on AWS ECS (Elastic Container Service) as a Docker image (built/run with bitwalker/alpine-elixir-phoenix). The console output is piped to AWS CloudWatch.

Prior to January 19th, the application had been running smoothly for about 2 years, with restarts for releases only. I’ve been looking through the commits that went out with the release on the 19th (which went out the 7th), but nothing seems related to Logger.

Any of that info help?

Please read the Docker documentation for logging, as it has some alerts and guidance that may or may not help you but need to be taken into account, especially if your application logs a lot.

By default, no log-rotation is performed. As a result, log-files stored by the default json-file logging driver can cause a significant amount of disk space to be used for containers that generate much output, which can lead to disk space exhaustion.

Docker keeps the json-file logging driver (without log-rotation) as a default to remain backward compatible with older versions of Docker, and for situations where Docker is used as runtime for Kubernetes.

For other situations, the “local” logging driver is recommended as it performs log-rotation by default, and uses a more efficient file format. Refer to the Configure the default logging driver section below to learn how to configure the “local” logging driver as a default, and the local file logging driver page for more details about the “local” logging driver.

Warning

When the buffer is full and a new message is enqueued, the oldest message in memory is dropped. Dropping messages is often preferred to blocking the log-writing process of an application.

I appreciate the links.

Granted, it’s AWS, and I’m never quite sure that I’m reading their documentation correctly, but it looks like the awslogs driver (which we use) doesn’t write to disk.

If your tasks are using the awslogs log driver, then the following conditions are true:

  • Logs are streamed to Amazon CloudWatch Logs. These logs are never written to the container instance.

from their troubleshooting logging page

Hmm, this seems very odd, especially since IO.inspect is still getting successfully “logged”. Have you tried enabling SASL logs? Maybe that will give you a clue (see “Gaining Insight into an Elixir Application with SASL”). Although, since there’s a problem with the logs, maybe it won’t.
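
For reference, enabling SASL reports might look roughly like the fragment below. This is a minimal sketch, assuming a standard Mix project layout; it uses Logger's `handle_sasl_reports` option, and the file locations shown are assumptions about the project rather than anything taken from this thread. Because these reports are routed through Logger, they may go missing too if the console backend itself is what has stalled.

```elixir
# config/config.exs (assumed location)
# Redirect supervisor, crash and progress reports to Logger.
config :logger,
  handle_sasl_reports: true

# mix.exs (assumed location)
# :sasl has to be started before :logger for the reports to be captured.
def application do
  [extra_applications: [:sasl, :logger]]
end
```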

I would try to see if your app is being OOM killed; this is the most common reason why we’ve seen sudden app crashes without real logs. Usually you’d see indicators in system logs or K8s pod data (if that’s what you’re using).

Although since there’s a problem with the logs, maybe it won’t.

Right @axelson ? How do you log errors in Logger itself?

But I’ll enable SASL and see if that generates anything. The current release has been running for about 3 days since I last restarted it and is still logging just fine.

Side note: while writing my first forum post, I got annoyed that there wasn’t a preview button to look at the formatted markdown. I just noticed the auto-preview on the right. :joy:

@benwilson512 , the application itself continues to run just fine – responding to requests per normal. Just no Logger logs (although IO.inspect continues to work).
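
Since the node stays up while Logger is silent, one way to investigate is to inspect the Logger process from a remote shell the next time it happens. The following is a minimal sketch, assuming Elixir 1.10-era internals in which the Logger event manager is registered under the name `Logger`; `LoggerProbe` is a hypothetical helper name, not part of any library.

```elixir
defmodule LoggerProbe do
  @moduledoc """
  Hypothetical helper: takes a snapshot of the Logger event manager.
  Assumes the manager is a :gen_event process registered as `Logger`.
  """

  def snapshot do
    case Process.whereis(Logger) do
      nil ->
        # No process registered under that name on this node.
        {:error, :logger_not_running}

      pid ->
        {:ok,
         %{
           pid: pid,
           # Current level; a level raised at runtime would also silence output.
           level: Logger.level(),
           # Handlers currently attached; a missing console backend
           # would explain silence without any crash.
           handlers: :gen_event.which_handlers(Logger),
           # A large or growing queue suggests the backend is stuck on IO.
           info: Process.info(pid, [:message_queue_len, :status, :current_function])
         }}
    end
  end
end
```

Calling `LoggerProbe.snapshot()` a few times, some seconds apart, would show whether the message queue keeps growing or whether the handler list has changed compared with a healthy node.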

Just to follow up on this: since I originally posted, the app has been up and logging for 31 days (with no significant changes).

Solved: Posted about the issue, and it stopped happening.

Yhprum’s law at work

Could this potentially be related?