pdgonzalez872

How to make sense of logs? (Log Analysis?)

Hi!

In my quest to become the best Elixir dev I can be, I saw one aspect of my
career that I’d like to improve upon. This is language agnostic, but since
Elixir has an awesome community, I’m sure you’ll be able to help, especially
since you may have experience with distributed systems, and logging in those can
only be crazier.

So, have you read/studied material on how to log/review logs?

This is somewhat related to debugging and you have parallels with other
professions as well. This is not only for programmers. Forensic accounting
comes to mind when dealing with multiple information sources and trying to
piece things together.

I believe it’s an art to review logs, build a timeline, and understand how
things work from the logs themselves. I think folks learn it over time, and it
is such a powerful weapon that I see the best engineers have. I have not found
good material on it and was wondering if you could help me with this.

This goes hand in hand with debugging in a way. I’ve seen josevalim on Twitch
and picked up a ton of little things that people just don’t really talk about,
not out of malice by any means; it just seems like these “things” have to be
learned over time. Well, they add up, and knowing them and using them as tools
is life changing. I guess that’s what you call experience…

I do think there is a systematic approach that many of the best share. Maybe
folks don’t even know about it, or they do and I don’t, so I thought I should
ask.

Thanks for the help in advance!

Paulo

PS:

Here are a couple of resources that I think are related to this. Sorry if I’m not too clear, maybe your answers can help me focus my questions better.

Reverse engineering example:

I found a debugging book, unsure if it would help with this, but it may:

lucaong

Hi @pdgonzalez872,
the answer I have in mind is very general, and has to do with debugging more than with specifically analyzing logs, but I hope it can still be useful.

In my experience, debugging is an activity that is best approached with the scientific method.

Imagine you have a weird bug. Something that is not immediately obvious, but rather puzzling and defying explanation. Say that application instances crash, for no apparent specific reason. Assume that you collected the initial evidence, but you are still clueless about the root causes. Here is where the scientific method comes into play:

  • First of all, before you even start digging through the logs, formulate hypotheses. A good hypothesis is one that produces testable predictions. For example, if you suspect that the crash is related to a hardware problem, you should predict that all crashing instances are located on the same physical machine. If you instead suspect a memory leak, you might predict that you will find “out of memory” log entries. These are things you can validate in practice.

  • Even more important, good hypotheses are falsifiable: if you observe instances crashing on different hardware nodes, your hypothesis about a hardware failure becomes extremely unlikely, so you can archive it and move on to the next one. Don’t get attached to a hypothesis: if it turns out to be false, you have still made a step forward in understanding.

  • Only once you know what you are looking for should you look at your logs, metrics, etc. It’s very important to stick to testing hypotheses, instead of spending too much time randomly looking for patterns. Even though serendipity can sometimes help in simple cases, our brains are too prone to seeing patterns even where there are none. If you really suspect that a pattern is more than a coincidence, you need to find a way to test that.

  • Typically, this goes in cycles: a hypothesis suggests a test, which brings some answers, which in turn bring more questions, more hypotheses, more tests, etc. If the test result is inconclusive, we need to go back to the drawing board, with either new hypotheses or new ways to test. This testing cycle drives your investigation.

  • The best engineers I had the pleasure to work with are egoless: they won’t focus on proving themselves right, or on showing off their knowledge. They can use this method so efficiently that it might seem they always guess right. The truth is often the opposite: they are good precisely because they don’t guess, but rather follow a method.
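
To make the cycle above a bit more concrete, here is a minimal sketch in Python (the approach is language agnostic). Everything in it is an assumption for illustration: the crash records, the field names `instance`, `host` and `message`, and the two check functions are hypothetical, not taken from any real system. Each hypothesis is written as a prediction that the evidence can falsify.

```python
# Hypothetical crash records, as they might be extracted from logs.
crashes = [
    {"instance": "app-1", "host": "node-a", "message": "process exited"},
    {"instance": "app-2", "host": "node-b", "message": "process exited"},
    {"instance": "app-3", "host": "node-a", "message": "Out of memory"},
]


def same_host_prediction_holds(records):
    """Hardware hypothesis: predicts every crash happened on one physical host."""
    return len({r["host"] for r in records}) == 1


def oom_prediction_holds(records):
    """Memory-leak hypothesis: predicts at least one 'out of memory' entry."""
    return any("out of memory" in r["message"].lower() for r in records)


# Each hypothesis is paired with the check for its testable prediction.
hypotheses = {
    "hardware failure": same_host_prediction_holds,
    "memory leak": oom_prediction_holds,
}


def evaluate(records):
    """Return, per hypothesis, whether its prediction survived the evidence."""
    return {name: check(records) for name, check in hypotheses.items()}


results = evaluate(crashes)
for name, holds in results.items():
    status = "not ruled out, keep testing" if holds else "falsified, archive it"
    print(f"{name}: {status}")
```

A prediction that holds does not prove the hypothesis; it only means the hypothesis has not been ruled out yet and deserves a sharper test.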

Does this make sense?

lucaong

Here is my effort to share in short what I learned by experience about logging. The most important thing, in my opinion, is that your log messages should be:

  1. Informative: while concise, they should contain all the important information about the event, not just say “event X happened”. It helps to think: should I ever need to look for this event in the logs, what information would I want to know? A practical example: a log event about a purchase might include the ID of the user that made the purchase.

  2. Discoverable: it is useless to log a lot of information if it is not possible to find it easily when you most need it. This is not only a problem of having the right tools; it actually has to do with logging in a way that enables future retrieval. Should you need to search for logs related to a certain occurrence, what search terms are you likely to use? Also, when debugging, you often start by looking for some event in the log, but soon want to correlate it with other events happening around it. A practical example: in a web app, including a unique “request ID” in all logs pertaining to the same HTTP request makes it possible to easily correlate all log events that happened within the same request. Including a user ID, or a hash of it, makes it possible to find everything logged regarding a specific user (just take care not to log sensitive user data). Tagging logs by service is also a common strategy. Multi-service architectures often pass around a “correlation ID” for the same request, which all services include in their log messages, in order to follow the request flow across services.

  3. Reviewed and maintained: when debugging an issue, take note of what would have helped you, and what could be improved. Every application is different in terms of what is interesting to log and what is not. Keep improving it, add missing information or events, remove noise, and debugging will get better and better for you and your team.
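
As a rough illustration of points 1 and 2, here is a small sketch using Python’s standard `logging` module. The logger name `shop`, the function `handle_purchase`, the helper `user_hash` and the log format are all hypothetical choices made for this example. It attaches a request ID and a hashed user ID to every line logged while handling one request, so the lines can later be found and correlated with a simple search.

```python
import hashlib
import io
import logging
import uuid


def user_hash(user_id):
    """Short, stable hash so logs can be searched by user without the raw ID."""
    return hashlib.sha256(str(user_id).encode()).hexdigest()[:12]


def build_logger(stream):
    """Logger whose format always includes service, request ID and user hash."""
    logger = logging.getLogger("shop")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s service=shop request_id=%(request_id)s "
        "user=%(user_hash)s %(message)s"
    ))
    logger.addHandler(handler)
    return logger


def handle_purchase(logger, user_id, order_id, request_id=None):
    """Log every step of one request with the same correlating fields."""
    request_id = request_id or uuid.uuid4().hex
    # LoggerAdapter injects these fields into every record it emits.
    log = logging.LoggerAdapter(
        logger, {"request_id": request_id, "user_hash": user_hash(user_id)}
    )
    log.info("purchase started order_id=%s", order_id)
    log.info("purchase completed order_id=%s", order_id)
    return request_id


stream = io.StringIO()
rid = handle_purchase(build_logger(stream), user_id=42, order_id="A-1001")
print(stream.getvalue(), end="")
```

Searching for `request_id=<value>` then returns every line of that request. Note that a plain hash of a small or guessable ID is only light obfuscation, so whether it is acceptable depends on what counts as sensitive in your application.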

What do you think? I hope this helps

pdgonzalez872

Ciao @lucaong, that’s great advice. It covers how to log and how to keep logging sane, which is critical to this task.

Now, help me think: imagine you already have great logging because you followed these instructions. Now what? There is something to be said about what to do with the logs after you have them. I feel that it is almost like looking at a painting: some people know what they are looking at and others see some paint on a canvas. (Sorry for the analogy, it’s the best I could do :) )

As usual, it depends on many factors. But, in general, I do think there is a process, some kind of step-by-step routine that we follow, maybe even without knowing we do it. Maybe the steps are as follows:

  • Do a first read in chronological order to see if anything jumps out as you are looking for clues (cc: @fhunleth)
  • Anything weird? Warnings, errors?
  • Use clues in logs to build a timeline of events
  • If there is not enough logging, follow @lucaong’s advice and add more. Deploy and monitor.
  • (Imagine this is a bug for a second) Find error in log. Look at codebase for said log message. Go up the call stack and find root cause.
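
The timeline step from the list above could look roughly like this in Python. It is only a sketch under assumptions: the two log sources, their contents and the `<ISO timestamp> <LEVEL> <message>` line format are made up for illustration, and real logs would need a parser that matches their actual format.

```python
from datetime import datetime

# Hypothetical log lines in the form "<ISO timestamp> <LEVEL> <message>".
web_log = [
    "2024-01-01T10:00:01 INFO request started request_id=abc",
    "2024-01-01T10:00:03 ERROR upstream timeout request_id=abc",
]
worker_log = [
    "2024-01-01T10:00:02 WARN queue depth high",
    "2024-01-01T10:00:04 INFO job retried request_id=abc",
]


def parse(source, line):
    """Split one line into its timestamp, level and message."""
    timestamp, level, message = line.split(" ", 2)
    return {
        "time": datetime.fromisoformat(timestamp),
        "source": source,
        "level": level,
        "message": message,
    }


def build_timeline(sources):
    """Merge events from all sources into one chronological list."""
    events = [
        parse(name, line) for name, lines in sources.items() for line in lines
    ]
    return sorted(events, key=lambda event: event["time"])


def suspicious(events):
    """Keep only the warnings and errors, still in chronological order."""
    return [e for e in events if e["level"] in {"WARN", "WARNING", "ERROR"}]


timeline = build_timeline({"web": web_log, "worker": worker_log})
for event in timeline:
    marker = "!!" if event in suspicious([event]) else "  "
    print(marker, event["time"].isoformat(), event["source"],
          event["level"], event["message"])
```

Reading the merged output top to bottom gives the chronological first pass, and the marked lines are the “anything weird?” candidates to trace back into the codebase.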

Something like that? Anyway, I don’t want to turn this into an esoteric discussion. I just wanted some guidelines on how to do it properly. I want to understand that painting! :)

Thank you for the reply, really good feedback, exactly the type of discussion I was looking for.
