It is like a pyramid: you measure what you can (metrics), you log messages where you need some debugging (logs), and you trace at a high level. So you end up with a lot of metrics, some logs, and a few traces. Often you even implement sampling on traces to reduce their number because, as you noticed, some traces can be huge.
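To make the sampling idea concrete, here is a minimal sketch of one common approach: deciding per trace ID whether to keep the whole trace. The function name `should_sample`, the use of SHA-256, and the 10% rate are my own assumptions for illustration, not how any particular tracing library does it.

```python
import hashlib

def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Keep roughly `sample_rate` of all traces (hypothetical helper).

    The decision is derived from a hash of the trace ID, so every service
    that sees the same trace ID makes the same keep/drop decision.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Keep the trace if it falls into the lowest `sample_rate` share of that range.
    return value < sample_rate * 2**64

# Usage: keep about 10% of traces (the IDs here are made up).
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in trace_ids if should_sample(t, 0.1)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Hashing the ID instead of rolling a random number per span means you never end up with half of a trace.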
Each of these is also useful in different situations:
- metrics are used for “early warning”: we want to know what is going on and be able to react before there is a problem
- logs are used to find out where the problem is, to check for bugs, and sometimes for other things, like finding malicious parties
- traces are used for profiling applications, finding bottlenecks, and monitoring how services interact with each other
So you see why it is often important to differentiate between them.
- Metrics tell you when something is happening in your application
- Logs tell you what is happening in your application
- Traces tell you why and how something is happening in your application
Sometimes it helps, sometimes it doesn’t.
I would say that it depends on the amount of data you want to gather. Often, with broad monitoring, it will come to you sooner rather than later, even before “reaching scale”.
I meant systems with many metrics and heavy traffic. If we are monitoring only HTTP requests, then it will often be enough. However, as in the article I linked, even if you batch them you are limited by the size of each log. Note that every log entry will contain about 28 bytes of data that isn’t really needed there (the timestamp), as we are more interested in the rate of the events than in the exact time when they happen.
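A rough back-of-the-envelope sketch of what that timestamp overhead adds up to. Only the 28 bytes comes from the point above; the traffic (1000 events per second), the one-minute window, and the 10-second flush interval are numbers I made up for illustration.

```python
TIMESTAMP_BYTES = 28  # approximate per-entry timestamp size mentioned above

def log_timestamp_bytes(events_per_second: int, seconds: int) -> int:
    """Timestamp bytes sent when every single event becomes its own log entry."""
    return events_per_second * seconds * TIMESTAMP_BYTES

def counter_timestamp_bytes(seconds: int, flush_interval: int) -> int:
    """Timestamp bytes sent when events are counted locally and only the
    aggregated counter is flushed, one timestamp per flush."""
    flushes = seconds // flush_interval
    return flushes * TIMESTAMP_BYTES

# Assumed load: 1000 events/s for one minute, counter flushed every 10 s.
as_logs = log_timestamp_bytes(1000, 60)      # 1_680_000 bytes
as_counter = counter_timestamp_bytes(60, 10)  # 168 bytes
print(as_logs, as_counter, as_logs // as_counter)
```

The exact numbers do not matter much; the point is that the per-event cost grows with traffic while the aggregated one grows only with time.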
Yeah, for projects that are just starting it may be a useful and interesting solution; however, if you grow even a little, you may encounter some problems (AFAIK, please correct me if I am wrong):
- The Logflare UI does not support comparing graphs of the metrics or looking for correlations between them
- There is no way to do more complex analysis, like computing derivatives, trends, etc., in the Logflare queries
- There is no alerting mechanism built into Logflare, and alerting is the main reason for using metrics
- There is only one kind of graph in the Logflare UI, a bar graph, which shows only the rate of the events. That is useful, but sometimes you need other graphs (heat maps, gauges, etc.). I do not really see how you would check CPU or memory usage using such a UI
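To show what I mean by derivatives and alerting, here is a small sketch of the kind of computation a metrics setup does for you. It is plain Python, not the API of Logflare or of any real metrics system; `rates`, `should_alert`, the sample values, and the threshold are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in seconds, counter value)

def rates(samples: List[Sample]) -> List[float]:
    """Per-second rate of change between consecutive counter samples,
    i.e. a discrete derivative of the counter."""
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        result.append((v1 - v0) / (t1 - t0))
    return result

def should_alert(samples: List[Sample], threshold: float) -> bool:
    """Fire when the most recent rate exceeds the threshold."""
    computed = rates(samples)
    return bool(computed) and computed[-1] > threshold

# An error counter scraped every 10 seconds (made-up values).
errors = [(0, 0), (10, 50), (20, 120), (30, 400)]
print(rates(errors))             # [5.0, 7.0, 28.0]
print(should_alert(errors, 20))  # True
```

With raw log entries you would have to rebuild these counters yourself before you could even start on something like this.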
So, as I said, it is useful, but you will very quickly outgrow it and need a “real” metrics gathering setup.