Future of Logger in Elixir

With OTP 21 and the introduction of Erlang's logger, Elixir's Logger application has become quite limiting (no structured logging). I would like to discuss how you see the future of logging in Elixir, as I have rewritten most of the logging in my application to the “new” interface, since it provides more flexibility. With the rise of logger I also see less need for projects like telemetry, which can easily be replaced by the aforementioned structured logging and metadata.

What do you think?

4 Likes

I’m interested in what you put together, if you can share. I’m currently working on Logflare and we actually need beta testers for our new logger backend.

I really like being able to keep the raw event data, as you can drill into it, versus the time-series approach. As long as your db is fast enough to query, you’re able to insert in real time, and you’re happy with the visualization / BI tools available … I think the time-series approach is less favorable.

With our Logger backend you can structure logs and add any data you want. Logflare uses BigQuery as the db and manages the BigQuery schema, so you can easily query each field specifically. And BigQuery plus Data Studio is very nice, although I wish it would give me per-minute graphs out of the box.

3 Likes

I do not quite get what the problem is here. Erlang’s logger allows structured logging, so instead of (in Elixir syntax) doing Logger.info("Responded HTTP 200 in #{time}") you can do Logger.info(%{http_response: 200, http_response_time: time}) and at the same time provide more information in the metadata. This lets you have your cake and eat it too, because:

  • you can programmatically get meaningful data without parsing text
  • you can still put contextual data (that isn’t part of the log message itself) into metadata
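
For instance, a minimal sketch of such a call using Erlang’s :logger directly from Elixir (OTP 21+); the field names are illustrative:

# The map is the log message proper; the second argument is metadata
# (context that is not part of the event itself).
:logger.info(
  %{http_response: 200, http_response_time: time},
  %{request_id: request_id}
)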

Grafana describes the difference quite nicely, and you can see that in most (all?) cases metrics can be extracted from logs. Of course, some of these logs will need to be discarded so as not to needlessly increase log volume, but that can be done quite easily later.

3 Likes

Structured logging is great and I hope it sees wide adoption.

There are at least a couple of situations that would motivate me to use something like telemetry as well.

  1. High-level, aggregate metrics often help people intuit what’s going on in a way that fine-grained, detailed log messages don’t.
  2. Logging incurs a higher latency and storage penalty than metrics. Consider what happens when any part of the structured log is a large data structure that must be serialized and perhaps truncated.

Once I helped investigate why a production system was suffering occasional latency spikes. The problem was that one log statement sometimes emitted lines that were tens of megabytes long.

That’s a good point. For production services I can access, I want the ability to aggregate and view real-time metrics without relying on logging overhead, plus logs for when I need to drill deeper. For services and other programs I can’t access, but whose logs I can perhaps acquire through support channels, I want the ability to feed the logs into the metrics pipeline and get fancy charts.

1 Like

Yes I think we’re in agreement here :slight_smile:

I’ve found Elastic-based solutions to be prohibitively expensive at any sort of decent volume. With something like BigQuery as your db you can keep orders of magnitude more event data and still query quickly. Logflare and our logger backend let you get that structured data into BigQuery easily.

I agree. I am part of the EEF Observability WG, and my question isn’t whether we should rely only on the logger, but whether we should include metrics in logs and then provide a logger handler that extracts them from the logs and aggregates them. That way everything would happen without sending high-volume messages to an external log handler.

So in the end there would still be some kind of aggregation that reduces volume, so even if an error occurs that is a gigabyte long it shouldn’t be a problem, as most of the data would be discarded by the handler anyway.
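
A minimal sketch of what such a handler could look like (module and table names are my own, not a WG proposal):

defmodule MetricsHandler do
  # Create an ETS table for counters and register with OTP's :logger.
  def install do
    :ets.new(:log_metrics, [:named_table, :public, write_concurrency: true])
    :logger.add_handler(:log_metrics, __MODULE__, %{})
  end

  # :logger handler callback; receives every accepted log event.
  # Structured reports arrive as {:report, map} in the :msg field.
  def log(%{msg: {:report, report}}, _config) when is_map(report) do
    if code = report[:http_response] do
      # Aggregate in place instead of shipping the full event anywhere.
      :ets.update_counter(:log_metrics, {:http_response, code}, 1, {{:http_response, code}, 0})
    end

    :ok
  end

  def log(_event, _config), do: :ok
end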

So my question is mostly about libraries like Ecto, which use telemetry right now. Would it be better to keep telemetry, or to use structured logs that could provide the same information?

2 Likes

Congratulations! I’m happy about EEF’s formation and that there’s working groups. :slight_smile:

For context I’ve only recently started using BEAM-based languages again, and still need to get up to speed with OTP’s logger, telemetry, and Ecto.

That sounds like a useful option, especially for situations where you’re doing remote software support based on provided logs. Do you anticipate the use case requiring any changes to logger’s front-end API? Would there be some relationship between the logger backends you’re proposing and telemetry metrics?

My general stance is that it’s nice to give developers options which cover their common use cases. Is there a sensible common interface people could opt into, and then configure whether telemetry, logging, or both would aggregate metrics?

If they’re just too different for a common abstraction layer, I’m inclined, from my admittedly naive standpoint, to want common library projects such as Ecto to use both telemetry and structured logging, while giving application developers the ability to configure each according to their needs.

Maybe others more experienced with these tools than me can offer more specific feedback?

No. I do not see any reason for such changes.

The current Telemetry.Metrics would need to be implemented by the backend that uses logs to aggregate metrics. So in theory we could have a shared API for defining aggregations, but I do not think it would be needed to that extent.

I am also working on the OpenCensus (to be renamed OpenTelemetry) integrations, so this is how I see the API being done.

The API is almost identical: Telemetry has the execute/2-3 functions, and in the case of metrics it would be one of the logger macros (probably at the notice level, as info is filtered out by default). If you look closely, you will see that the arguments (in the case of structured logs) are exactly the same in both cases. The only difference is how you digest the data, but TBH that is a matter that interests only integration authors, not users.
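
To illustrate with a hedged sketch (event names and fields are made up):

# telemetry: dispatch a measurement to subscribed handlers
:telemetry.execute([:my_app, :repo, :query], %{duration: duration}, %{source: "users"})

# logger: emit the same data as a structured report plus metadata
:logger.notice(%{duration: duration}, %{source: "users"})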

1 Like

While I am not at all an expert in any of this, structured logging has been something I have wanted ever since my Java days (2002 and on). I see no reason why we must always dump a ton of metadata in a hard-to-parse format like plain text; formats change, new team leads or CTOs come, policies change, and in the end I had to maintain a system that parsed 11 different log formats for 4 years.

We can do better.

OpenCensus => OpenTelemetry sounds very good as well.

1 Like

Sounds like you are a fan of systemd instead of sysvinit then? ^.^

1 Like

When the crap-show there started (good old “us vs. them” style) I completely stopped paying attention. I have to say nowadays I remain blissfully ignorant on the topic. :003:

I will be grateful to be enlightened though.

1 Like

In short, since this isn’t really the topic for it: systemd prefers all main system-related parts of a system to know about each other and talk via APIs, unlike the old sysvinit model, which uses text to talk. This even shows in their logging systems: sysvinit logging is just text files, while journald (systemd’s logging system) is a structured logging system, quite literally a time-based database. By default on most (well, all that I’ve seen) distributions it will write out a copy to the sysvinit text file locations in textual format for backwards compatibility, but that is entirely optional and not something you need unless you have things that read the old sysvinit logging data. Consequently, doing a query/search/lookup/whatever over the logs in the system is significantly faster than with plain text files, and you can get far more accurate results.

The thing that causes the systemd vs. sysvinit fight in the Linux community is that people “don’t like monolithic tools”, as it goes against the Unix model. The thing is that the tools being so tightly bound together means they can become significantly more efficient, which is why systemd is just outright faster than sysvinit models (it is inherently parallel in hardware init; even communication pipes between programs are initialized “immediately”, before their dependencies are fully loaded, then paused until loading finishes when access is performed, etc., etc., etc., which are a lot of things the sysvinit model just cannot do). Plus there’s the complaint that it’s not text-based but binary (and I say those people don’t know how UNIX systems traditionally worked…).

3 Likes

In that case you are spot on. I do like systemd more!

This! The UNIX philosophy is being exaggerated in many places and that’s not OK. We are hitting hardware limits that won’t be bumped for years, and we have to start getting very conservative and practical about the cost of decoupling stuff religiously. Sure, it helps conceptually, but having everything in a mono-repo for your project would be even better: caching, monitoring, logging, hell, even your DB and storage should be integrated right into your project! And even your OS. (Hello Nerves and Erlang hardware, I love you!)

I don’t deny that Postgres, Redis and many others are top-notch marvels of engineering we often can’t do without. But IMO it’s time we started some sort of tech convergence, or else 2-3 Electron apps will keep clogging a perfectly cooled i9 CPU.

1 Like

I think the current Logger “interface”, which can take any data as metadata, is good enough. telemetry is a different thing.

What’s missing for me is a good logging formatter and a configuration convention shared across different applications, so that you don’t hit strange issues (… like a library passing a custom struct as metadata, and Jason refusing to encode it to JSON).

To reuse my convention… I wrote the following small packages to print out metadata as one-line JSON.
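
The rough shape is something like this (a hypothetical module, not the actual package code):

defmodule OneLineJSON do
  # Console formatter callback, wired up via:
  #   config :logger, :console, format: {OneLineJSON, :format}, metadata: :all
  def format(level, message, _timestamp, metadata) do
    meta =
      metadata
      # inspect/1 survives structs that Jason cannot encode directly
      |> Map.new(fn {key, value} -> {key, inspect(value)} end)

    [Jason.encode!(%{level: level, message: IO.chardata_to_string(message), meta: meta}), ?\n]
  rescue
    # A formatter must never crash the backend; fall back to inspect.
    _ -> "could not format: #{inspect({level, message, metadata})}\n"
  end
end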

In my case, I’m using just the console backend, as the apps run in k8s and log aggregation is done by an integrated ELK stack. However, some may prefer a logger backend directly integrated with other log collection systems.

A while ago @pragdave created https://hex.pm/packages/bunyan as such a framework. Not sure about other options.

Unfortunately it is lagging behind Erlang’s logger API, which provides structured logging.

Not that different. telemetry is just an additional logging library that is meant to log only metrics, but the two are interconnected. So for me there is not that much difference between logger’s structured logging feature and telemetry.

And the metadata field of a log entry is completely different from the actual log entry. For example, the request duration belongs in the log’s main value, not in its metadata. In structured logging you can make an explicit distinction between the two.

Structured logging can still be achieved with a textual format; the two features are orthogonal. Using some kind of TSDB just helps with querying such data.

I see that Elixir’s Logger expects chardata, while Erlang’s logger can accept a map (src).

This could be improved in a future version of Elixir requiring OTP 21+, without breaking the interface, by accepting more than the chardata_or_fun type. This is actually tracked in Support Erlang 21 new features · Issue #6611 · elixir-lang/elixir · GitHub - “Unify :logger and Logger metadata”.

I see your point, but why not use metadata as the “additional data” attached to the entry? Logger metadata actually has a nice feature - it lives in the process - so you can set a persistent “main log value” in the metadata once and keep using it. Do we need to reinvent this behavior? Maybe. But why don’t we leverage the existing metadata?

# Set process-level metadata once...
Logger.metadata(transaction_id: txid)
# ...and every later call in this process inherits it.
Logger.debug("starting", foo: 1, bar: 2)
# ....
Logger.info("completed")

If we want side-effect-free logging… then we need to pass everything. That works, but I don’t think it’s worth doing. We set Logger metadata explicitly for the logger (i.e. not “misusing” the process dictionary… oh, I’m breaking rule #1):

log_common_metadata = [transaction_id: txid]
Logger.debug([message: "starting", foo: 1, bar: 2] ++ log_common_metadata)
Logger.info([{:message, "completed"} | log_common_metadata])

Logging and collecting metrics have some common goals, but logger and telemetry are different mechanisms. logger creates log events on the caller side, while telemetry implements a dispatching mechanism that lets others know when a specific event happens. They can serve the same purpose - by using telemetry to trigger events (via :telemetry.execute/3) for anything requiring logs, and writing a telemetry handler that subscribes to those events and just “prints the log message”. It works, and I see there are certain cases where it’s better to define telemetry events. Oh, we could define an event called “logevent” and have the log backend subscribe to those events. Oh, then we’re basically using telemetry as a message queue :slight_smile:
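
A minimal sketch of that last idea (handler id and event name are made up):

require Logger

# Subscribe a handler that just prints a log line for every "logevent".
:telemetry.attach(
  "logevent-printer",
  [:my_app, :logevent],
  fn _event_name, measurements, metadata, _config ->
    Logger.info("event: #{inspect(measurements)} #{inspect(metadata)}")
  end,
  nil
)

# Anywhere in the code base, "log" by dispatching an event:
:telemetry.execute([:my_app, :logevent], %{count: 1}, %{message: "completed"})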

This is not exactly that. The unification there means that when you call :logger.set_process_metadata/1 it will also affect the return value of Logger.metadata/0, so utilities like OpenCensus do not need a separate wrapper to handle both logging libraries. It isn’t about structured logging at all, as that would be a breaking change for formatters and/or backends.

This is the whole point. The metadata should only be “additional data” that gives you context about the log entry; it should not contain log data like the request duration, as that belongs to the log entry itself.

An example of the difference between structured logging and the “classical” approach:

In Plug.Logger right now we have (at an abstract level):

duration = measure_duration(call_next_plugs())

Logger.info("Respond #{conn.resp_code} on #{conn.path} in #{duration}", request_id: req_id)

While the structured log would look like:

duration = measure_duration(call_next_plugs())

Logger.info(%{resp_code: conn.resp_code, req_path: conn.path, req_duration: duration}, request_id: req_id)

So you can see that the log entry contains the “core” values of the log, and the metadata is for, well, metadata.

Not much, to be honest. Both of these just dispatch some data to different backends. The differences are even less distinct in the case of Erlang’s logger and telemetry. The only difference, in fact, is that logger also passes the data through a formatter.

telemetry, Logger, and logger are all basically message queues.

The main point is that with logger replacing telemetry, and with broader usage of structured logging, it would be easier to implement white-box telemetry in the BEAM core rather than adding a new library to the default release. Especially as logger lives in the kernel application, it is always up and running in any Erlang system on OTP 21+, since kernel is (almost?) impossible not to run.

I think we’re on the same page that we need more “structured logging”. Don’t get me wrong - I’ve achieved that using metadata; see my code at https://github.com/chulkilee/plug_metadata_logger/blob/v0.1.0/lib/plug_metadata_logger.ex

From a developer’s point of view, it doesn’t matter whether logger and telemetry are implemented in similar ways or not. I just need structured logging, and I can do that with metadata and a custom formatter.

What I don’t understand from your comments is: what problem are you trying to solve?

  • Logging is for the application itself; a log message is not generally expected to be consumed by other Erlang applications.
  • Telemetry is a (new) common interface for instrumentation, which allows others to instrument you - so events are expected to be consumed by others.

They’re for different purposes, so the “interface to the user” should be different. Even if they are implemented with a shared app, it doesn’t make sense to “consolidate” the logging interface into telemetry, or vice versa.

2 Likes

That’s what JSON is, after all: just structured text. I was referencing their ecosystem designs, though. ^.^

Ooo, really? I really need to read up on Erlang’s new logger already… >.>