Solid log aggregation setup?

pdilyard · January 31, 2017, 6:53pm

I’m looking for a solution that will aggregate logs from all app servers reliably. Features that would be nice to have:

Notifications for errors
Phoenix integration (some insight into the request/response attached to any errors)
Performance monitoring (automatic integration with Phoenix controllers would be great, but as long as you could add some code to measure a function, it would be fine)

I realize the performance monitoring part might have a different solution than the rest.

What I’ve tried so far:

exometer with statsd and DataDog: I liked statsd and DataDog, but exometer was very cumbersome to maintain (no great hex package, dependency collisions, etc.)
Honeybadger: everything about Honeybadger was fine, except that the Elixir lib eventually brought down my entire app, even after it had been stable for weeks
Appsignal: Seems nice, but the Elixir lib is a little too unstable at this point, and I would prefer something that I can install on in-house servers for both logging and performance statistics

I’d love to hear what people with existing production apps are doing!

whatyouhide · January 31, 2017, 6:58pm

We (Football Addicts) are using:

Fluxter to report metrics from Elixir apps to Telegraf, which aggregates and sends to InfluxDB for storing metrics (see this blog post on our tech blog)
Rollbax to report exceptions to Rollbar
Papertrail to store logs (we’re not too happy with Papertrail because of costs for our volume of logs, but it works for now)

pdilyard · January 31, 2017, 10:51pm

Awesome, thanks for the reply. That looks very helpful.

johnkelly · February 1, 2017, 9:04pm

I wrote a blog post about our logging setup at Bleacher Report a few months ago. We use ELK for logging, hosted on the SaaS Logz.io so we don’t have to run the ELK infrastructure ourselves. It covers all three of your points.

pdilyard · February 1, 2017, 9:16pm

Hey John, coincidentally, I stumbled across your blog post less than 5 minutes before your reply.

I think the ELK setup is what we’re going to go with. Thanks!

pdilyard · February 1, 2017, 9:28pm

I do have one quick question about plug_logger_json, actually.

Does this only work for Plug-based projects? We have other Elixir apps in our stack (ones that run background jobs, for example) that don’t make or handle any web requests. Would these apps also need to output JSON logs?

johnkelly · February 1, 2017, 9:47pm

@pdilyard, plug_logger_json is only for Plug projects, but I ship other logs to ELK in the same way (as JSON). I usually create a logger module in the project and define log functions for the common logging use cases. It’s useful for everything to be JSON so that I can view all the logs through a shared field like request_id.

Here’s an example of logging to ELK outside of Plug: I log external requests so that I can trace a slow request and see whether an external call is slowing things down. You’d probably want to do something similar for background logging. Just pick a log_type other than http, which is what plug_logger uses.

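  # Assumes the enclosing module has `require Logger` and that the
  # :jiffy JSON encoder is available as a dependency.
  # `start` is an :os.timestamp() taken just before the external call.
  def http(func, url, start) do
    _ = Logger.log(:info, fn ->
      stop = :os.timestamp()

      %{
        # :timer.now_diff/2 returns microseconds; report milliseconds
        "duration"   => Float.round(:timer.now_diff(stop, start) / 1000, 3),
        "function"   => func,
        "level"      => "info",
        "log_type"   => "external_request",
        "request_id" => Logger.metadata()[:request_id],
        "url"        => url
      }
      |> :jiffy.encode()
    end)
  end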
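
For an app that only runs background jobs, the same idea could be sketched roughly as below. This is an illustration, not code from plug_logger_json: the module name BackgroundLog, the "background_job" log_type, and the "job" field are made up for the example, and the JSON encoder is passed in as a function so the sketch does not depend on any particular library.

```elixir
defmodule BackgroundLog do
  # Hypothetical helper module; names and fields are assumptions for illustration.
  require Logger

  # Builds the fields for one background-job log entry.
  # `duration_us` is the elapsed time in microseconds.
  def fields(job_name, duration_us, metadata \\ Logger.metadata()) do
    %{
      "duration"   => Float.round(duration_us / 1000, 3),
      "job"        => job_name,
      "level"      => "info",
      "log_type"   => "background_job",
      "request_id" => metadata[:request_id]
    }
  end

  # Runs `fun`, logs one entry with its duration, and returns the result of `fun`.
  # `encode` is any function that turns a map into a string or iodata.
  def timed(job_name, encode, fun) do
    {duration_us, result} = :timer.tc(fun)
    _ = Logger.info(fn -> encode.(fields(job_name, duration_us)) end)
    result
  end
end
```

With jiffy as the encoder, a call might look like `BackgroundLog.timed("nightly_sync", &:jiffy.encode/1, fn -> run_sync() end)`, where `run_sync/0` stands in for whatever the job actually does.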

mgwidmann · February 1, 2017, 9:57pm

Shameless plug here: I’ve started a self-hosted version of Rollbax. It’s still early on, but it works apart from a few small issues.

Gazler · February 2, 2017, 5:13pm

Thanks for linking these, they’re really useful.

Is there any reason that the status is explicitly converted to a string on https://github.com/bleacherreport/plug_logger_json/blob/master/lib/plug/logger_json.ex#L109 ?

Wouldn’t it be better to leave it as an integer so that ranges (such as status:[200-299]) work inside of Kibana?

johnkelly · February 2, 2017, 8:50pm

Good point! I’m not sure why I did that explicitly. Seems like an oversight on my part. I’ll experiment with switching that to an int and make that change for the next minor version unless I discover a reason for it being a string.