Collecting stats from 150GB logs - in just 600 seconds!

mix
logs

#1

I need to know the request timings of one micro service to another micro service from 15 log files of 10GB each. so i wrote https://github.com/murtza/log-stats code. it provide results in less then 950 seconds from 150GB of logs. Erlang/Elixir is really amazing.

System specs are as follows.

CPU: AMD Ryzen 2700x (8 core/16 threads)
RAM: 32GB DDR4 3600MHz
SSD: 512GB Samsung NVME 970 Pro

Erlang: 21.2
Elixir: 1.8.1
OS: MacOS Mojave 10.14.3 (Hackintosh)

result map is as follows

%{
  "order/update" => 1011450,
  "payment/fetch" => 1810,
  "charges/refund" => 1720,
  :ccb_resp_to_console_blazing_fast => 30550240,
  :ccb_resp_to_console_200 => 10,
  :ccb_resp_to_console_after_6s => 18240,
  "payment/search" => 110,
  :ccb_resp_to_console_after_2s => 206250,
  "customer/update" => 47309,
  "bill_rate/fetch" => 11540874,
  :ccb_resp_to_console_after_30s => 7003,
  :ccb_resp_to_console_after_50ms => 1412293,
  "order/create" => 195918,
  "billing_credit/fetch" => 21,
  "payment/reverse" => 8,
  "bill_rate/search" => 127725,
  :ccb_resp_to_console_after_20s => 818,
  "bill/fetch" => 586009,
  :ccb_resp_to_console_after_60s => 13092,
  "party/update" => 4616,
  "one_off_charge/create" => 178059,
  :ccb_resp_to_console_after_50s => 232,
  :ccb_resp_to_console_after_10s => 37016,
  "customer/create" => 105609,
  "order/fetch" => 12874424,
  "party/search" => 910,
  "party/fetch" => 1007454,
  "bill/search" => 11120906,
  :ccb_resp_to_console_after_9s => 915,
  :ccb_resp_to_console_after_8s => 11057,
  "party/create" => 159308,
  :ccb_resp_to_console_after_3s => 16054,
  :ccb_resp_to_console_after_15s => 14128,
  :ccb_resp_to_console_after_40s => 42,
  :ccb_resp_to_console_after_250ms => 101607,
  "payment/create" => 7695,
  :ccb_resp_to_console_after_4s => 40285,
  :ccb_resp_to_console_not_200 => 668928,
  :ccb_resp_to_console_after_7s => 1919,
  "usage_by_time_band/fetch" => 51,
  :ccb_resp_to_console_after_25ms => 10522315,
  :ccb_resp_to_console_after_1s => 59080,
  :ccb_resp_to_console_after_5s => 25474,
  :ccb_resp_to_console_after_500ms => 21382,
  "customer/fetch" => 6092074,
  :ccb_resp_to_console_after_100ms => 57093
}
948

Elixir Binary - what is the equivalent to C#'s Take and Skip?
#2

I’ve just glanced at the code so I could be missing something obvious, but I noticed :ets.insert(:stats, {endpoint, count + 1}) which looks like it would be an ideal use case for :ets.update_counter.

Did you try update_counter? If so, was there a reason you went with insert instead?


#3

yeah its a good use case for :ets.update_counter and i should use that. thanks :slight_smile:


#4

:ets.update_counter/3 saved me 53 seconds more. now i get results in 895 seconds. Good catch


#5

I’m guessing the log files are business-sensitive and thus not public? I’d try my hand making this even faster if I could get them.


#6

you are right, i cant share them but here is the format

2019-03-03 UTC 21:16:20.441 [request_id=2m48urrhmudbhu3hk815uj61 ] [info] POST /api/v1/bill/search
2019-03-03 UTC 21:16:20.441 [request_id=2m48urrhmudbhu3hk815uj61 ] [info] e[33mREQUEST: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2000 chars
2019-03-03 UTC 21:16:20.452 [request_id=2m48urrhmudbhu3hk815uj61 ] [info] e[32mRESPONSE: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 3800 chars
2019-03-03 UTC 21:16:20.452 [request_id=2m48urrhmudbhu3hk815uj61 ] [info] Sent 200 in 11ms
2019-03-03 UTC 21:16:20.455 [request_id=2m48urriike6avbbcc15uj81 ] [info] POST /api/v1/customer/fetch
2019-03-03 UTC 21:16:20.455 [request_id=2m48urriike6avbbcc15uj81 ] [info] e[33mREQUEST: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2600 chars
2019-03-03 UTC 21:16:20.485 [request_id=2m48urriike6avbbcc15uj81 ] [info] e[32mRESPONSE: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 5500 chars
2019-03-03 UTC 21:16:20.485 [request_id=2m48urriike6avbbcc15uj81 ] [info] Sent 200 in 29ms
2019-03-03 UTC 21:16:20.493 [request_id=2m48urrkrc7rfindjs15uja1 ] [info] POST /api/v1/order/fetch
2019-03-03 UTC 21:16:20.493 [request_id=2m48urrkrc7rfindjs15uja1 ] [info] e[33mREQUEST: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 450 chars
2019-03-03 UTC 21:16:20.512 [request_id=2m48urrkrc7rfindjs15uja1 ] [info] e[32mRESPONSE: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:20.512 [request_id=2m48urrkrc7rfindjs15uja1 ] [info] Sent 200 in 18ms
2019-03-03 UTC 21:16:20.516 [request_id=2m48urrm7df28vhnt015ujc1 ] [info] POST /api/v1/bill_rate/fetch
2019-03-03 UTC 21:16:20.517 [request_id=2m48urrm7df28vhnt015ujc1 ] [info] e[33mREQUEST: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 1500 chars
2019-03-03 UTC 21:16:20.533 [request_id=2m48urrm7df28vhnt015ujc1 ] [info] e[32mRESPONSE: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:20.533 [request_id=2m48urrm7df28vhnt015ujc1 ] [info] Sent 200 in 16ms
2019-03-03 UTC 21:16:21.021 [request_id=2m48ursk92fm3dv5t815uje1 ] [info] POST /api/v1/bill/search
2019-03-03 UTC 21:16:21.021 [request_id=2m48ursk92fm3dv5t815uje1 ] [info] e[33mREQUEST: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2000 chars
2019-03-03 UTC 21:16:21.028 [request_id=2m48ursk92fm3dv5t815uje1 ] [info] e[32mRESPONSE: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 3800 chars
2019-03-03 UTC 21:16:21.028 [request_id=2m48ursk92fm3dv5t815uje1 ] [info] Sent 200 in 7ms
2019-03-03 UTC 21:16:21.031 [request_id=2m48ursksnrv4ruli415ujg1 ] [info] POST /api/v1/customer/fetch
2019-03-03 UTC 21:16:21.031 [request_id=2m48ursksnrv4ruli415ujg1 ] [info] e[33mREQUEST: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2600 chars
2019-03-03 UTC 21:16:21.059 [request_id=2m48ursksnrv4ruli415ujg1 ] [info] e[32mRESPONSE: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 5500 chars
2019-03-03 UTC 21:16:21.059 [request_id=2m48ursksnrv4ruli415ujg1 ] [info] Sent 200 in 28ms
2019-03-03 UTC 21:16:21.067 [request_id=2m48ursn2go7mr4plo15uji1 ] [info] POST /api/v1/order/fetch
2019-03-03 UTC 21:16:21.068 [request_id=2m48ursn2go7mr4plo15uji1 ] [info] e[33mREQUEST: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 450 chars
2019-03-03 UTC 21:16:21.088 [request_id=2m48ursn2go7mr4plo15uji1 ] [info] e[32mRESPONSE: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.088 [request_id=2m48ursn2go7mr4plo15uji1 ] [info] Sent 200 in 20ms
2019-03-03 UTC 21:16:21.092 [request_id=2m48ursohe3o1j2h6015ujk1 ] [info] POST /api/v1/bill_rate/fetch
2019-03-03 UTC 21:16:21.092 [request_id=2m48ursohe3o1j2h6015ujk1 ] [info] e[33mREQUEST: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 1500 chars
2019-03-03 UTC 21:16:21.134 [request_id=2m48ursohe3o1j2h6015ujk1 ] [info] e[32mRESPONSE: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.134 [request_id=2m48ursohe3o1j2h6015ujk1 ] [info] Sent 200 in 42ms
2019-03-03 UTC 21:16:21.482 [request_id=2m48urtfpvtjofoo1g15ujm1 ] [info] POST /api/v1/bill/search
2019-03-03 UTC 21:16:21.483 [request_id=2m48urtfpvtjofoo1g15ujm1 ] [info] e[33mREQUEST: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2000 chars
2019-03-03 UTC 21:16:21.490 [request_id=2m48urtfpvtjofoo1g15ujm1 ] [info] e[32mRESPONSE: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 3800 chars
2019-03-03 UTC 21:16:21.490 [request_id=2m48urtfpvtjofoo1g15ujm1 ] [info] Sent 200 in 7ms
2019-03-03 UTC 21:16:21.608 [request_id=2m48urtna0uarc8gio15ujo1 ] [info] POST /api/v1/bill/search
2019-03-03 UTC 21:16:21.608 [request_id=2m48urtna0uarc8gio15ujo1 ] [info] e[33mREQUEST: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2000 chars
2019-03-03 UTC 21:16:21.615 [request_id=2m48urtna0uarc8gio15ujo1 ] [info] e[32mRESPONSE: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 3800 chars
2019-03-03 UTC 21:16:21.615 [request_id=2m48urtna0uarc8gio15ujo1 ] [info] Sent 200 in 6ms
2019-03-03 UTC 21:16:21.618 [request_id=2m48urtnrl354i888415ujq1 ] [info] POST /api/v1/customer/fetch
2019-03-03 UTC 21:16:21.618 [request_id=2m48urtnrl354i888415ujq1 ] [info] e[33mREQUEST: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2600 chars
2019-03-03 UTC 21:16:21.648 [request_id=2m48urtnrl354i888415ujq1 ] [info] e[32mRESPONSE: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 5500 chars
2019-03-03 UTC 21:16:21.648 [request_id=2m48urtnrl354i888415ujq1 ] [info] Sent 200 in 30ms
2019-03-03 UTC 21:16:21.656 [request_id=2m48urtq550v0mq9ro15ujs1 ] [info] POST /api/v1/order/fetch
2019-03-03 UTC 21:16:21.656 [request_id=2m48urtq550v0mq9ro15ujs1 ] [info] e[33mREQUEST: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 450 chars
2019-03-03 UTC 21:16:21.675 [request_id=2m48urtq550v0mq9ro15ujs1 ] [info] e[32mRESPONSE: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.675 [request_id=2m48urtq550v0mq9ro15ujs1 ] [info] Sent 200 in 19ms
2019-03-03 UTC 21:16:21.680 [request_id=2m48urtrikdbgfd4mg15uju1 ] [info] POST /api/v1/bill_rate/fetch
2019-03-03 UTC 21:16:21.680 [request_id=2m48urtrikdbgfd4mg15uju1 ] [info] e[33mREQUEST: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 1500 chars
2019-03-03 UTC 21:16:21.718 [request_id=2m48urtrikdbgfd4mg15uju1 ] [info] e[32mRESPONSE: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.718 [request_id=2m48urtrikdbgfd4mg15uju1 ] [info] Sent 200 in 37ms
2019-03-03 UTC 21:16:21.819 [request_id=2m48uru3rnt32gdmm015uk01 ] [info] POST /api/v1/bill/search
2019-03-03 UTC 21:16:21.819 [request_id=2m48uru3rnt32gdmm015uk01 ] [info] e[33mREQUEST: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2000 chars
2019-03-03 UTC 21:16:21.827 [request_id=2m48uru3rnt32gdmm015uk01 ] [info] e[32mRESPONSE: bill.search VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 3800 chars
2019-03-03 UTC 21:16:21.827 [request_id=2m48uru3rnt32gdmm015uk01 ] [info] Sent 200 in 7ms
2019-03-03 UTC 21:16:21.829 [request_id=2m48uru4fi039u5io815uk21 ] [info] POST /api/v1/customer/fetch
2019-03-03 UTC 21:16:21.829 [request_id=2m48uru4fi039u5io815uk21 ] [info] e[33mREQUEST: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 2600 chars
2019-03-03 UTC 21:16:21.862 [request_id=2m48uru4fi039u5io815uk21 ] [info] e[32mRESPONSE: customer.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 5500 chars
2019-03-03 UTC 21:16:21.862 [request_id=2m48uru4fi039u5io815uk21 ] [info] Sent 200 in 32ms
2019-03-03 UTC 21:16:21.870 [request_id=2m48uru6spgf3kfgmg15uk41 ] [info] POST /api/v1/order/fetch
2019-03-03 UTC 21:16:21.870 [request_id=2m48uru6spgf3kfgmg15uk41 ] [info] e[33mREQUEST: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 450 chars
2019-03-03 UTC 21:16:21.888 [request_id=2m48uru6spgf3kfgmg15uk41 ] [info] e[32mRESPONSE: order.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.889 [request_id=2m48uru6spgf3kfgmg15uk41 ] [info] Sent 200 in 18ms
2019-03-03 UTC 21:16:21.893 [request_id=2m48uru893scd1s48c15uk61 ] [info] POST /api/v1/bill_rate/fetch
2019-03-03 UTC 21:16:21.893 [request_id=2m48uru893scd1s48c15uk61 ] [info] e[33mREQUEST: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 1500 chars
2019-03-03 UTC 21:16:21.934 [request_id=2m48uru893scd1s48c15uk61 ] [info] e[32mRESPONSE: bill_rate.fetch VERY_LONG_ELIXIR_MAP_GOES_HERE. average length of line is 4000 chars
2019-03-03 UTC 21:16:21.934 [request_id=2m48uru893scd1s48c15uk61 ] [info] Sent 200 in 40ms

#7

i replaced following

if String.contains?(log_line, "POST /api/v1/#{endpoint}") do
   add_stats(endpoint)
end

with binary pattern matching

line = "POST /api/v1/#{endpoint}"
case x do
  <<_ :: binary-size(74), rest :: binary>> -> 
    case rest == line do
      true -> add_stats(endpoint)
      _ -> :ok
    end
  _ -> :ok  
end

so instead of searching through each line which can be a very long line. i search from specific line and it improved a lot of speed. now i get result in 573 seconds.

change commit is https://github.com/murtza/log-stats/commit/4703681a2f626c8c42dd0bd34e1b04a7a253fd84


#8

As most lines of the log aren’t relevant, would doing a first pass with grep or awk make things faster?


#9

Unlikely. With that preliminary step plus the Elixir code you’ll do two passes of rather huge files.

If you pattern match only the lines you need and ignore the rest while streaming the files in your Elixir code, you’ll only do one pass.


#10

try looking into https://github.com/plataformatec/nimble_parsec - try and find the video(s) where @josevalim uses it https://www.twitch.tv/josevalim/videos

it’s especially the defp endpoint_stats(x) do function I “dont like” - eg would like to see the pattern matching generated with a macro or nimble_parsec, instead of Enum.mapping/case matching through the many options each time…

but honestly I’m no expert here, and may have misunderstood the implementation…


#11

I don’t have your logs so it’s hard to try it out, but from what I understand pattern matching in the function head is heavily optimized by the compiler. You have two hot functions that could be simplified and possibly run much faster by removing the case and putting the pattern match in the head, endpoint_stats and sent_stats.

For example endpoint_stats

  defp endpoint_stats(x) do
    Enum.each(@endpoints, fn endpoint ->
      case x do
        <<_ :: binary-size(87), rest :: binary>> -> 
          case rest == endpoint <> "\n" do
            true -> add_stats(endpoint)
            _ -> :ok
          end
          _ -> :ok  
      end
    end)
  end

can be refactored into this, which is more idiomatic, removes the unnecessary case and probably runs faster. It also avoids running the match for every endpoint, which is pointless because x doesn’t change within endpoint_stats.

  defp endpoint_stats(<<_ :: binary-size(87), rest :: binary>>) do
    Enum.each(@endpoints, fn endpoint ->
      case rest == endpoint <> "\n" do
        true -> add_stats(endpoint)
        _ -> :ok
      end
    end)
  end

  defp endpoint_stats(_), do: :ok

And you can apply the same pattern to sent_stats.


#12

True, but not exactly, doing filtering with grep or rg before passing it to Elixir stream could improve performance, as Elixir’s stream still need to traverse data byte by byte after loading it to the memory as it need to search for \n bytes to split it into separate lines. grep and rg does whole thing in the reverse, so they load handful of file into memory, do regex lookup, and then, only if it matched, looks for start and end of line. So if there is more data to remove than it is to fetch, then whole process should be faster with grep then process.


#13

Yep, also very likely.

I’ve done big files parsing with Elixir alone in the past and while something like Go or Rust can easily outperform it, it is still quite amazing the speeds you can achieve with a good hand-crafted homemade algorithm while having much more readable code compared to a lot of languages out there.


#14

I totally agree, my point was that using simple filtration of data beforehand can speedup whole process. And even if in single run it will be slower then you need to do it once, so in repeatable task it would result with overall speedup.


#15

BTW, have you tried to use Flow? If your processing is CPU bound it can improve things a little (if it is IO bound, which shouldn’t be a case here, then there will be no speedup, it could even result with slowdown).


#16

If it’s a simple integral counter have you looked in the :counters module? It’s like a simplified :ets.update_counter but even much faster yet as it uses atomic intrinsics, it’s pretty new, came out in OTP 21 I think, so a half year or so ago. :slight_smile: