Fastest way to traverse a log file

I’m trying to parse a big log file generated by a computer. The computer writes status logs, and every time someone attempts an activity, it logs that as well: it first writes an “ACTIVITY START” entry, and once the activity ends, it adds an “ACTIVITY END” entry. All logs are written line by line.

I am trying to extract these ACTIVITY chunks, i.e. everything between ACTIVITY START and ACTIVITY END.
Right now I am able to get these logs out with a reduce function.

Example log file:

```
14:23:00 MACHINE OKAY
14:24:00 MACHINE OKAY
14:24:52 ACTIVITY START
14:24:52 CD ROM WORKING ID: 12345
14:24:55 PASSWORD ENTERED
14:25:00 PID 1023 ENTERED
14:25:00 PASSWORD OK NO ACTION NEEDED
14:25:00 NETWORK REQUEST FXX     
14:25:02 NETWORK REPLY FXX ID 2456
MACH IBM                  
DATE          TIME    MACH ID
11/11/19      11:11   MACH1234   
USER IDENTIFICATION   SASASAS  
DATA NOT RECEIVED     
14:25:07 ACTIVITY END
14:26:00 MACHINE OKAY
14:27:00 MACHINE OKAY
```
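
My current code:

```elixir
# `path` stands in for the log file's location (not shown in the original).
path
|> File.stream!()
# Strip the trailing newline (and spaces) from each line, lazily.
|> Stream.map(&String.trim/1)
|> Enum.reduce([""], fn line, acc ->
  if String.contains?(line, "ACTIVITY START") do
    # Start a new chunk at the head of the accumulator.
    [line | acc]
  else
    # Append the line to the chunk currently being built. Lines before the
    # first START land in the initial "" chunk; lines after an END stay
    # attached to the preceding chunk.
    [current | rest] = acc
    [current <> "\n" <> line | rest]
  end
end)
# Chunks were prepended, so restore file order.
|> Enum.reverse()
```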

Is there a better way of doing this?
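One way to keep the single-pass reduce while keeping only the lines between the markers is to carry an explicit "inside a block or not" state in the accumulator. This is a minimal sketch, assuming the markers always appear as the substrings ACTIVITY START and ACTIVITY END; the module name `ActivityReduce` is hypothetical, and an unterminated block at EOF is simply dropped.

```elixir
defmodule ActivityReduce do
  # Hypothetical helper: returns one newline-joined string per
  # ACTIVITY START ... ACTIVITY END block, dropping every line outside a block.
  # `lines` can be any enumerable of strings, e.g. File.stream!(path).
  def extract(lines) do
    {chunks, _open_block} =
      lines
      |> Stream.map(&String.trim/1)
      |> Enum.reduce({[], nil}, fn line, {chunks, current} ->
        cond do
          String.contains?(line, "ACTIVITY START") ->
            # Open a new block (an already open block, if any, is discarded).
            # Lines are collected in reverse so each step is a cheap prepend.
            {chunks, [line]}

          current == nil ->
            # Outside an activity block: ignore the line.
            {chunks, nil}

          String.contains?(line, "ACTIVITY END") ->
            # Close the block and store it as a single string in file order.
            chunk = [line | current] |> Enum.reverse() |> Enum.join("\n")
            {[chunk | chunks], nil}

          true ->
            # Inside an activity block: keep the line.
            {chunks, [line | current]}
        end
      end)

    # Completed chunks were prepended, so restore file order.
    Enum.reverse(chunks)
  end
end
```

Usage would be something like `path |> File.stream!() |> ActivityReduce.extract()`. Collecting lines in a list and joining once per block is a common Elixir idiom for building up a result.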


Not sure if it's relevant, but maybe you could use NimbleParsec: https://github.com/plataformatec/nimble_parsec

Or check out the thread "How to properly parse a list of lines, with 'look-ahead' functionality?" for a discussion of a similar problem.


You can also split your stream into smaller chunks, for example blocks that end with ACTIVITY END, using Stream.chunk_while/4:

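```elixir
# Accumulate lines into a binary; emit it and start over after each "ACTIVITY END" line.
chunk_fun = fn item, acc ->
  if String.ends_with?(item, "ACTIVITY END\n") do
    # "" <> item == item, so an empty accumulator needs no special case.
    {:cont, acc <> item, ""}
  else
    {:cont, acc <> item}
  end
end

# At EOF, emit whatever is left (if anything) as a final chunk.
after_fun = fn
  "" -> {:cont, ""}
  acc -> {:cont, acc, ""}
end

# `f` is the path to the log file.
f
|> File.stream!()
|> Stream.chunk_while("", chunk_fun, after_fun)
|> Enum.to_list()
```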

For the example log file, this returns:

```
["14:23:00 MACHINE OKAY\n14:24:00 MACHINE OKAY\n14:24:52 ACTIVITY START\n14:24:52 CD ROM WORKING ID: 12345\n14:24:55 PASSWORD ENTERED\n14:25:00 PID 1023 ENTERED\n14:25:00 PASSWORD OK NO ACTION NEEDED\n14:25:00 NETWORK REQUEST FXX     \n14:25:02 NETWORK REPLY FXX ID 2456\nMACH IBM                  \nDATE          TIME    MACH ID\n11/11/19      11:11   MACH1234   \nUSER IDENTIFICATION   SASASAS  \nDATA NOT RECEIVED     \n14:25:07 ACTIVITY END\n",
 "14:26:00 MACHINE OKAY\n14:27:00 MACHINE OKAY"]
```

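You still need to process each chunk (the first one, for example, also contains the MACHINE OKAY lines logged before ACTIVITY START), but this captures everything up to each ACTIVITY END, or up to EOF for the last chunk.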
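If you would rather have `Stream.chunk_while/4` drop the lines outside each block too, a variation that opens a chunk on ACTIVITY START and emits it on ACTIVITY END might look like the sketch below. It assumes the same marker substrings, discards an unterminated final block, and returns each block as a list of lines with trailing whitespace removed; `ActivityChunks` is a hypothetical module name.

```elixir
defmodule ActivityChunks do
  # Hypothetical helper: lazily turns an enumerable of log lines into one
  # list of lines per ACTIVITY START ... ACTIVITY END block.
  def stream(lines) do
    # The accumulator is nil outside a block, or the block's lines in reverse.
    chunk_fun = fn line, acc ->
      line = String.trim_trailing(line)

      cond do
        String.contains?(line, "ACTIVITY START") ->
          # Start collecting a new block.
          {:cont, [line]}

        acc == nil ->
          # Not inside a block: skip the line.
          {:cont, nil}

        String.contains?(line, "ACTIVITY END") ->
          # Emit the finished block in file order and reset the accumulator.
          {:cont, Enum.reverse([line | acc]), nil}

        true ->
          # Inside a block: keep the line.
          {:cont, [line | acc]}
      end
    end

    # At EOF, a block that was never closed is not emitted.
    after_fun = fn _acc -> {:cont, nil} end

    Stream.chunk_while(lines, nil, chunk_fun, after_fun)
  end
end
```

With this, `f |> File.stream!() |> ActivityChunks.stream() |> Enum.to_list()` should give one list per activity. Because the result is still a stream, you can also pipe it into `Stream.map/2` or `Enum.each/2` to process each block without holding all of them in memory.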