How to properly parse a list of lines, with ‘look-ahead’ functionality?
I have been struggling with this for a long time, and it is maybe not even strictly Elixir related, but that is the language I am implementing it in, so I thought this forum would be the best place to start my quest. I am quite sure for a lot of you my question might be trivial.
The essence is just that I need to grasp how to tackle this parsing problem. Note that I prefer to do it all with Elixir code and would like to stay away from yecc and leex for now.
Here it goes: I have a rather simple log file with some lines in it, that I want to parse. The log has information about a list of our web projects that we ‘scan’ every day for issues or when we launch them, as an extra check. I succeeded in the first step which is transforming the raw text lines into something a bit more meaningful. This is an example of the raw log:
[ ] URL: www.example.com
[ ] Started: Mon May 1 22:20:01 2017
[ ] robots.txt file found
[!] robots.txt exposes too much
[ ] correct utf8 meta tag found
[ ] correct og meta tags found
[!] missing correct doctype
[ ] Server info
| Server: Apache
| Version: 2.2
| Lang: PHP
| php-fpm: yes
[!] server exposes too much
| exposing via response headers
I was able to transform the above into the following on a first pass:
[
{:site, "www.example.com"},
{:started, _parsed_erlang_date_},
{:block, "robots.txt file found"},
{:warn, "robots.txt exposes too much"},
{:block, "correct utf8 meta tag found"},
{:block, "correct og meta tags found"},
{:warn, "missing correct doctype"},
{:pipe, "Server: Apache"},
{:pipe, "Version: 2.2"},
{:pipe, "Lang: PHP"},
{:pipe, "php-fpm: yes"},
{:warn, "server exposes too much"},
{:pipe, "exposing via response headers"}
]
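For reference, a first pass like this can be sketched with binary pattern matching on the line prefix. This is only a minimal sketch under a few assumptions: the module name `LogLexer` and the `{:unknown, line}` fallback are hypothetical, the start date is kept as a raw string instead of being parsed into an Erlang date, and any generic `[ ]` line (including "Server info") becomes a `{:block, text}` tuple.

```elixir
defmodule LogLexer do
  # Hypothetical module name: turns raw log text into tagged tuples.
  def tokenize(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.map(&classify/1)
  end

  # More specific prefixes must come before the generic "[ ] " clause.
  defp classify("[ ] URL: " <> url), do: {:site, String.trim(url)}
  # Assumption: the date stays a raw string here; real code would parse it.
  defp classify("[ ] Started: " <> date), do: {:started, String.trim(date)}
  defp classify("[ ] " <> text), do: {:block, String.trim(text)}
  defp classify("[!] " <> text), do: {:warn, String.trim(text)}
  defp classify("| " <> text), do: {:pipe, String.trim(text)}
  # Anything unrecognised is kept as-is so no information is lost.
  defp classify(other), do: {:unknown, other}
end
```

Calling `LogLexer.tokenize("[ ] URL: www.example.com\n[!] missing correct doctype\n| Lang: PHP\n")` should give `[{:site, "www.example.com"}, {:warn, "missing correct doctype"}, {:pipe, "Lang: PHP"}]`.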
This looked pretty good and more structured, but now I am stuck… As you can derive from the raw log, some information spans multiple lines, and I would like to group those lines into a single struct instead of having them as separate entries in the resulting list.
For example, the ‘Server info’ block should become a single {:server, {_server_props_or_info_lines}} entry. And the last warning should be combined with the line after it, which contains some metadata about the warning.
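One possible shape for this kind of grouping, as a rough sketch and not a definitive answer: recurse over the token list and, for each entry, "look ahead" by splitting off all the `{:pipe, _}` tokens that directly follow it. The module name `LogGrouper` and the three-element `{tag, text, details}` result are hypothetical choices, the server details are collected in a plain list of strings, and the sketch assumes the first pass also emits `{:block, "Server info"}` before the server pipe lines.

```elixir
defmodule LogGrouper do
  # Hypothetical module name: merges :pipe tokens into the preceding entry.
  def group([]), do: []

  # A :pipe with no preceding entry is passed through unchanged.
  def group([{:pipe, _} = orphan | rest]), do: [orphan | group(rest)]

  def group([{tag, text} | rest]) do
    # Look ahead: take every :pipe token that directly follows this entry.
    {pipes, remaining} = Enum.split_while(rest, &match?({:pipe, _}, &1))
    details = Enum.map(pipes, fn {:pipe, line} -> line end)
    [build(tag, text, details) | group(remaining)]
  end

  # Assumption: the "Server info" block collapses into {:server, details}.
  defp build(:block, "Server info", details), do: {:server, details}
  # Entries without detail lines keep their original two-element shape.
  defp build(tag, text, []), do: {tag, text}
  # Entries with detail lines carry them as a third element.
  defp build(tag, text, details), do: {tag, text, details}
end
```

Under those assumptions, the tail of the example list would come out as `{:server, ["Server: Apache", "Version: 2.2", "Lang: PHP", "php-fpm: yes"]}` followed by `{:warn, "server exposes too much", ["exposing via response headers"]}`, while entries without pipe lines stay untouched.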
And that is where I get stuck. I think it is a programming paradigm that I need to grasp (I am new to FP and have been doing imperative programming for 12+ years), and it is not tied to Elixir at all. I hope somebody can guide me through this, but bear in mind: I have no CS master's degree, so I do not know that much about parsers and lexers. It is just that the thing I am trying to do seems so trivial, and it frustrates me that I cannot get it right.
Thank you, anyone, in advance!