Parsing the output of a file

shaolingeek · October 31, 2018, 10:54pm

Hi all,

I have a file that includes a lot of modular output as follows:

------------- show clock -------------

Tue May 17 11:39:52 2016
Timezone: UTC
Clock source: local

------------- show ntp status -------------

unsynchronised
  time server re-starting
   polling server every 8 s


------------- show logging -------------

Is there any way to pattern match on the File.read output to bind all of the data between one “------------- show” header and the next?

E.g., bind “show ntp status” to all of the output that is between:

“------------- show ntp status -------------”

and

“------------- show logging -------------”

So for instance having a variable called ntp_status that returns:

------------- show ntp status -------------

unsynchronised
time server re-starting
polling server every 8 s

Thanks for any insight that you can provide.

Rich

yawaramin · October 31, 2018, 11:37pm

I don’t see a way to pattern match to get the specific section, but it should be possible to use String.split. It would be easier if each named section occurs only once in the file and you can load the entire file as a string. E.g.:

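def section(input, header_name) do
  header = "------------- show #{header_name} -------------"
  # parts: 2 splits at the first match only, so the match below does not
  # fail when more sections follow (it still raises if the header is absent)
  [_before, rest] = String.split(input, header, parts: 2)
  # [during | _after] also matches when this is the last section in the file
  [during | _after] = String.split(rest, "------------- show ", parts: 2)

  String.trim(during)
end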
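
If several sections are needed, the same String.split idea can be applied once to the whole file to build a map keyed by section name. This is only a sketch: the module name SectionMap and its parse/1 function are made up for illustration, and it assumes every header has the exact "------------- show NAME -------------" shape from the question and that the header prefix never appears inside a section body.

```elixir
defmodule SectionMap do
  # Assumed header layout, copied from the sample in the question.
  @prefix "------------- show "
  @suffix " -------------"

  # Hypothetical helper: returns %{"clock" => "...", "ntp status" => "..."}.
  def parse(input) do
    input
    |> String.split(@prefix)
    # The first chunk is whatever precedes the first header (often "").
    |> Enum.drop(1)
    |> Map.new(fn chunk ->
      # parts: 2 keeps any later " -------------" inside the body intact.
      case String.split(chunk, @suffix, parts: 2) do
        [name, body] -> {name, String.trim(body)}
        # Header without the closing marker: treat the chunk as a bare name.
        [name] -> {String.trim(name), ""}
      end
    end)
  end
end
```

With the file loaded as a string (for example via File.read!/1), SectionMap.parse(contents)["ntp status"] would give the trimmed body of that section. If a section name repeats, the last occurrence wins, since Map.new overwrites duplicate keys.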

amarraja · November 1, 2018, 9:44am

Here is a very rough version. The idea is to iterate through the lines with two accumulators, one for the current block and one for the whole file. On seeing a line marker, start a new sub-accumulator and append lines to it until the next marker. Each sub-accumulator is pushed onto the main accumulator when its block ends.

defmodule Parser do
  @line_marker "-------------"

  def test do
    lines = ~s"""
    ------------- show clock -------------

    Tue May 17 11:39:52 2016
    Timezone: UTC
    Clock source: local

    ------------- show ntp status -------------

    unsynchronised
      time server re-starting
       polling server every 8 s
    """

    parsed =
      lines
      |> String.split("\n")
      |> parse()

    Enum.each(parsed, fn {name, content} -> IO.puts("#{name}\n#{content}") end)
  end

  def parse(lines) do
    parse(lines, nil, [])
  end

  def parse([], current, acc) do
    [current | acc]
    |> Enum.reverse()
    # first element will be nil
    |> tl()
  end

  def parse([@line_marker <> " show " <> rest | tl], current_block, acc) do
    block_name = String.replace(rest, " #{@line_marker}", "")
    parse(tl, {block_name, ""}, [current_block | acc])
  end

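  # Inside a block: append the line to the current block's content
  def parse([h | tl], {block_name, lines}, acc) do
    parse(tl, {block_name, lines <> h <> "\n"}, acc)
  end

  # Lines before the first marker belong to no block, so skip them
  def parse([_h | tl], nil, acc) do
    parse(tl, nil, acc)
  end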
end

Sample output:

clock

Tue May 17 11:39:52 2016
Timezone: UTC
Clock source: local


ntp status

unsynchronised
  time server re-starting
   polling server every 8 s

alco · November 1, 2018, 10:10am

@amarraja’s solution is great if you combine it with streaming the file line by line via File.stream!(path).
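
A minimal sketch of that combination, assuming the header format from the question. StreamParser, parse_file/1 and parse_lines/1 are hypothetical names, and it uses Enum.reduce rather than the explicit recursion above. The file is read lazily line by line, but the parsed sections are still accumulated in memory.

```elixir
defmodule StreamParser do
  # Assumed header layout, copied from the sample in the question.
  @prefix "------------- show "
  @marker "-------------"

  # Hypothetical entry point: File.stream!/1 yields lines lazily,
  # each one still ending in "\n".
  def parse_file(path) do
    path |> File.stream!() |> parse_lines()
  end

  # Accepts any enumerable of lines; returns [{name, body}] in file order.
  def parse_lines(lines) do
    lines
    |> Enum.reduce([], &add_line/2)
    |> Enum.map(fn {name, body} -> {name, body |> Enum.reverse() |> Enum.join()} end)
    |> Enum.reverse()
  end

  # A header line starts a new block.
  defp add_line(@prefix <> rest, acc) do
    name = rest |> String.trim() |> String.trim_trailing(@marker) |> String.trim()
    [{name, []} | acc]
  end

  # Lines before the first header are ignored.
  defp add_line(_line, []), do: []

  # Any other line is prepended to the current block.
  defp add_line(line, [{name, body} | rest]), do: [{name, [line | body]} | rest]
end
```

Because parse_lines/1 takes any enumerable, it can be tried on a plain list of lines without touching the file system.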

Here’s a different approach that is more efficient if you have the whole file loaded in memory:

defmodule Stringmatch do
  @section_prefix "------------- show "

  def data do
    """
    ------------- show clock -------------

    Tue May 17 11:39:52 2016
    Timezone: UTC
    Clock source: local

    ------------- show ntp status -------------

    unsynchronised
      time server re-starting
       polling server every 8 s


    ------------- show logging -------------
    """
  end

  def extract_section(data, section) do
    # Find the start index of the section we're interested in
    with {:ok, section_index} <- find_pattern_starting_at(data, @section_prefix <> section, 0) do
      # Find the start index of the next section and slice the binary just before it
      next_section_index =
        case find_pattern_starting_at(data, @section_prefix, section_index + 1) do
          {:ok, next_section_index} -> next_section_index
          {:error, :not_found} -> byte_size(data)
        end

      section_length = next_section_index - section_index
      {:ok, :binary.part(data, section_index, section_length)}
    end
  end

  defp find_pattern_starting_at(data, pattern, start_index) do
    case :binary.match(data, pattern, scope: {start_index, byte_size(data) - start_index}) do
      :nomatch -> {:error, :not_found}
      {match_index, _length} -> {:ok, match_index}
    end
  end
end

Usage examples:

iex(1)> Stringmatch.extract_section(Stringmatch.data, "clock")
{:ok,
 "------------- show clock -------------\n\nTue May 17 11:39:52 2016\nTimezone: UTC\nClock source: local\n\n"}

iex(2)> Stringmatch.extract_section(Stringmatch.data, "ntp")
{:ok,
 "------------- show ntp status -------------\n\nunsynchronised\n  time server re-starting\n   polling server every 8 s\n\n\n"}

iex(3)> Stringmatch.extract_section(Stringmatch.data, "log")
{:ok, "------------- show logging -------------\n"}

iex(4)> Stringmatch.extract_section(Stringmatch.data, "foo")
{:error, :not_found}