Guards or pattern matching processing a text file - =~

I am trying to process a text file that is a kind of plain-text guitar tab, a bit like the ones on the Ultimate Guitar Tab site.
Each line will be either blank, chords, lyrics, or something like that.

defmodule TextFileProcessor do
  def process_file(file_path) do
    File.stream!(file_path)
    |> Enum.each(&process_line/1)
  end

  defp process_line(line) when line =~ ~r/^LYRICS:/ do
    # Process lines starting with "LYRICS:"
    IO.puts("Processing lyrics line: #{line}")
  end

  defp process_line(line) when line =~ ~r/^CHORDS:/ do
    # Process lines starting with "CHORDS:"
    IO.puts("Processing chords line: #{line}")
  end

  defp process_line(line) do
    # Default processing for other lines
    IO.puts("Processing line: #{line}")
  end
end

I can’t use =~ (or other Kernel functions) in a guard. Is there a better way to do this?

I hope there is enough in here.

TIA

You can do this with pattern matching in the function args.

defmodule TextProcessor do
  def process_string(string) do
    string
    |> String.split("\n")
    |> Enum.each(&process_line/1)
  end

  defp process_line("LYRICS:" <> rest) do 
    # Process lines starting with "LYRICS:"
    IO.puts("Processing lyrics line: #{rest}")
  end

  defp process_line("CHORDS:" <> rest) do 
    # Process lines starting with "CHORDS:"
    IO.puts("Processing chords line: #{rest}")
  end

  defp process_line(line) do
    # Default processing for other lines
    IO.puts("Processing line: #{line}")
  end
end

TextProcessor.process_string("""
LYRICS: foo 
CHORDS: bar
foobar\
""")
Processing lyrics line:  foo 
Processing chords line:  bar
Processing line: foobar

Thank you Marcus, that is very nice. Is there any more sophisticated mechanism than <>, like regular expressions for example?

Not in guards/function heads. You’ll need to use a single function with something like a cond inside.
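A minimal sketch of that idea, assuming the same LYRICS:/CHORDS: prefixes as the original question. The module and function names (LineClassifier, classify/1) and the returned atoms are made up for illustration. The regex tests live in the function body, where =~ is allowed:

```elixir
defmodule LineClassifier do
  # Hypothetical helper: =~ is not allowed in a guard, so the regex
  # checks run inside the body. cond takes the first truthy branch.
  def classify(line) do
    cond do
      line =~ ~r/^\s*LYRICS:/i -> :lyrics
      line =~ ~r/^\s*CHORDS:/i -> :chords
      line =~ ~r/^\s*$/ -> :blank
      true -> :other
    end
  end
end
```

Unlike the "LYRICS:" <> rest pattern, this tolerates leading whitespace and lowercase prefixes, which a binary pattern cannot express.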


There are constraints around guards, which prevent many higher level/more complex apis from being usable:

Not all expressions are allowed in guard clauses, but only a handful of them. This is a deliberate choice. This way, Elixir (and Erlang) can make sure that nothing bad happens while executing guards and no mutations happen anywhere. It also allows the compiler to optimize the code related to guards efficiently.

https://hexdocs.pm/elixir/patterns-and-guards.html
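To make the constraint concrete, here is a small sketch that uses only functions that are allowed in guards (is_binary/1, byte_size/1, binary_part/3). The module name GuardSketch and the atoms it returns are assumptions for illustration only:

```elixir
defmodule GuardSketch do
  # Only guard-safe Kernel functions appear in the `when` clauses.
  def kind(line) when is_binary(line) and byte_size(line) == 0, do: :empty

  # binary_part/3 is allowed in guards. If it raised here, the guard
  # would simply fail and the next clause would be tried.
  def kind(line) when is_binary(line) and binary_part(line, 0, 1) == "[", do: :bracketed

  def kind(line) when is_binary(line), do: :other
end
```

In practice "[" <> _rest in the function head would be the more natural way to write the second clause. The point is just that anything beyond this small set of functions, including =~, has to move into the function body.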


Here’s my take on it:

"""
LYRICS: L1
CHORDS: C1
LYRICS: L2
CHORDS: C2
CHORDS: C3
CHORDS: C4
CHORDS: C5
CHORDS: C6
TEMPO: T1
foobar\
"""
|> String.split("\n")
|> Enum.map_reduce(%{}, fn line, acc ->
  hd = line |> String.split(":") |> hd()

  {line, if(hd in ["LYRICS", "CHORDS", "TEMPO"]) do
      Map.update(acc, hd, [line], fn arr -> [line | arr] end)
  else
      Map.update(acc, "REST", [line], fn arr -> [line | arr] end)
  end}
end)
|> then(fn {_lines, acc} -> acc end)

Or, if you don’t care about specifics in the first iteration, you can do:

"""
LYRICS: L1
CHORDS: C1
LYRICS: L2
CHORDS: C2
CHORDS: C3
CHORDS: C4
CHORDS: C5
CHORDS: C6
TEMPO: T1
foobar\
"""
|> String.split("\n")
|> Enum.map_reduce(%{}, fn line, acc ->
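  # String.split/2 never returns an empty list, so `hd() || "REST"` could not
  # fall back; match on the split result to detect lines without a prefix.
  hd =
    case String.split(line, ":", parts: 2) do
      [prefix, _rest] -> prefix
      [_no_prefix] -> "REST"
    end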

  {line, Map.update(acc, hd, [line], fn arr -> [line | arr] end)}
end)
|> then(fn {_lines, acc} -> acc end)

Both pipelines produce an organized map, which you can then process:

%{
  "CHORDS" => ["CHORDS: C6", "CHORDS: C5", "CHORDS: C4", "CHORDS: C3", "CHORDS: C2", "CHORDS: C1"],
  "LYRICS" => ["LYRICS: L2", "LYRICS: L1"],
  "REST" => ["foobar"],
  "TEMPO" => ["TEMPO: T1"]
}

If you don’t want that map, you can directly process it within the pipeline above.


P.S.

  1. Instead of split, you can use regex.
  2. Instead of if-else, you can use cond.

Personally, I prefer this way because it makes it composable and allows me to use dbg() to see if anything in the pipeline is not working as expected.
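As a rough sketch of the "regex instead of split" variant: the version below swaps Enum.map_reduce for Enum.group_by, so it is not the same pipeline as above, and the module name TagGrouper is made up. It assumes a tag is a run of uppercase letters followed by a colon at the start of the line:

```elixir
defmodule TagGrouper do
  # Groups lines by their leading "TAG:" prefix; lines without one go
  # under "REST". Enum.group_by keeps the lines of each group in input order.
  def group(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.group_by(&tag/1)
  end

  defp tag(line) do
    case Regex.run(~r/^([A-Z]+):/, line) do
      [_whole, tag] -> tag
      nil -> "REST"
    end
  end
end
```

Note that the lists come out in input order here, whereas the map_reduce versions prepend and therefore reverse them.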


Thank you all so much for the help

The actual song will likely be in this format, plus others like [Chords] to define the kinds of chords to use, and other metadata.

    [Am]     [F]{2}          [E]
Tra-cy's got killer's hands.

    [Am]     [F]          [E]
Hands that used to be a man's.

[Am]     [F]          [E]
Tra-cy's hands a are big and hairy.

It matters where along the line each chord definition sits. The text will come from non-technical people, so I am expecting a fair amount of handling for things like duplicate lines, mistakes, etc.
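Since the position of each chord matters, one possible approach is Regex.scan/3 with return: :index, which reports where each match starts. This is only a sketch: the module name ChordPositions and the simple [name] pattern are assumptions, and it ignores extras such as {2}:

```elixir
defmodule ChordPositions do
  # Returns {byte_offset, chord_name} for every [chord] marker in the line.
  # Offsets are in bytes, which equals the column only for ASCII text.
  def positions(line) do
    ~r/\[([^\]]+)\]/
    |> Regex.scan(line, return: :index)
    |> Enum.map(fn [{start, _len}, {name_start, name_len}] ->
      {start, binary_part(line, name_start, name_len)}
    end)
  end
end

ChordPositions.positions("[Am]  [F]")
# => [{0, "Am"}, {6, "F"}]
```

The offsets could then be used to line each chord up over the lyric line below it when rendering.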

The idea is to get a nice html render out of it

If anyone is interested I think I kind of got to where I wanted like this:

defmodule Textprocessor do
  @regex_find_chords ~r/\b(?:[BE]b?|[ACDFG]#?)(?:sus|m|maj|min|[-1-9\/m])?(?:sus|m|maj|min|[-1-9\/m])?\b#?/
  @empty_line ~r/^\s*$/
  @instruction ~r/^\[[\w+ ]+\]\s*$/

  @moduledoc """
  Documentation for `Textprocessor`.
  """

  @doc """
  Process song

  ## Examples
    # iex> Textprocessor.process_file("./lyrics.text")
    # :ok
  """

  def process_file(filename) do
    File.stream!(filename)
    |> Enum.map(&String.trim/1)
    |> Enum.each(&process_line/1)
  end

  defp process_line(line) do
    cond do
      String.match?(line, @regex_find_chords) ->
        IO.puts("CHORDS: #{line}")

      String.match?(line, @instruction) ->
        IO.puts("INSTRUCTION: #{line}")

      String.match?(line, @empty_line) ->
        IO.puts("EMPTY: #{line}")

      true ->
        IO.puts("LYRICS: #{line}")
        :ok
    end
  end
end

I'm not sure this is optimal, but I would be interested in any criticism.


I am pretty late here but I’d advise against regexes unless they are very straightforward. I haven’t looked into yours in detail (and length is not always an indicator of a complex regex) but I’d probably reach for nimble_parsec and give it a go for a day or two.

That being said, being productive in that particular library is a skill in and of itself so if you are pressed for time you should probably keep your regex solution but also add edge case unit tests.
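A sketch of what such edge-case tests might look like with ExUnit. It inlines the three regexes from the Textprocessor module above into a hypothetical LineKind.classify/1 that returns atoms instead of printing, so the result can be asserted on. The last test pins down a case worth knowing about: a lyric line that starts with a capital "A" is picked up by the chord regex.

```elixir
ExUnit.start()

defmodule LineKind do
  # Hypothetical wrapper: same regexes and same cond order as
  # Textprocessor.process_line/1, but returning an atom.
  def classify(line) do
    cond do
      line =~ ~r/\b(?:[BE]b?|[ACDFG]#?)(?:sus|m|maj|min|[-1-9\/m])?(?:sus|m|maj|min|[-1-9\/m])?\b#?/ ->
        :chords

      line =~ ~r/^\[[\w+ ]+\]\s*$/ ->
        :instruction

      line =~ ~r/^\s*$/ ->
        :empty

      true ->
        :lyrics
    end
  end
end

defmodule LineKindTest do
  use ExUnit.Case

  test "bracketed chords are classified as chords" do
    assert LineKind.classify("[Am]     [F]          [E]") == :chords
  end

  test "a section marker is classified as an instruction" do
    assert LineKind.classify("[Chorus]") == :instruction
  end

  test "blank and plain lyric lines" do
    assert LineKind.classify("") == :empty
    assert LineKind.classify("Tra-cy's got killer's hands.") == :lyrics
  end

  test "edge case: a lyric starting with the word A matches the chord regex" do
    # Probably not the desired result, but it is what the regex does today.
    assert LineKind.classify("A man walks") == :chords
  end
end
```

Saved as a .exs script, this runs with the plain elixir command; in a Mix project the test module would go under test/ instead.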


Thanks Dimitar

I think the likely size of the files is going to be pretty small, on the order of 100 lines, so hopefully regex will do for now.

I will see how I go with it before attempting nimble_parsec.

I am very much at the beginning of my Elixir journey.