Dbg is a life-saver for text parsing

It’s turned into a killer feature for me. I write lots of parsing code, extracting info from laws and statutes. dbg’s ability to show intermediate values in a pipeline is amazing for helping me understand where my code went wrong.

E.g., here I’m trying to extract the chapter range, 90-130, but I’ve got a bug.

    range_string =
      raw_string
      |> String.split("-", parts: 3)
      |> at(2)
      |> trim()
      |> String.split()
      |> at(1)
      |> String.split("(")
      |> at(0)
      |> dbg()
[lib/crawlers/ors/parser.ex:52: Parser.extract_chapter_range/1]
raw_string #=> "Volume : 03 - Landlord-Tenant, Domestic Relations, Probate - Chapters 90-130 (36)"
|> String.split("-", parts: 3) #=> ["Volume : 03 ", " Landlord",
 "Tenant, Domestic Relations, Probate - Chapters 90-130 (36)"]
|> at(2) #=> "Tenant, Domestic Relations, Probate - Chapters 90-130 (36)"
|> trim() #=> "Tenant, Domestic Relations, Probate - Chapters 90-130 (36)"
|> String.split() #=> ["Tenant,", "Domestic", "Relations,", "Probate", "-", "Chapters",
 "90-130 (36)"]
|> at(1) #=> "Domestic"
|> String.split("(") #=> ["Domestic"]
|> at(0) #=> "Domestic"

Seeing this helped me quickly decide to just use a Regex.
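
The trace shows the culprit: the heading contains a hyphen in "Landlord-Tenant", so splitting on "-" with `parts: 3` cuts there rather than at the " - " separators. A minimal sketch of the Regex route, assuming the range always appears as "Chapters N-M" (the pattern below is an illustration, not necessarily the one in the real parser):

```elixir
raw_string =
  "Volume : 03 - Landlord-Tenant, Domestic Relations, Probate - Chapters 90-130 (36)"

# Capture the two numbers on either side of the hyphen after "Chapters".
# Assumption: the heading always contains "Chapters <start>-<end>".
range =
  ~r/Chapters (\d+)-(\d+)/
  |> Regex.run(raw_string, capture: :all_but_first)

IO.inspect(range)
# prints ["90", "130"]
```

Because the pattern anchors on the word "Chapters", hyphens elsewhere in the heading no longer matter.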

There’s also NimbleParsec (GitHub - dashbitco/nimble_parsec: a simple and fast library for text-based parser combinators), which I found incredibly powerful.

I’m getting really expressive and simple pipelines from these two helper functions I wrote. The basic idea is to make the String the first param and just return the matches.

defmodule Crawlers.String do
  @moduledoc """
  Extensions to String.
  """

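  @spec capture(binary, Regex.t()) :: binary | nil
  def capture(string, regex) do
    # captures/2 returns nil when the regex does not match, so guard
    # against that instead of piping nil into Enum.at/2 (which raises).
    case captures(string, regex) do
      [first | _] -> first
      _ -> nil
    end
  end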

  @spec captures(binary, Regex.t()) :: [binary] | nil
  def captures(string, regex) do
    Regex.run(regex, string, capture: :all_but_first)
  end
end

An example from my client code:

  #
  # Extract the chapter range numbers from a Volume heading like this:
  #   "Volume : 01 - Courts, Oregon Rules of Civil Procedure - Chapters 1-55 (48)"
  #
  # to:
  #   ["1", "55"]
  #
  defp extract_chapter_range(raw_string) do
    raw_string
    |> captures(~r/Chapters (\w+)-(\w+)/u)
  end
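
To show what these helpers return on a match and on a miss, here is a self-contained sketch. `CaptureDemo` is a hypothetical stand-in for the module above so the snippet runs on its own, and its `capture/2` is guarded against a failed match, since `Regex.run/3` returns nil when nothing matches. The "Volume" pattern is also just an illustration.

```elixir
defmodule CaptureDemo do
  # Hypothetical stand-in for Crawlers.String, so this snippet is runnable alone.
  def captures(string, regex) do
    Regex.run(regex, string, capture: :all_but_first)
  end

  def capture(string, regex) do
    case captures(string, regex) do
      [first | _] -> first
      _ -> nil
    end
  end
end

heading = "Volume : 01 - Courts, Oregon Rules of Civil Procedure - Chapters 1-55 (48)"

# All capture groups, as a list of strings.
IO.inspect(CaptureDemo.captures(heading, ~r/Chapters (\w+)-(\w+)/u))
# prints ["1", "55"]

# Only the first capture group.
IO.inspect(CaptureDemo.capture(heading, ~r/Volume : (\d+)/))
# prints "01"

# No match: nil comes back instead of an exception.
IO.inspect(CaptureDemo.captures("no chapters here", ~r/Chapters (\w+)-(\w+)/u))
# prints nil
```

Returning nil on a miss keeps the helpers easy to follow with a `case` or `with` in the calling code.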

I haven’t bought into combinators yet; I don’t find them very expressive. They seem very procedural.

FYI, I think this is a fantastic idea: modular, expressive components. It’s like a parser language. GitHub - manoss96/pregex: PRegEx - Programmable Regular Expressions

Here’s a screenshot showing how dbg immediately helped me see why my Regex was failing: