Capturing regex matching variables in cond

I have this function for parsing a string into a tuple. "-<number>" should return {:lat, -<number>}, "+<number>" should return {:lat, <number>}, and "<number>" should return {:gridNum, }`, plus some other simpler matches:

defp parseIndex(s) do
      cond do
        String.match?(s, ~r/^-\d+$/) ->
          [_, n] = Regex.run(~r/^-(\d+)$/, s)
          {:lat, -1 * String.to_integer(n)}
        String.match?(s, ~r/^\+\d+$/) ->
          [_, n] = Regex.run(~r/^\+(\d+)$/, s)
          {:lat, String.to_integer(n)}
        s == "--" ->
          {:latFrame, -1}
        s == "++" ->
          {:latFrame, 1}
        s == "-*" ->
          {:latEnd, -1}
        s == "+*" ->
          {:latEnd, 1}
        match?({_n, ""}, Integer.parse(s)) ->
          {:gridNum, String.to_integer(s)}
        true ->
          {:gridText, s}
      end
    end

This works, but is there a better way to do this without running the regex twice, once to match a pattern like -<number> and then again to extract the numerical portion? Same for the second-to-last clause, which ends up parsing the integer twice.

Thanks!

Phil

Noting that:

  • Regex.run/2 returns nil if there is no match and that
  • nil is falsy for the purposes of boolean evaluation and
  • you can bind a variable in a match

For an optimisation I would likely put the explicit equality checks first since a cond proceeds in lexical order. Also I think you can collapse the :lat parsing into a single clause. And one last one, you can match and bind on the integer parsing too (the second last clause):

defp parseIndex(s) do
  cond do
    s == "--" ->
      {:latFrame, -1}
    s == "++" ->
      {:latFrame, 1}
    s == "-*" ->
      {:latEnd, -1}
    s == "+*" ->
      {:latEnd, 1}
    (match = Regex.run(~r/^([+-]\d+)$/, s)) -> 
      [_, n] = match
      {:lat, String.to_integer(n)}
    match?({n, ""}, Integer.parse(s)) ->
      {:gridNum, n}
    true ->
      {:gridText, s}
  end
end
3 Likes

Thanks! I’d tried

[_, n] = Regex.run(~r/^-(\d+)$/, s) ->
  {:lat, -1 * String.to_integer(n)}

But when the Regex fails [_, n] doesn’t match, causing an error. Your use of the match variable solves that beautifully.

However, the second-to-last clause doesn’t work, because variables within the match?/2 call aren’t available outside. I guess I could use Regex to detect a string of only digits and then use String.to_integer/1, that’s probably slightly faster than parsing the string to an integer twice.

And thanks, also, for pointing out the simple optimization of moving the static clauses to the top.

Phil

This is how I’d do it:

  def parseIndex("--"), do: {:latFrame, -1}
  def parseIndex("++"), do: {:latFrame, -1}
  def parseIndex("-*"), do: {:latEnd, 1}
  def parseIndex(s) do
    case Regex.run(~r/^([+-])?(\d+)$/, s) do
      [_, "", n] -> {:gridNum, String.to_integer(n)}
      [_, "-", n] -> {:lat, -1 * String.to_integer(n)}
      [_, _, n] -> {:lat, String.to_integer(n)}
      nil -> {:gridText, s}
    end
  end

By adding groups to the sign part and the remaining number and inspecting the captured sign (or absence of it), we can remove the need to a check to Integer.parse.

[Update: I had missed the -1 multiplier]

What about doing binary matching directly?

defmodule Hello do
  def run(string) do
    do_run(string)
  end
  
  defp do_run("--"), do: {:latFrame, -1}
  defp do_run("++"), do: {:latFrame, 1}
  defp do_run("-*"), do: {:latEnd, -1}
  defp do_run("+*"), do: {:latEnd, 1}

  defp do_run(<<"+", number::binary>>), do: {:lon, to_integer(number)}
  defp do_run(<<"-", number::binary>>), do: {:lat, -to_integer(number)}
  defp do_run(<<"+", number::binary>>), do: {:lon, to_integer(number)}
  defp do_run(number), do: {:gridNum, to_integer(number)}
  
  defp to_integer(string), do: String.to_integer(string)
end
2 Likes

Lol, I was just thinking about that! I love a set of declarative one-liners where you can scan from top to bottom and pin point the right clause when you see it.

This would raise an error in case of a string though, never matching a {:gridText, text} result.

I like how this avoids the Regex (not that I have anything against Regexes) but <<"+", number::binary>> doesn’t constrain number to be a sequence of digits. It would try to parse something like "+q", which should result in {:gridText, "+q"}.

I like the solution with multiple functions, with a single Regex in the last one after handling all the simple cases. This is the best solution so far, thanks!

1 Like

Here you have it with error handling

defmodule Hello do
  def run(string) do
    {:ok, do_run(string)}
  catch
    :throw, error -> {:error, error}
  end
  
  defp do_run("--"), do: {:latFrame, -1}
  defp do_run("++"), do: {:latFrame, 1}
  defp do_run("-*"), do: {:latEnd, -1}
  defp do_run("+*"), do: {:latEnd, 1}

  defp do_run(<<"+", number::binary>>), do: {:lon, to_integer(number)}
  defp do_run(<<"-", number::binary>>), do: {:lat, -to_integer(number)}
  defp do_run(<<"+", number::binary>>), do: {:lon, to_integer(number)}
  defp do_run(number), do: {:gridNum, to_integer(number)}
  
  defp to_integer(string) do
    case Integer.parse(string) do
      {num, ""} -> num
      _ -> throw({:not_number, string})
    end
  end
end
1 Like