Can I make this pattern matching more "elegant"

Hello!

Playing with Elixir, I’m wondering if I could make the following “look ahead” pattern matching a bit “better”, that is, a bit less verbose.

I’m not interested in using a regex btw.

# echo ./pattern.ex | entr -c elixir /_

ExUnit.start()

defmodule Pattern do
  def filter_map(str) do
    filter_map(str, [])
  end

  defp filter_map(<<>>, acc), do: Enum.reverse(acc)

  defp filter_map(<<n, rest::binary>>, acc) when n in ?0..?9 do
    filter_map(rest, [n - ?0 | acc])
  end

  defp filter_map(<<?o, ?n, ?e, rest::binary>>, acc) do
    filter_map(<<?n, ?e, rest::binary>>, [1 | acc])
  end

  defp filter_map(<<?t, ?w, ?o, rest::binary>>, acc) do
    filter_map(<<?w, ?o, rest::binary>>, [2 | acc])
  end

  defp filter_map(<<?t, ?h, ?r, ?e, ?e, rest::binary>>, acc) do
    filter_map(<<?h, ?r, ?e, ?e, rest::binary>>, [3 | acc])
  end

  defp filter_map(<<?f, ?o, ?u, ?r, rest::binary>>, acc) do
    filter_map(<<?o, ?u, ?r, rest::binary>>, [4 | acc])
  end

  defp filter_map(<<?f, ?i, ?v, ?e, rest::binary>>, acc) do
    filter_map(<<?i, ?v, ?e, rest::binary>>, [5 | acc])
  end

  defp filter_map(<<?s, ?i, ?x, rest::binary>>, acc) do
    filter_map(<<?i, ?x, rest::binary>>, [6 | acc])
  end

  defp filter_map(<<?s, ?e, ?v, ?e, ?n, rest::binary>>, acc) do
    filter_map(<<?e, ?v, ?e, ?n, rest::binary>>, [7 | acc])
  end

  defp filter_map(<<?e, ?i, ?g, ?h, ?t, rest::binary>>, acc) do
    filter_map(<<?i, ?g, ?h, ?t, rest::binary>>, [8 | acc])
  end

  defp filter_map(<<?n, ?i, ?n, ?e, rest::binary>>, acc) do
    filter_map(<<?i, ?n, ?e, rest::binary>>, [9 | acc])
  end

  defp filter_map(<<_, rest::binary>>, acc) do
    filter_map(rest, acc)
  end
end

defmodule PatternTest do
  use ExUnit.Case

  test "extract digits" do
    assert Pattern.filter_map("1.abc.oneight--nineightseveninessixfivefourthreetwone0123456789") ==
             [
               1,
               1,
               8,
               9,
               8,
               7,
               9,
               6,
               5,
               4,
               3,
               2,
               1,
               0,
               1,
               2,
               3,
               4,
               5,
               6,
               7,
               8,
               9
             ]
  end
end

A little improvement over would be:

defp filter_map("one" <> rest, acc) do
  filter_map("ne" <> rest, [1 | acc])
end

If you have a lot of numbers, you could always generate these function heads using metaprogramming (if you have only these 9 digits, writing them by hand is much more readable):

@digits = %{
  "one" => 1,
  "two" => 2,
  "three" => 3
}

for {str, digit} <- @digits do
    def filter_map(unquote(str) <> rest, acc) do
       filter_map(String.slice(unquote(str), 1..-1) <> rest, [unquote(digit) | acc])
    end
end

I didn’t execute this second approach code, so it might not compile, but the idea should be clear, you generate functions based on your map of string digits.

1 Like

If this is for advent of code I had this:

  defp collect_digits(<<c, rest::binary>>, acc) when c in ?0..?9,
    do: collect_digits(rest, [c - ?0 | acc])

  ~w(one two three four five six seven eight nine)
  |> Enum.with_index(1)
  |> Enum.each(fn {<<c, keep::binary>>, digit} ->
    defp collect_digits(<<unquote(c), unquote(keep)::binary, rest::binary>>, acc) do
      collect_digits(<<unquote(keep)::binary, rest::binary>>, [unquote(digit) | acc])
    end
  end)

  defp collect_digits(<<_, rest::binary>>, acc), do: collect_digits(rest, acc)
  defp collect_digits(<<>>, acc), do: :lists.reverse(acc)
1 Like

What day was this?

Smells like 2023/1 to me.

Edit: Why didn’t I think of doing it this way before :man_facepalming:

1 Like

In that case, lots of suggestions here:

1 Like

I did Scala for first few days of 2023 before getting frustrated and crawling back to Elixir for later days, that’s why I don’t recall visiting the thread. Thanks for sharing, AoC threads have always taught me plenty.

1 Like

Busted :slight_smile:

Yeah, I wanted to see if I could replicate something close to the following OCaml expression:

let rec filter_nums lst =
    match lst with
    | [] -> []
    | ('0' .. '9'  as h)  :: t                     ->  h  :: filter_nums t
    |  'o' :: ('n' :: 'e' :: _ as t)               -> '1' :: filter_nums t
    |  't' :: ('w' :: 'o' :: _ as t)               -> '2' :: filter_nums t
    |  't' :: ('h' :: 'r' :: 'e' :: 'e' :: _ as t) -> '3' :: filter_nums t
    |  'f' :: ('o' :: 'u' :: 'r' :: _ as t)        -> '4' :: filter_nums t
    |  'f' :: ('i' :: 'v' :: 'e' :: _ as t)        -> '5' :: filter_nums t
    |  's' :: ('i' :: 'x' :: _ as t)               -> '6' :: filter_nums t
    |  's' :: ('e' :: 'v' :: 'e' :: 'n' :: _ as t) -> '7' :: filter_nums t
    |  'e' :: ('i' :: 'g' :: 'h' :: 't' :: _ as t) -> '8' :: filter_nums t
    |  'n' :: ('i' :: 'n' :: 'e' :: _ as t)        -> '9' :: filter_nums t
    |   _  :: t -> filter_nums t

I hadn’t considered meta-programming, interesting.

Got it, I’ll look around, and great, I kinda missed that we could in fact advance a few more steps (feeding back that last letter is enough).

The following functions seem to be equivalent:

  defp filter_map(<<?o, ?n, ?e, rest::binary>>, acc) do
    filter_map(<<?e, rest::binary>>, [1 | acc])
  end

  defp filter_map(<<"one", rest::binary>>, acc) do
    filter_map(<<"e", rest::binary>>, [1 | acc])
  end

  defp filter_map("one" <> rest, acc) do
    filter_map("e" <> rest, [1 | acc])
  end

But my understanding is a little fuzzy. I would tend to think that the string concatenation would be less “efficient” but is that really so? At the function call level? At the pattern match level? What happens “under the hood” for these cases?

Of course I’m being completely impractical and disregarding the root of all evil here :slight_smile:

In a pattern match <<"one", rest::binary> and "one" <> rest are strictly equivalent, they compile to the same bytecode.

In a “normal” expression, so outside of a pattern match, as long as "one" is a litteral I guess it is also equivalent, though I am not sure.

Kinda, there is a little compile-time overhead since you can chain the <> operator. For example these will be the AST you will be getting:

quote do
  <<"one", rest, "abc">>
end

# result
{:<<>>, [], ["one", {:rest, [], Elixir}, "abc"]}

quote do
  one <> rest <> "abc"
end

# result
{:<>, [context: Elixir, imports: [{2, Kernel}]],
 [
   {:one, [], Elixir},
   {:<>, [context: Elixir, imports: [{2, Kernel}]],
    [{:rest, [], Elixir}, "abc"]}
 ]}

The final representation becomes the same <<>>, here is the source code of <> operator:

defmacro left <> right do
    concats = extract_concatenations({:<>, [], [left, right]}, __CALLER__)
    quote(do: <<unquote_splicing(concats)>>)
  end
2 Likes