Pattern matching against a string

I am VERY much an Elixir newbie. I have taken one Elixir course and one Phoenix course on Udemy. During that course, I saw the instructor do a pattern match against a string. This is exactly the first step I need to take for my first Phoenix project.

In my project, I am receiving a ton of strings that look like:

<li>Kiwi Jr. (33)</li>
<li>Deep State (29)</li>
<li>Piroshka (29)</li>

Based on what he did, I tried something like:

iex(1)> string = "<li>Kiwi Jr. (33)</li>"
"<li>Kiwi Jr. (33)</li>"
iex(2)> "<li>"<>artist<>"("<>playcount<>"</li>" = string
** (ArgumentError) the left argument of <> operator inside a match should be always a literal binary as its size can't be verified, got: artist

Obviously, I am missing something… I would like to end up with:

artist = Kiwi Jr.
play count = 33

Anyone see where I am putting it in the ditch?
Thanks!

Unless you know exactly how many characters your artist has you’ll need to resort to using regex or some other tool for parsing the string. Pattern matches cannot do what you seem to be looking for.
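For reference, a minimal sketch of the regex route, assuming every line has the shape `<li>ARTIST (DIGITS)</li>`. The module name `LiRegex` and its `parse/1` function are made up for illustration and are not from the course or this thread.

```elixir
defmodule LiRegex do
  # Hypothetical helper: pulls the artist and play count out of one <li> line.
  def parse(line) do
    # Named groups keep the captures readable; the lazy `.+?` lets the
    # artist name itself contain parentheses.
    pattern = ~r{<li>(?<artist>.+?)\s*\((?<playcount>\d+)\)</li>}

    case Regex.named_captures(pattern, line) do
      %{"artist" => artist, "playcount" => count} ->
        {:ok, {artist, String.to_integer(count)}}

      nil ->
        :error
    end
  end
end

LiRegex.parse("<li>Kiwi Jr. (33)</li>")
#⇒ {:ok, {"Kiwi Jr.", 33}}
```

Lines that do not fit the assumed shape come back as `:error` instead of raising.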

That error is complaining about this part of your code specifically:

artist <> "("

The way matching on binaries works is that:

  • it can match a binary literal like “(” because it knows its size
  • it can bind to a variable, where the length is specified
  • it can bind the rest of the binary to a variable, when it appears at the end of the match

Because artist and playcount are not at the end of the match and have no specified length, the match fails: they don’t satisfy the second or third rule.
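To make the three rules concrete, here is a small sketch using the string from the question. The size of 8 is simply the byte length of "Kiwi Jr." counted by hand, which is exactly the knowledge you normally don't have up front.

```elixir
string = "<li>Kiwi Jr. (33)</li>"

# Rules 1 and 3: a literal prefix, then the rest of the binary at the end.
"<li>" <> rest = string
rest
#⇒ "Kiwi Jr. (33)</li>"

# Rule 2: a variable in the middle is fine once its size is given.
<<"<li>", artist::binary-size(8), " (", tail::binary>> = string
artist
#⇒ "Kiwi Jr."
tail
#⇒ "33)</li>"
```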

Hence @LostKobrakai’s suggestion


Almost true :slight_smile:

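defmodule Matcher do
  for artist_len <- 1..100, num_len <- 1..10 do
    # One clause is generated per (artist length, play-count length) pair.
    def li(<<
          "<li>",
          artist::binary-size(unquote(artist_len)),
          " (",
          num::binary-size(unquote(num_len)),
          ")</li>"
        >>),
        do: {artist, num}
  end

  # Last-resort clause for inputs whose lengths fall outside the ranges above.
  # Note: Regex.scan returns a list of capture lists, not a tuple.
  def li(input),
    do:
      Regex.scan(
        ~r"<li>(.*?)\s*\((.*?)\)</li>",
        input,
        capture: :all_but_first
      )
end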

Matcher.li("<li>Kiwi Jr. (33)</li>")
#⇒ {"Kiwi Jr.", "33"}

In 99% of cases the input would go through one of the pattern-match clauses, which should make the code faster than the regex.

Okay, this is assuming that the data matches the length requirement in line 2?

In the example I saw, it was something simple like:

iex(6)> s = "categories:1"
"categories:1"
iex(7)> "categories:"<>index=s
"categories:1"
iex(8)> index
"1"

I’ll give this a shot.

Thanks!

Nope. The code above generates 1001 functions: 1000 handling all possible combinations of lengths for artist and num in the intervals 1..100 and 1..10 respectively, plus one catch-all clause in case the lengths are not in these intervals.

One cannot pattern-match a binary of arbitrary length in the middle, but matching a binary of explicit length is allowed.
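A variation on the same idea, as a rough sketch: the explicit length does not have to be known at compile time, it only has to be bound before the match. The `LiBinary` module below is a made-up name, and it assumes the play count sits in the last " (" group of the line.

```elixir
defmodule LiBinary do
  # Hypothetical helper: works out the lengths at runtime, then matches with them.
  def parse("<li>" <> rest) do
    with [_ | _] = hits <- :binary.matches(rest, " ("),
         # The byte offset of the last " (" is the artist's length.
         {artist_len, _} = List.last(hits),
         <<artist::binary-size(artist_len), " (", tail::binary>> <- rest,
         count_len when count_len >= 0 <- byte_size(tail) - byte_size(")</li>"),
         <<count::binary-size(count_len), ")</li>">> <- tail do
      {:ok, {artist, count}}
    else
      _ -> :error
    end
  end

  def parse(_other), do: :error
end

LiBinary.parse("<li>Kiwi Jr. (33)</li>")
#⇒ {:ok, {"Kiwi Jr.", "33"}}
```

This trades the generated clauses for a scan with `:binary.matches/2`; whether that is any nicer than a regex is a matter of taste.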

Close, but wrong :wink: it generates a single function with 1001 heads.
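A scaled-down sketch that shows the difference: the comprehension below generates three clauses plus a fallback, yet the module reports a single function of arity 1. `SmallMatcher` and `take/1` are made-up names for illustration.

```elixir
defmodule SmallMatcher do
  # Generates three clauses of the same function, one per length.
  for len <- 1..3 do
    def take(<<head::binary-size(unquote(len)), "|">>), do: head
  end

  # Fallback clause of that same function.
  def take(_other), do: :no_match
end

SmallMatcher.__info__(:functions)
#⇒ [take: 1]

SmallMatcher.take("ab|")
#⇒ "ab"

SmallMatcher.take("abcd|")
#⇒ :no_match
```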


Indeed. We call it Matching Dragon.


Okay… so, my next question: is there something more “functional” about doing it this way, rather than a straight pattern match? I have spent the past 30 years in the OOP world. I used Lisp maybe 30 years ago, but didn’t know enough to really make the distinction back then.

I know I initially asked about doing it with a pattern match, just because I saw that in a course, and EVERY TIME I need to do regex, I need to look at the docs…

Thanks!

You probably meant “rather than a straight regex.” Well, it depends. In most cases Regex is just fine. Also, if you are after parsing (contrived example) the ISO 8601 representation of a date, you might extract the year, month and day straight away:

<<
    year :: binary-size(4), "-",
    month :: binary-size(2), "-",
    day :: binary-size(2)>> = "2019-03-20"
year
#⇒ "2019"

There is no silver bullet.
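As a small follow-up sketch: the matched pieces are still strings, so they need converting before use, and for real dates the standard library's `Date.from_iso8601/1` does the parsing and validation for you. The variable names here are only for illustration.

```elixir
<<y::binary-size(4), "-", m::binary-size(2), "-", d::binary-size(2)>> = "2019-03-20"

# The match yields strings; convert them if integers are needed.
parts = {String.to_integer(y), String.to_integer(m), String.to_integer(d)}
#⇒ {2019, 3, 20}

# The standard library parses and validates the same input.
{:ok, date} = Date.from_iso8601("2019-03-20")
{date.year, date.month, date.day} == parts
#⇒ true
```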


If these are guaranteed to be small HTML pieces I’d parse them with Floki or Meeseeks and then apply a simpler regex on the text to get the two pieces of data you require.

Regex for HTML or XML is a hard “NO!” even if you do a two-days educational throwaway project.
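Roughly what that could look like with Floki, as a sketch only: it assumes Floki is available (pulled in here with `Mix.install/1`, which needs Elixir 1.12+ and network access; the version requirement is a guess), and that each item's text has the shape `ARTIST (DIGITS)`.

```elixir
# Assumption: running as a script; in a Phoenix project Floki would go in mix.exs instead.
Mix.install([{:floki, "~> 0.36"}])

html = """
<li>Kiwi Jr. (33)</li>
<li>Deep State (29)</li>
<li>Piroshka (29)</li>
"""

{:ok, doc} = Floki.parse_fragment(html)

# Let Floki deal with the markup, then run a small regex on the plain text only.
rows =
  for li <- Floki.find(doc, "li"),
      [_, artist, count] <- [Regex.run(~r/^(.+?)\s*\((\d+)\)$/, Floki.text(li))] do
    {artist, String.to_integer(count)}
  end

rows
#⇒ [{"Kiwi Jr.", 33}, {"Deep State", 29}, {"Piroshka", 29}]
```

Items whose text does not fit the assumed shape are silently skipped by the comprehension.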


SNAP! Okay… Floki looks like the jam! This can be parsed much more cleanly, as there are a bazillion lines in this file… all LIs…

Thanks!

Here’s some details on how to easily parse HTML with regexes:


That is horrifyingly beautiful, lol! ^.^

This This This!

Use Meeseeks if you want to read it, or Floki if you want to write it or something instead.

Ah the classic. ^.^


I ended up using Floki… took 2… maybe 3 seconds…
