Get complete substrings from a string

script · March 16, 2022, 6:31am

Hi,
I have this HTML string:

     "<div class=wp-block-supertrends-figure><div style=overflow: hidden; padding-top: 
      56.25%; position: relative;><iframe src=https://player.vimeo.com/video/687889699 
      frameborder=0 allowfullscreen=></iframe></div><div class=title>Video</div><div 
      class=text>Video description</div></div>"

I need to extract https://player.vimeo.com/video/687889699 from it.
I need a regex that starts matching at http and captures the entire URL. There can be more links with different URLs, so I don't want to hardcode this particular one. Regex is not my strong suit, but I'd like to do this with a regex rather than with multiple String.split calls.
Any help would be appreciated.
Thank you.

mindok · March 16, 2022, 7:07am

A proper HTML parser, for example Floki (philss/floki on GitHub, a simple HTML parser that enables search for nodes using CSS selectors), may be a more reliable option.
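A minimal sketch of the parser route, assuming Floki can be fetched with Mix.install/1 and that the attribute values are quoted (the string in the question appears to have lost its quotes, so the HTML below is a tidied-up, shortened stand-in, not the original input):

```elixir
# Assumption: running as a standalone script, so Floki is fetched on the fly.
Mix.install([:floki])

# Assumption: quoted attribute values, shortened from the string in the question.
html = """
<div class="wp-block-supertrends-figure"><iframe src="https://player.vimeo.com/video/687889699" frameborder="0" allowfullscreen></iframe><div class="title">Video</div></div>
"""

{:ok, document} = Floki.parse_fragment(html)

# Collect the src attribute of every iframe, whatever host it points to.
urls = Floki.attribute(document, "iframe", "src")
IO.inspect(urls)
```

Selecting on the tag and attribute rather than on the URL text means nothing about the URL itself has to be hardcoded.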

stefanluptak · March 16, 2022, 9:01am

Naive approach:

str = """
<div class=wp-block-supertrends-figure><div style=overflow: hidden; padding-top: 
56.25%; position: relative;><iframe src=https://player.vimeo.com/video/687889699 
frameborder=0 allowfullscreen=></iframe></div><div class=title>Video</div><div 
class=text>Video description</div></div>
"""

Regex.run(~R|https://player\.vimeo\.com/video/[0-9]+|, str)
# ["https://player.vimeo.com/video/687889699"]

EDIT: Sorry, I just realized, there can be more URL formats. So consider this irrelevant.

fuelen · March 16, 2022, 9:14am

iex> Regex.scan(~r/src=(\S+)/, string <> string, capture: :all_but_first)
[
  ["https://player.vimeo.com/video/687889699"],
  ["https://player.vimeo.com/video/687889699"]
]
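If the URLs are not always inside a src attribute, a variation is to match on the http prefix itself, as the question asks. This is only a sketch: the module name UrlExtractor and the second link in the sample are made up for illustration, and the pattern simply stops at whitespace, quotes, or angle brackets, so it is not a full URL grammar.

```elixir
defmodule UrlExtractor do
  # Hypothetical helper, not part of any library.
  # Matches http:// or https:// followed by a run of characters that are not
  # whitespace, quotes, or angle brackets.
  def extract(html) when is_binary(html) do
    ~r{https?://[^\s"'<>]+}
    |> Regex.scan(html)
    |> List.flatten()
  end
end

# Sample input: one unquoted src as in the question, plus an invented quoted href.
html = """
<div><iframe src=https://player.vimeo.com/video/687889699 frameborder=0></iframe>
<a href="http://example.com/page?id=1">link</a></div>
"""

urls = UrlExtractor.extract(html)
IO.inspect(urls)
# ["https://player.vimeo.com/video/687889699", "http://example.com/page?id=1"]
```

Regex.scan/2 returns one list per match, so List.flatten/1 turns the result into a plain list of URL strings.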