Validating YouTube links

nishanthg92 · July 22, 2017, 7:17am

How can I validate the different kinds of YouTube links?
For example:

http://youtu.be/dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/?v=dQw4w9WgXckQ

And how can I convert such a link into an embed link like http://www.youtube.com/embed/dQw4w9WgXcQ, so we can embed the video in our application?

wmnnd · July 22, 2017, 7:26am

YouTube links can easily be matched with regular expressions.

Check out this introduction to regular expressions if you’re not familiar with them yet:
http://www.regular-expressions.info/quickstart.html

If you want to learn how to use regular expressions in Elixir, take a look at the documentation of the Regex module, which has a nice introduction to using them and detailed information about the individual functions:
https://hexdocs.pm/elixir/Regex.html
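As a rough sketch of the regex approach, the three link forms from the question could be matched with a single pattern. This assumes that video IDs are exactly 11 characters from `[A-Za-z0-9_-]` and that the ID is followed either by the end of the string or by `?`, `&` or `#`; the module and function names are made up for illustration.

```elixir
defmodule YouTubeRegex do
  # Hypothetical helper: matches youtu.be/ID, youtube.com/watch?v=ID and youtube.com/?v=ID,
  # over http or https, with or without "www.".
  # Assumption: IDs are exactly 11 characters of [A-Za-z0-9_-].
  defp pattern do
    ~r"^https?://(?:www\.)?(?:youtu\.be/|youtube\.com/(?:watch)?\?v=)(?<id>[\w-]{11})(?:[?&#]|$)"
  end

  # Returns {:ok, id} when the URL has one of the known shapes, :error otherwise.
  def video_id(url) when is_binary(url) do
    case Regex.named_captures(pattern(), url) do
      %{"id" => id} -> {:ok, id}
      nil -> :error
    end
  end
end
```

With these assumptions the third URL from the question is rejected, because its ID has 12 characters. A pattern like this still misses links where `v` is not the first query parameter, such as `watch?t=42&v=…`.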

NobbZ · July 22, 2017, 8:06am

If the video IDs are guaranteed to be at the end, you can do a pattern match, which is probably a lot faster:

defmodule YouTube do
  def get_video_id("http://youtu.be/" <> id), do: id
  def get_video_id("http://www.youtube.com/watch?v=" <> id), do: id
  def get_video_id("http://www.youtube.com/?v=" <> id), do: id
end

How you can append that video ID to the string "http://www.youtube.com/embed/" is left as an exercise.
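For completeness, one possible answer to that exercise is sketched below. The module name and `embed_url/1` are hypothetical, and the three `get_video_id/1` clauses are repeated so the block runs on its own.

```elixir
defmodule YouTubeEmbed do
  # The three pattern-matching clauses from above, repeated so this block is self-contained.
  def get_video_id("http://youtu.be/" <> id), do: id
  def get_video_id("http://www.youtube.com/watch?v=" <> id), do: id
  def get_video_id("http://www.youtube.com/?v=" <> id), do: id

  # Prepends the embed base URL to the extracted ID.
  # Unknown URL shapes raise FunctionClauseError, since get_video_id/1 has no catch-all clause.
  def embed_url(url), do: "http://www.youtube.com/embed/" <> get_video_id(url)
end
```

String interpolation (`"http://www.youtube.com/embed/#{id}"`) would work just as well as `<>` here.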

I’d prefer to make it HTTPS-ready and solve it via metaprogramming, roughly like this:

defmodule YouTube do
  # Generates one get_video_id/1 clause per protocol/prefix pair at compile time.
  Enum.each(["http", "https"], fn protocol ->
    Enum.each(["youtu.be/", "www.youtube.com/watch?v=", "www.youtube.com/?v="], fn prefix ->
      def get_video_id(unquote(protocol <> "://" <> prefix) <> id), do: id
    end)
  end)
end
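The same idea can also be written with a single `for` comprehension over both lists, which generates the same six clauses (2 protocols × 3 prefixes) at compile time. The module name is illustrative only.

```elixir
defmodule YouTubeMeta do
  # One get_video_id/1 clause is defined for every protocol/prefix combination.
  for protocol <- ["http", "https"],
      prefix <- ["youtu.be/", "www.youtube.com/watch?v=", "www.youtube.com/?v="] do
    def get_video_id(unquote(protocol <> "://" <> prefix) <> id), do: id
  end
end
```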

But probably the best version is to properly parse the URL and extract the video ID from the parsed result. The pattern matching I showed will not work correctly when there are other parameters in the query string, and regular expressions that cover those possibilities tend to grow into an unreadable and slow monstrosity.
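A minimal sketch of the parse-the-URL approach, using Elixir’s built-in `URI.parse/1` and `URI.decode_query/1`. It assumes that only youtu.be short links and youtube.com links carrying a `v` query parameter need to be handled; the module and function names are hypothetical.

```elixir
defmodule YouTubeLink do
  # Sketch: extract the video ID by parsing the URL instead of matching string prefixes.
  # Assumption: only youtu.be short links and youtube.com links with a "v" query
  # parameter are accepted; every other URL returns :error.
  @hosts ["youtube.com", "www.youtube.com"]

  def video_id(url) when is_binary(url) do
    uri = URI.parse(url)

    cond do
      uri.scheme not in ["http", "https"] ->
        :error

      uri.host == "youtu.be" ->
        # Short links carry the ID as the path: "/ID".
        case uri.path do
          "/" <> id when id != "" -> {:ok, id}
          _ -> :error
        end

      uri.host in @hosts ->
        query_id(uri.query)

      true ->
        :error
    end
  end

  def embed_url(url) do
    with {:ok, id} <- video_id(url) do
      {:ok, "https://www.youtube.com/embed/" <> id}
    end
  end

  # The query string may contain other parameters (t=, list=, ...); decode_query/1
  # turns it into a map, so the position of "v" does not matter.
  defp query_id(nil), do: :error

  defp query_id(query) do
    case URI.decode_query(query) do
      %{"v" => id} when id != "" -> {:ok, id}
      _ -> :error
    end
  end
end
```

This does not check the ID itself, so the 12-character ID in the question’s third example would still be accepted; add a format check on `id` if that matters.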

yurko · July 22, 2017, 9:08am

Erlang also has a handy module, :http_uri; here’s how you can use it:

case url |> String.to_charlist() |> :http_uri.parse() do
  {:ok, {_scheme, _user_info, host, _port, path, query}} ->
    # check the host, extract the ID from the path or query, etc.
    {:ok, host, path, query}
  {:error, reason} ->
    # incorrect URL: handle the error
    {:error, reason}
end

NobbZ · July 22, 2017, 9:12am

In Elixir we have the URI module; it has a parse/1 function as well.

yurko · July 22, 2017, 9:14am

It’s not quite the same; the URI.parse/1 documentation says:

“Note this function expects a well-formed URI and does not perform any validation.”

Erlang’s function does perform validation and returns helpful error messages, which is useful if these URLs come from an external source or from user input.
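For what it’s worth, newer Elixir versions (1.13 and later) also provide `URI.new/1`, which, unlike `URI.parse/1`, validates the string and returns `{:ok, uri}` or `{:error, part}`, where `part` is the offending piece of the input. A minimal sketch of using it as a first check; the module name, function name and error atoms are hypothetical.

```elixir
defmodule UrlCheck do
  # Hypothetical helper: accept only well-formed http(s) URLs that have a host.
  # URI.new/1 (Elixir >= 1.13) returns {:error, invalid_part} for malformed input.
  def check(url) when is_binary(url) do
    case URI.new(url) do
      {:ok, %URI{scheme: scheme, host: host} = uri}
      when scheme in ["http", "https"] and is_binary(host) and host != "" ->
        {:ok, uri}

      {:ok, _uri} ->
        {:error, :not_an_http_url}

      {:error, part} ->
        {:error, {:invalid_uri, part}}
    end
  end
end
```

A URL that passes this check still has to be inspected for a YouTube host and a video ID.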

wmnnd · July 22, 2017, 9:48am

You can’t expect YouTube URLs to have the ID at the end, since they can also take additional parameters.

NobbZ · July 22, 2017, 9:51am

Then a regex will be totally unreadable and very hard to get right as well. Properly parsing the URL is the only option then.

wmnnd · July 22, 2017, 9:53am

This is not a valid YouTube URL, by the way: http://www.youtube.com/?v=dQw4w9WgXckQ

NobbZ · July 22, 2017, 10:13am

But it’s one of the OP’s examples, so he seems to consider it valid… He must have gotten it from somewhere to list it.

nishanthg92 · July 24, 2017, 6:58am

Can anyone give me a regex to turn https://youtu.be/64XLNqIlY00 into https://www.youtube.com/watch?v=64XLNqIlY00?

NobbZ · July 24, 2017, 7:18am

You really don’t want to do this using regular expressions; it will break sooner or later. Please use a proper URI parser and extract all the necessary information from there.

But to extract the ID from the first link, you can use ~r"https?://youtu\.be/(?<id>.*)" (the dot is escaped so it only matches a literal "."). I’ll still leave it as an exercise to append the ID to whatever string you like.
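If the quick-and-dirty route is acceptable anyway, `Regex.replace/3` can do the rewrite asked about above in one call. A sketch, assuming the ID consists only of `[A-Za-z0-9_-]` characters and that nothing (such as a query string) follows it; the module and function names are hypothetical, and anything that does not match is returned unchanged.

```elixir
defmodule ShortLink do
  # Rewrites http(s)://youtu.be/ID into https://www.youtube.com/watch?v=ID.
  # "\\1" in the replacement string refers to the first capture group (the ID).
  def to_watch_url(url) do
    Regex.replace(~r"^https?://youtu\.be/([\w-]+)$", url, "https://www.youtube.com/watch?v=\\1")
  end
end
```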

nishanthg92 · July 24, 2017, 7:43am

Can you help me extract the ID from just one example YouTube link?

nishanthg92 · July 24, 2017, 8:12am

I can’t find any function to extract the ID from a URI. Could you just give me one example?

NobbZ · July 24, 2017, 8:20am

Nope, I won’t.

As we already said, regular expressions will be either error-prone or utterly complex for this task. If you really want to use regular expressions, then build them yourself from the ground up. I already gave you an example of how to do it. All the other URLs follow the same pattern, if the quick-and-dirty solution is good enough for you.

There are so many valid YouTube links that you won’t cover them with simple regular expressions… Any valid youtube.com URL is possibly valid with other TLDs as well.

Probably the most accurate way to be really sure that a given URL is not only syntactically valid but also points to a video, and to get its embed link, is to fetch the URL and read the HTML’s meta information.

Currently, every valid video page has a meta field named twitter:player with the embed link as its content.
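Assuming the page HTML has already been fetched with an HTTP client of your choice, reading that meta field could look roughly like the sketch below. The regex assumes the `name` attribute comes before `content` and that both use double quotes, which is fragile; a real HTML parser (for example the Floki library) would be more robust. The module and function names are made up.

```elixir
defmodule PlayerMeta do
  # Hypothetical helper: find <meta name="twitter:player" content="..."> in an HTML string.
  # Fragile: assumes "name" comes before "content" and both are double-quoted.
  def embed_url(html) when is_binary(html) do
    pattern = ~r/<meta\s+name="twitter:player"\s+content="(?<url>[^"]+)"/i

    case Regex.named_captures(pattern, html) do
      %{"url" => url} -> {:ok, url}
      nil -> :error
    end
  end
end
```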

Edit: Sorry, I misunderstood your question. I thought you wanted regular expressions for the other examples you gave, but you actually want to see an example of how to use the regular expression I gave you.

So here is a very generic example of how to extract something from a string using a regular expression and named captures:

iex> Regex.named_captures(~r/c(?<foo>d)/, "abcd")
%{"foo" => "d"}

iex> Regex.named_captures(~r/a(?<foo>b)c(?<bar>d)/, "abcd")
%{"bar" => "d", "foo" => "b"}

iex> Regex.named_captures(~r/a(?<foo>b)c(?<bar>d)/, "efgh")
nil

Those are the examples from Regex.named_captures/3.
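Applied to the youtu.be regex from earlier (with the dot escaped and the pattern anchored), the same function returns the ID, which can then be appended to whatever base URL you need. A minimal sketch; the module name, function name and the embed base URL are only for illustration.

```elixir
defmodule ShortLinkId do
  # The named capture "id" holds everything after "youtu.be/".
  def embed_url(url) do
    case Regex.named_captures(~r"^https?://youtu\.be/(?<id>.*)$", url) do
      %{"id" => id} when id != "" -> {:ok, "https://www.youtube.com/embed/" <> id}
      _ -> :error
    end
  end
end
```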

nishanthg92 · July 24, 2017, 9:13am

Thank you