Nice way to check if iodata is an HTML link?

In a Phoenix template we are trying to check whether a Phoenix.HTML safe value (a {:safe, iodata} tuple) contains iodata that represents an HTML link. Based on whether it’s a link, we want to add a certain class to an element.

We want to check / pattern match on the given input and return true when all the following conditions are met:

  • The input is an “a” tag.
  • The input is only one node / tag.
  • The input can have surrounding whitespace.

Now, to check if the given input is a link, we had the following code:

    defp link?({:safe, [60, "a" | _]}), do: true
    defp link?({:safe, [" " <> _, [60, "a" | _] | _]}), do: true
    defp link?(_), do: false

Example of what it should do:

    {:safe,
      [
        "         ",
        [60, "a", [[32, "href", 61, 34, "/", 34]], 62, "this is a link", 60, 47, "a", 62],
        "         "
      ]
    }
    |> link?()
    # => true

That seemed dirty so we figured it might be better to convert the input by using safe_to_string/1, escape it, and then pattern match or regex on the result.

That still feels tricky, so we now decided to use Floki to do the job, as it looks like a good way to go. However, we are curious as to the alternatives. So, are there any other (maybe simpler) options to achieve this?

I am not aware of any better way than converting the iodata to a binary and then matching on that.
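As a rough illustration of that idea, here is a minimal sketch that flattens the iodata with IO.iodata_to_binary/1 and checks the result against a regex. The module name LinkCheck and the regex itself are assumptions made for this example, not something from Phoenix. A regex is only a heuristic for HTML, so treat this as a starting point rather than a robust check.

```elixir
defmodule LinkCheck do
  # Hypothetical helper for illustration only.
  # Accepts a {:safe, iodata} tuple or a plain binary.
  def link?({:safe, iodata}), do: iodata |> IO.iodata_to_binary() |> link?()

  def link?(html) when is_binary(html) do
    # Optional whitespace, one <a ...> opening tag, content that contains
    # no closing </a>, the closing </a>, optional whitespace, end of input.
    Regex.match?(~r/\A\s*<a(?:\s[^>]*)?>(?:(?!<\/a>).)*<\/a>\s*\z/is, html)
  end

  def link?(_other), do: false
end
```

Because the pattern is anchored at both ends and the content may not contain a closing </a>, two sibling links or a link followed by other markup do not match. In a Phoenix project, Phoenix.HTML.safe_to_string/1 would do the same flattening for a {:safe, iodata} tuple.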

Floki is pretty good but it really depends on what kind of HTML input you are dealing with. Meeseeks works better with malformed HTML soups but it has a Rust compiler dependency (which turned out to be a non-issue even on Windows; Rust’s tooling is really nimble).

I’d say though, sanitize your HTML input heavily + use Floki. Do not concern yourself with parsing performance unless you have to parse megabytes of HTML every few seconds.
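To make the Floki route a bit more concrete, here is a small sketch of the check itself, written against the tree shape that Floki.parse_fragment/1 returns (a list of {tag, attributes, children} tuples and text binaries). The module name TreeCheck is made up for this example, and the parsing step is left out of the module so the snippet runs without any dependency.

```elixir
defmodule TreeCheck do
  # Hypothetical helper for illustration only.
  # Expects an already parsed HTML tree: a list of {tag, attrs, children}
  # tuples and text binaries.
  def single_link?(tree) when is_list(tree) do
    # Ignore whitespace-only text nodes, then require exactly one <a> node.
    match?([{"a", _attrs, _children}], Enum.reject(tree, &blank_text?/1))
  end

  def single_link?(_other), do: false

  defp blank_text?(text) when is_binary(text), do: String.trim(text) == ""
  defp blank_text?(_node), do: false
end
```

With Floki installed, the tree would come from something like `{:ok, tree} = Floki.parse_fragment(html)`, where html is a binary (for a {:safe, iodata} tuple, convert it first with Phoenix.HTML.safe_to_string/1). Whether the parser keeps surrounding whitespace as text nodes is not assumed here, which is why blank text is filtered out explicitly.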

Thanks @hauleth and @dimitarvp, we will go with parsing the HTML and checking with Floki.

The HTML we are trying to check is passed in our templates; it will be either a string or a {:safe, iodata} tuple, and it shouldn’t be too large (at most something like 5 nodes), so I guess we’re good.

It would be much easier and faster to implement your own link function which adds the class. This wouldn’t catch hard-coded anchor tags, but if it ticks all the requirement boxes you’re looking at a few minutes of work.

That would be a nice solution as well. We don’t have the problem of hard-coded anchor tags, and we could implement this link function in the scope where it’s needed. Just one catch there: whenever the input consists of multiple links, the class should not be set. Also, it should be set on a parent element, not on the link itself.

I’ll try to give this a bit more context:

We’ve got cells, which we use to create coupled modules of CSS, JS and Views in Phoenix (ex_cell). Now we are using this for a tooltip cell that’s useable anywhere in our application. A cell has options, and one of the options for the tooltip cell is the content of the ‘trigger’, which could be anything the developer wants it to be: a link, a string, two links, a label, etc.

Whenever it’s one link in that trigger content, we need a class on the parent. We could also add another ‘class’ option to the tooltip cell, and then it would be easily solved with adding the right class in combination with the trigger content of a link. But I want to make this tooltip cell as easy as possible for other developers to use, so they won’t have to bother adding an extra class. So that’s why I’m trying to pattern match here. For convenience and to prevent mistakes.

Parsing the iodata/string definitely seems like the wrong stage to do this at, to me. Why not have your template functions return a tuple structure (or similar) that encodes your data, then run a final pass over it before rendering it into the EEx template format? Personally, I have a module that defines just about everything used in my templates, from various containers to lots of link types to form handlers and a whole lot more. Much of what it builds up is just tuple structures, and it’s been very easy to use and change.
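A minimal sketch of what such a data-first approach could look like for the tooltip trigger, assuming helpers that return tagged tuples and a final render pass. Every name here (Trigger, link/2, text/1, the class names) is invented for illustration and is not taken from ex_cell, Phoenix, or the module mentioned above.

```elixir
defmodule Trigger do
  # Hypothetical helpers: build data first, render to HTML last.
  def link(text, href), do: {:link, text, href}
  def text(value), do: {:text, value}

  # The parent gets the extra class only when the content is exactly one link.
  def wrapper_class([{:link, _text, _href}]), do: "tooltip-trigger tooltip-trigger--link"
  def wrapper_class(_parts), do: "tooltip-trigger"

  # Final pass: turn the tuple structure into an HTML string.
  def render(parts) when is_list(parts) do
    inner = Enum.map(parts, &render_part/1)
    IO.iodata_to_binary(["<span class=\"", wrapper_class(parts), "\">", inner, "</span>"])
  end

  defp render_part({:link, text, href}),
    do: ["<a href=\"", escape(href), "\">", escape(text), "</a>"]

  defp render_part({:text, value}), do: escape(value)

  # Minimal escaping so the sketch stays dependency-free.
  defp escape(value) do
    value
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end
```

With this shape, `Trigger.render([Trigger.link("docs", "/docs")])` puts the link class on the wrapping span, while two links or mixed content fall back to the plain class, all without inspecting any rendered HTML. In a real Phoenix app the escaping and tag building would more likely go through Phoenix.HTML helpers.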
