Autolinking URLs in a string (replacing URLs with link tags)

I’m having a bit of trouble wrapping my head around how to implement an autolinking feature in my Phoenix application, something like rails_autolink from Rails (https://github.com/tenderlove/rails_autolink).

The idea is that I want to scan some user content and replace all URLs with link tags.

I could technically write a regular expression and pass it to String.replace to replace the URLs. But since I will be using this together with text_to_html to also split the text into paragraphs, it becomes a little more complex, because text_to_html returns a {:safe, string} tuple. Furthermore, when I start replacing the URLs using Phoenix.HTML.Link.link, it becomes a bit of a mess, since I end up with something like:

["<p>This is a test:\r<br>\n", {:safe, [60, "a", [[32, "class", 61, 34, "embed", 34], [32, "href", 61, 34, "https://www.youtube.com/watch?v=CmAI_MwdASw", 34]], 62, "https://www.youtube.com/watch?v=CmAI_MwdASw", 60, 47, "a", 62]} | "</p>\n"]

Which Phoenix seems to choke on:

ArgumentError at GET /forums/4/forum_topics/16
argument error
nofile
No code available.
:erlang.iolist_to_binary/1
Called with 1 arguments
["<p>This is a test:\r<br>\n", {:safe, [60, "a", [[32, "class", 61, 34, "embed", 34], [32, "href", 61, 34, "https://www.youtube.com/watch?v=CmAI_MwdASw", 34]], 62, "https://www.youtube.com/watch?v=CmAI_MwdASw", 60, 47, "a", 62]} | "</p>\n"]

I think I’m missing something here. What would be the “correct” way to accomplish this?

I’m working on this as well. If I figure it out and remember to come back, I’ll add my solution here. Or if you figure it out, please post an update.

Have you tried using Phoenix.HTML.safe_to_string/1 on the result of text_to_html?
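
To illustrate why the list in the error above cannot be rendered, here is a minimal sketch in plain Elixir (no Phoenix needed). The module name SafeTupleDemo and the link markup are made up for the example. A {:safe, iodata} tuple is not itself iodata, so it has to be unwrapped before it is nested in a list; as far as I know that unwrapping is what safe_to_string/1 does for you.

```elixir
defmodule SafeTupleDemo do
  # Stand-in for what a link helper returns: a {:safe, iodata} tuple.
  # The markup is hypothetical and only serves the example.
  def link do
    {:safe, ["<a href=\"https://example.com\">", "https://example.com", "</a>"]}
  end

  # Nesting the tuple directly inside a list is not valid iodata,
  # so the conversion raises ArgumentError.
  def broken do
    try do
      IO.iodata_to_binary(["<p>", link(), "</p>"])
    rescue
      ArgumentError -> :argument_error
    end
  end

  # Unwrapping the tuple first leaves plain iodata, which converts fine.
  def fixed do
    {:safe, inner} = link()
    IO.iodata_to_binary(["<p>", inner, "</p>"])
  end
end
```

In a template the final string would still need to be marked as safe again (for example with raw/1), otherwise it gets escaped on output.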

For anyone who arrives here, this is what I found to work. Not perfect, but a starting point.

  # html_escape/1, safe_to_string/1 and raw/1 come from Phoenix.HTML
  import Phoenix.HTML

  # only accepts https
  @url_regex ~r/(https:\/\/[^\s<]+)/i

  def linkify(text) when is_binary(text) do
    text
    # Escape ALL user content
    |> html_escape()
    # Get the escaped HTML string
    |> safe_to_string()
    # Inject safe <a> tags
    |> replace_urls()
    # Mark final output as safe
    |> raw()
  end

  defp replace_urls(escaped_html) do
    Regex.replace(@url_regex, escaped_html, fn url ->
      ~s(<a href="#{url}" target="_blank" rel="noopener noreferrer">#{url}</a>)
    end)
  end

And then in your HEEx:

<%= linkify(@content_to_linkify) %>

If anyone knows of a better way, I am open to improvements and suggestions!

I will probably experiment with these

If you use a different delimiter for the sigil you don’t need to escape the slashes. I’m currently using the following regex for this:

~r"(?:^|\s)(https?://[^\s]+)"

Also, going forward we need to avoid putting regex literals in module attributes. You can use a function instead (or just inline it):

def url_regex, do: ~r"..."
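
As a rough sketch of the function-based approach, the module below keeps the regex in a function and pulls out only the captured URLs. The module name UrlCaptureDemo and the extract/1 helper are hypothetical, and the pattern assumes the quantifier is meant to sit outside the character class ([^\s]+). Because the pattern also consumes the whitespace in front of a URL, the example asks Regex.scan for the capture group only.

```elixir
defmodule UrlCaptureDemo do
  # Regex kept in a function instead of a module attribute.
  # Assumed pattern: start of string or whitespace, then an http(s) URL.
  def url_regex, do: ~r"(?:^|\s)(https?://[^\s]+)"

  # Returns only the captured URLs, without the leading whitespace
  # that the non-capturing prefix matches.
  def extract(text) when is_binary(text) do
    url_regex()
    |> Regex.scan(text, capture: :all_but_first)
    |> List.flatten()
  end
end
```

Note that a URL glued to the preceding word (no whitespace in front of it) is not picked up by this pattern.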

Finally, instead of working with raw text you can just parse the input into a simple AST and then render it 100% safely in heex:

# AST:
[
  {:link, "https..."},
  {:text, " foo bar\n"},
  # ...
]

# HEEx:
<%= for block <- @ast do %>
  <%= case block do %>
    <% {:text, text} -> %><span>{text}</span>
    <% {:link, url} -> %><a href={url}>{url}</a>
  <% end %>
<% end %>

I do this, and it works fine :slight_smile:
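
For anyone wondering how the parsing step might look, here is one possible sketch, not necessarily how the poster above does it. The module name LinkAst, the parse/1 function and the URL pattern are all assumptions for illustration. It splits the text on URLs while keeping the matches, then tags each piece as {:link, url} or {:text, text}.

```elixir
defmodule LinkAst do
  # Hypothetical URL pattern; adjust to whatever you consider a link.
  defp url_regex, do: ~r"https?://[^\s]+"

  # Anchored variant used to decide whether a piece is a whole URL.
  defp whole_url_regex, do: ~r"\Ahttps?://[^\s]+\z"

  # Splits the text on URLs, keeping the URLs themselves in the result,
  # and tags every non-empty piece for rendering in the template.
  def parse(text) when is_binary(text) do
    url_regex()
    |> Regex.split(text, include_captures: true)
    |> Enum.reject(&(&1 == ""))
    |> Enum.map(fn part ->
      if Regex.match?(whole_url_regex(), part) do
        {:link, part}
      else
        {:text, part}
      end
    end)
  end
end
```

The result can be passed to the template as @ast. Since the template interpolates both the text and the URL itself, no manual escaping or raw/1 is needed with this approach.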

Why would you wrap all your text in <span>? An isolated a tag next to regular text inside a p tag, for example, is valid HTML.

You’re quite right (it’s mostly force of habit). However, I actually do need a span there in one case: I want to preserve whitespace, and doing so on the containing element would also preserve the whitespace of the template. So instead I do this:

.container > * { white-space: pre-wrap; }

And for that I need the span. I could inline the entire thing into one line instead, but, you know, ew.