Earmark - Parsing HTML inside code blocks

Using Earmark, I’m trying to parse HTML inside markdown code blocks so that they can be syntax-highlighted. I’m having a bit of a rough time with it. Admittedly, I could probably stand to spend a bit more time trying to figure it out myself, but I feel I may be on the wrong path as it stands, so I thought it couldn’t hurt to ask. Also, whenever I do ask for help, I usually figure it out minutes later :sweat_smile:

So, I think I understand that the following doesn’t work, because the 3rd (content) element of the returned 4-tuple is usually ignored:

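    def parse() do
      """
      ```sql
      <span class="k">SELECT</span> * <span class="k">FROM</span> table
      ```
      """
      |> Earmark.as_ast!()
      |> Earmark.Transform.map_ast(&parse_html/1)
      |> Earmark.transform()
    end

    # Try to replace the content of SQL code blocks with the parsed HTML.
    defp parse_html({"code", [{"class", "sql"}], [html], meta}) do
      # Floki returns a list of nodes, so use it directly as the content.
      {:ok, nodes} = Floki.parse_fragment(html)

      {:replace, {"code", [{"class", "sql"}], nodes, meta}}
    end

    # Leave every other node untouched.
    defp parse_html(node), do: node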

I say "usually" since the documentation doesn’t explicitly mention that the content node is ignored when using {:replace, node}, yet that seems to be the case (though I’m actually not sure).

Is there a way to simply replace the content node?

Is there just a better way of doing this (without using highlightjs)?

I’m looking at pre- and post-processors but currently not having much luck.

Thanks for reading!

After sleeping on it, I figured it out. It was right there in the documentation; it just didn’t click that it was what I needed.

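Using Earmark.Transform.map_ast_with, we can use the accumulator to conditionally match on a specific text node: set a flag when the SQL code node is visited, then act on the text node that comes right after it. Using the accumulator for matching in this way is the part that flew over my head when I first read the docs.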
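
To see why the accumulator makes this work, here is a toy, dependency-free walker in the same spirit. It is a hypothetical sketch, not Earmark’s implementation: AccWalk.map_with/3 is an invented name, and it only assumes Earmark’s node shape {tag, attrs, children, meta} with plain binaries for text. Like the behaviour observed in the first post, it ignores the content returned for a tag node and walks the original children instead.

```elixir
defmodule AccWalk do
  # Hypothetical toy walker, NOT Earmark's implementation. It visits nodes
  # depth-first (a parent before its children) and threads an accumulator
  # through every callback call, so one visit can influence the next.
  def map_with(nodes, acc, fun) when is_list(nodes) do
    Enum.map_reduce(nodes, acc, fn node, acc -> walk(node, acc, fun) end)
  end

  defp walk({_tag, _attrs, children, _meta} = node, acc, fun) do
    # The content slot of the returned tuple is ignored; the original
    # children are walked instead.
    {{tag, attrs, _ignored, meta}, acc} = fun.(node, acc)
    {new_children, acc} = map_with(children, acc, fun)
    {{tag, attrs, new_children, meta}, acc}
  end

  defp walk(text, acc, fun) when is_binary(text), do: fun.(text, acc)
end

ast = [{"pre", [], [{"code", [{"class", "sql"}], ["select 1"], %{}}], %{}}, "tail"]

# Flag SQL code nodes, then transform only the very next text node.
{result, _acc} =
  AccWalk.map_with(ast, false, fn
    {"code", [{"class", "sql"}], _, _} = node, _flag -> {node, true}
    text, true when is_binary(text) -> {String.upcase(text), false}
    node, _flag -> {node, false}
  end)
```

Because this walker visits a parent before its children, the text inside a flagged code node is the next thing the callback sees, which is exactly what the true/false flag relies on; "tail" is left alone because the flag was reset.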

So I’ve ended up with this:

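    markdown = """
    ```sql
    <span class="k">SELECT</span> * <span class="k">FROM</span> table
    ```
    """

    # map_ast_with returns {ast, accumulator}, so destructure before transforming.
    {ast, _acc} =
      markdown
      |> Earmark.as_ast!()
      |> Earmark.Transform.map_ast_with(false, fn
        # Flag SQL code nodes; the content returned here is ignored anyway.
        {"code", [{"class", "sql"}], _, meta}, _ ->
          {{"code", [{"class", "sql"}], nil, meta}, true}

        # The text node right after a flagged code node holds the raw HTML.
        html, true ->
          {:ok, html} = Floki.parse_fragment(html)

          # Convert Floki's 3-tuples into Earmark's 4-tuples.
          html =
            Floki.traverse_and_update(html, fn
              {tag, args, children} -> {tag, args, children, %{}}
            end)

          {html, false}

        # Everything else passes through and resets the flag.
        node, _ ->
          {node, false}
      end)

    Earmark.transform(ast, options())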

Which works! The Floki.traverse_and_update/2 call is necessary to convert from Floki’s tuple representation ({tag, attrs, children}) to Earmark’s ({tag, attrs, children, meta}).

The only remaining problem is that each span is rendered on its own line, which makes the formatting all wonky. That is expected, though, and I’ll have to figure something else out there.
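
For anyone curious what that conversion amounts to, the following dependency-free sketch does the same reshaping by hand. FlokiShape.to_earmark/1 is a hypothetical name, and the sketch assumes the fragment contains only element and text nodes (Floki comment tuples are not handled).

```elixir
defmodule FlokiShape do
  # Hypothetical helper (a Floki-free illustration, not the poster's code):
  # converts Floki-style 3-tuples {tag, attrs, children} into Earmark-style
  # 4-tuples {tag, attrs, children, meta} with an empty meta map.
  # Assumption: only element and text nodes appear in the input.
  def to_earmark(nodes) when is_list(nodes), do: Enum.map(nodes, &to_earmark/1)

  def to_earmark({tag, attrs, children}) when is_binary(tag),
    do: {tag, attrs, to_earmark(children), %{}}

  # Text nodes are plain binaries in both representations.
  def to_earmark(text) when is_binary(text), do: text
end

floki_nodes = [
  {"span", [{"class", "k"}], ["SELECT"]},
  " * ",
  {"span", [{"class", "k"}], ["FROM"]},
  " table"
]

earmark_nodes = FlokiShape.to_earmark(floki_nodes)
```

In the code above, Floki.traverse_and_update/2 does this job using Floki’s own traversal; the manual version only makes the shape change explicit.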

Great you found it, I was just about to try it out…

Thank you for the PR too, very much appreciated

Maybe you want to try the option compact_output: true (oh no, you are talking about the spans coming from Floki, right?)

Hey @RobertDober, thanks for the reply and thanks for all your work on Earmark—that is very much appreciated!

Yes, compact_output: true did not help since, as you likely well know, spans are not in @compact_tags. I was able to fix it by converting the spans to ems. I don’t mind that at all, since semantically they are emphasized, but I’m doing the conversion programmatically because I eventually want to integrate with makeup or vim’s :TOhtml. Then I hit another problem: when two ems are in a row, they render without a space between them. For example, <em class="k">SELECT</em> <em class="k">FROM</em> renders as SELECTFROM. I was actually able to solve this, but in a very convoluted way:

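    {result, _} =
      Earmark.Transform.map_ast_with(result, nil, fn
        # First em in a potential run of ems.
        {"em", args, _, meta}, nil ->
          {{"em", args, nil, meta}, :em_first}

        # An em directly following another one: its text needs padding.
        {"em", args, _, meta}, :em_next ->
          {{"em", args, nil, meta}, :em_text}

        # Any other tag (or an em in an unexpected state) resets.
        {tag, args, _, meta}, _ ->
          {{tag, args, nil, meta}, nil}

        # Text of the first em: start looking for a following em.
        text, :em_first ->
          {text, :em_next}

        # Text of a following em: leftPad it and keep looking.
        text, :em_text ->
          {" #{text}", :em_next}

        # Any other text resets the state.
        node, _ ->
          {node, nil}
      end)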

So it basically says: "If we see an em for the first time, mark it as such (:em_first). When we then see its text node, do nothing except mark that we should look out for another em (:em_next). If the next node is indeed an em, mark it as part of a run of ems (still :em_next) and leftPad™ it. On anything else, just reset."

It’s a little convoluted and I haven’t revisited it since I got it working.

For all intents and purposes this solves my problems, but I’m having another little issue that I was going to open in the repo, since it seems more appropriate to discuss there (and I want to look at the source a little more to understand whether it’s reasonable or not).
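
The span-to-em conversion mentioned above can be done as a plain recursive rewrite of the AST. A minimal sketch, assuming Earmark’s 4-tuple node shape with binaries for text; SpanToEm.rename/1 is a hypothetical name, not part of Earmark.

```elixir
defmodule SpanToEm do
  # Hypothetical helper: recursively renames every "span" node to "em" in an
  # Earmark-style 4-tuple AST, keeping attributes (e.g. class="k") and meta.
  def rename(nodes) when is_list(nodes), do: Enum.map(nodes, &rename/1)
  def rename({"span", attrs, children, meta}), do: {"em", attrs, rename(children), meta}
  def rename({tag, attrs, children, meta}), do: {tag, attrs, rename(children), meta}
  # Text nodes pass through unchanged.
  def rename(text) when is_binary(text), do: text
end

ast = [
  {"code", [{"class", "sql"}], [{"span", [{"class", "k"}], ["SELECT"], %{}}, " * table"], %{}}
]

renamed = SpanToEm.rename(ast)
```

Keeping the class attribute means the existing highlighting CSS (e.g. rules for .k) can target the ems instead of the spans.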

I’m realizing I forgot to mention the :em_text in there but I think/hope you get the (convoluted) picture.
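
If the ems only ever need padding when they are direct siblings, the same idea can be expressed with much less state. This is a swapped-in, simplified variant of the state machine above, not the poster’s code: it walks one list of sibling nodes with Enum.map_reduce/3 instead of the whole tree, and EmSpacing.pad_adjacent/1 is a hypothetical name. It assumes each em holds a single text child.

```elixir
defmodule EmSpacing do
  # Hypothetical, simplified variant of the em-padding state machine:
  # looks only at sibling nodes and prefixes a space to the text of any
  # em that directly follows another em. Nodes use Earmark's 4-tuple
  # shape; text nodes are plain binaries.
  def pad_adjacent(nodes) do
    {padded, _prev_was_em} =
      Enum.map_reduce(nodes, false, fn
        # An em right after another em: leftPad its text.
        {"em", attrs, [text], meta}, true when is_binary(text) ->
          {{"em", attrs, [" " <> text], meta}, true}

        # Any other em: keep it, and remember that we just saw one.
        {"em", _attrs, _children, _meta} = em, _prev ->
          {em, true}

        # Anything else (text, other tags) resets the state.
        other, _prev ->
          {other, false}
      end)

    padded
  end
end

k = [{"class", "k"}]

# Adjacent ems: the second one gets a leading space.
joined = EmSpacing.pad_adjacent([{"em", k, ["SELECT"], %{}}, {"em", k, ["FROM"], %{}}])

# A text node in between resets the state, so nothing is padded.
separated = EmSpacing.pad_adjacent([{"em", k, ["SELECT"], %{}}, " * ", {"em", k, ["FROM"], %{}}])
```

The trade-off is reach: the poster’s version follows the whole document via map_ast_with, while this one must be applied to each children list separately.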

Well, that looks like a bug to me. I’ll investigate and open an issue in either Earmark or EarmarkParser.

As mentioned on GitHub, if this is a bug, it is in Floki.