sodapopcan
Earmark - Parsing HTML inside code blocks
Using Earmark, I’m trying to parse HTML inside markdown code blocks so that they can be syntax-highlighted. I’m having a bit of a rough time with it. Admittedly, I could probably stand to spend more time trying to figure it out myself, but I feel I may be on the wrong path as it stands, so I thought it couldn’t hurt to ask. Also, whenever I do ask for help, I usually figure it out minutes later.
So, I think I understand that the following doesn’t work, since the 3rd (content) element of the 4-tuple is usually ignored:
    def parse() do
      """
      ```sql
      <span class="k">SELECT</span> * <span class="k">FROM</span> table
      ```
      """
      |> Earmark.as_ast!()
      |> Earmark.map_ast(&parse_html/1)
      |> Earmark.transform()
    end

    defp parse_html({"code", [{"class", "sql"}], [html], meta} = node, true) do
      {:ok, html} = Floki.parse_fragment(html)
      {:replace, {"code", [{"class", "sql"}], [html], meta}}
    end
I say “usually” since the documentation doesn’t explicitly mention that the content node is ignored when using {:replace, node}, yet that seems to be the case (I’m actually not sure).
Is there a way to simply replace the content node?
Is there just a better way of doing this (without using highlightjs)?
I’m looking at pre- and post-processors but currently not having much luck.
Thanks for reading!
Most Liked Responses
sodapopcan
After sleeping on it, I figured it out. It was right there in the documentation, it just didn’t click with me that that is what I needed.
Using map_ast_with, we can use the accumulator to conditionally match on a specific text node. The fact that the accumulator could be used to match in this way is the part that flew over my head when I first read it.
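So I’ve ended up with this:

    markdown = """
    ```sql
    <span class="k">SELECT</span> * <span class="k">FROM</span> table
    ```
    """

    markdown
    |> Earmark.as_ast!()
    |> Earmark.Transform.map_ast_with(false, fn
      # Entering the SQL code block: flag that the next text node is its content.
      {"code", [{"class", "sql"}], _, meta}, _ ->
        {{"code", [{"class", "sql"}], nil, meta}, true}

      # The flagged text node: parse it as HTML, then convert Floki's
      # 3-tuples into Earmark's 4-tuples.
      html, true ->
        {:ok, html} = Floki.parse_fragment(html)

        html =
          Floki.traverse_and_update(html, fn
            {tag, args, children} -> {tag, args, children, %{}}
          end)

        {html, false}

      node, _ ->
        {node, false}
    end)
    # map_ast_with returns {ast, accumulator}; keep only the AST.
    |> elem(0)
    |> Earmark.transform(options())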
Which works! The Floki.traverse_and_update/2 call is necessary to convert from Floki’s tuple representation to Earmark’s.
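To make the difference between the two representations concrete, here is a minimal, dependency-free sketch of the same conversion. It assumes the parsed fragment contains only element 3-tuples and text; `FlokiToEarmark` is a hypothetical helper name, not part of Floki or Earmark, and other node types (such as comments) are not handled.

```elixir
defmodule FlokiToEarmark do
  # Hypothetical helper, not part of Floki or Earmark.
  # Assumes the tree holds only {tag, attrs, children} elements and text.

  # A list of nodes: convert each one.
  def convert(nodes) when is_list(nodes), do: Enum.map(nodes, &convert/1)

  # Floki-style 3-tuple -> Earmark-style 4-tuple with an empty meta map.
  def convert({tag, attrs, children}), do: {tag, attrs, convert(children), %{}}

  # Text nodes are plain binaries in both representations.
  def convert(text) when is_binary(text), do: text
end

floki_tree = [
  {"span", [{"class", "k"}], ["SELECT"]},
  " * ",
  {"span", [{"class", "k"}], ["FROM"]},
  " table"
]

earmark_nodes = FlokiToEarmark.convert(floki_tree)
```

Each element in `earmark_nodes` gains a fourth element (`%{}`), which is the shape the AST quadruples in this thread use.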
The only thing left is that the spans are put on their own lines, which makes the formatting all wonky. That is expected, though, and I will have to figure something else out there.
RobertDober
Great you found it, was just about to try it out …
Thank you for the PR too, very much appreciated
sodapopcan
Hey @RobertDober, thanks for the reply and thanks for all your work on Earmark—that is very much appreciated!
Yes, compact_output: true did not help since, as you likely well know, spans are not in @compact_tags. I was able to fix it by converting the spans to ems (which I don’t mind at all since semantically they are emphasized, although I’m doing it programmatically since I eventually want to integrate with makeup or vim’s :TOhtml). But then I faced another problem: two tags in a row render without a space between them. For example, <em class="k">SELECT</em> <em class="k">FROM</em> renders as SELECTFROM. I was actually able to solve this, but in a very convoluted way:
    {result, _} =
      Earmark.Transform.map_ast_with(result, nil, fn
        {"em", args, _, meta}, nil ->
          {{"em", args, nil, meta}, :em_first}

        {"em", args, _, meta}, :em_next ->
          {{"em", args, nil, meta}, :em_text}

        {tag, args, _, meta}, _ ->
          {{tag, args, nil, meta}, nil}

        text, :em_first ->
          {text, :em_next}

        text, :em_text ->
          {" #{text}", :em_next}

        node, _ ->
          {node, nil}
      end)
So basically: if we see an em for the first time, mark it as such (:em_first); then when we see its text node, do nothing other than mark that we should look out for another em (:em_next). If the next node is indeed an em, mark it as part of a string of ems (:em_text) and leftPad™ its text node, going back to :em_next. Anything else, just reset.
It’s a little convoluted and I haven’t revisited it since I got it working.
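For anyone who wants to trace those state transitions without Earmark installed, here is a small sketch that runs the same clauses over a flat list of nodes with Enum.map_reduce/3. The flat list is an assumption standing in for the order in which the traversal visits nodes (each em followed by its text), and `EmPadding` is a hypothetical module name used only for illustration.

```elixir
defmodule EmPadding do
  # Hypothetical stand-in for the traversal above: it walks a flat list of
  # nodes rather than a real Earmark AST.

  # First em seen since the last reset.
  def step({"em", _, _, _} = node, nil), do: {node, :em_first}
  # An em directly following another em's text.
  def step({"em", _, _, _} = node, :em_next), do: {node, :em_text}
  # Any other tag resets the state.
  def step({_, _, _, _} = node, _), do: {node, nil}
  # Text of the first em: leave it alone, watch for a following em.
  def step(text, :em_first) when is_binary(text), do: {text, :em_next}
  # Text of a following em: pad it on the left.
  def step(text, :em_text) when is_binary(text), do: {" " <> text, :em_next}
  # Anything else resets the state.
  def step(node, _), do: {node, nil}

  def run(nodes), do: Enum.map_reduce(nodes, nil, &step/2)
end

em = {"em", [{"class", "k"}], nil, %{}}
{padded, _state} = EmPadding.run([em, "SELECT", em, "FROM"])
# padded == [em, "SELECT", em, " FROM"]
```

Only the text of the second em is padded, which is the separation between SELECT and FROM that the rendered output was missing.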
For all intents and purposes this solves my problems, but I’m having another little issue that I was going to open in the repo, since it seems more appropriate to discuss there (and I want to look at the source a little more to understand whether it’s reasonable or not).








