MDEx - Fast and Extensible Markdown

MDEx is a fast and extensible Markdown parser and formatter.

Fast

Leverages Rust to parse, manipulate, and render documents.

Extensible

A Req-like API to manipulate documents; see for example mdex_mermaid.

Features

  • Convert between formats: Markdown (CommonMark), HTML, JSON, XML
  • GitHub Flavored Markdown
  • Discord and GitLab features
  • Wiki-style links
  • Sigils for Markdown, HTML, JSON, and XML
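To give a quick taste of the conversion feature, here's a hedged sketch (the `MDEx.to_html/2` function and the `extension` option follow MDEx's documented API; double-check against the docs for your version):

```elixir
# Convert CommonMark (with a GFM extension enabled) to HTML.
# `extension: [tasklist: true]` mirrors comrak's extension options.
markdown = """
# Hello

- [x] shipped
- [ ] pending
"""

{:ok, html} = MDEx.to_html(markdown, extension: [tasklist: true])
IO.puts(html)
```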

More on the docs and GitHub.


Very interesting package with lots of features that caught my attention! Here are just a few questions/suggestions:

  1. It would be nice to see a comparison table with existing solutions like Earmark - a short side-by-side summary sometimes does more than a thousand words

  2. Would it be hard to add support for custom markdown rules, or even custom sets of rules (other markdown syntaxes, if any)? For example, see how many interesting features ex_doc adds to the markdown. It’s important to answer whether, using your library, a developer could do the same thing. Would it be harder or simpler?

  3. Would it be hard to add support for Earmark (and other packages)? Imagine a case where there is a critical bug in the Rust parser - for an Elixir app that could mean the entire app going down.

If people prefer your solution then sooner or later they may ask such questions … Maybe this is not the kind of feedback you are interested in, but since writing a parser is not trivial, and especially because Earmark has been in the ecosystem for many, many years, people working in production may ask you these types of questions.

Instead of forcing something new (i.e. something unknown), it’s always better to provide something that can fall back in the worst case. This often helps in migrating a big part of a project.

Very interesting package with lots of features that caught my attention!

Thanks!

  1. It would be nice to see a comparison table.

Done. Here’s a comparison table and a livebook to compare the output of some markdown libraries.

  1. Would it be hard to add support for custom markdown rules or even custom sets of rules

Hard to tell if it’s easier or harder, it depends on your needs, but I’d argue it tends to be easier. For instance, here’s the code to render Mermaid graphs and here are some examples; you’ll notice it’s all about transforming a tree of nodes. MDEx uses structs for a couple of reasons, but it’s not much different from the earmark_parser AST or even the Floki AST.
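To make the "transforming a tree of nodes" point concrete, here's a hedged sketch (it assumes `MDEx.parse_document!/1`, `MDEx.traverse_and_update/2`, and the `MDEx.Heading` node struct as currently documented; verify the names against the docs for your version):

```elixir
# A sketch of tree manipulation based on MDEx's documented Document API.
doc = MDEx.parse_document!("# Hello\n\nSome *emphasis* here.")

updated =
  MDEx.traverse_and_update(doc, fn
    # demote every level-1 heading to level 2
    %MDEx.Heading{level: 1} = node -> %{node | level: 2}
    # leave every other node untouched
    node -> node
  end)

MDEx.to_html!(updated)
```

The shape is the same whether you pattern match on headings, code blocks, or any other node struct.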

For example, see how many interesting features ex_doc adds to the markdown.

Many features in ex_doc are actually implemented on the HTML data structure/AST, either using external libraries to perform operations like syntax highlighting or emoji (both native to MDEx) or to autolink modules/functions. So technically any other Markdown library could implement ExDoc.Markdown and all the rest would just work, although I’ve never tried it :smile:

  1. Would it be hard to add support for Earmark (and other packages)? Imagine a case where there is a critical bug in the Rust parser - for an Elixir app that could mean the entire app going down.

I’m not sure I follow this question. Do you mean as a backend for MDEx? If so, then no, because Earmark doesn’t fully support CommonMark, which was a requirement for MDEx in the first place. And comrak is pretty stable, the author is very responsive (he has even contributed to MDEx already), and it’s used by many libraries in the Rust ecosystem, including, for example, to build the Deno documentation.

writing a parser is not trivial

Nope, it’s not. Markdown is way more complex than it looks, but all the credit for the parser goes to comrak and cmark-gfm.

Maybe this is not the kind of feedback you are interested in

That was super useful, really good topics. Let me know if something is not clear.


Oh, I see … However, you have to agree that’s not an efficient solution and it could be done much better.

For example, if we could “just register” syntax like =={md}== and have it transformed into a custom struct, then it would be much easier to write. Beyond the developer experience, there could also be a big performance win, as there would be no need to traverse the whole tree for each tiny plugin.

Yes, that’s what I asked about. However, what I was thinking of was a bit different …

iex> MyLib.parse(input, backend: Earmark, extensions: […], standard: MyLib.CommonMark)
{:error, %MyLib.NotSupportedError{message: "backend Earmark does not support MyLib.CommonMark standard, the supported standards are: …"}}

This gives lots of flexibility:

  1. Separate extensions and backend - extensions should just register extra markdown rules, while the backend is supposed to parse predefined rules

  2. Each backend may support a different standard and may not support extensions - having a single, consistent API to work with any parser would be a killer feature, even if one backend has many more features than the others

  3. It’s not about which markdown standard you support, but which markdown parsers you support. As above, returning an error message is not a problem. Developers previously using Earmark may be fully aware of its limitations. However, it’s easier to change a dependency while keeping the “old” backend, and only later test how good the “new” backend is, than to completely abandon the old parser in favour of the new one.

Keep in mind I was not talking about supporting every possible markdown feature in each parser, but about baby steps, as such a strategy should convince even the most “conservative” teams to change, drastically affecting adoption.

While this is of course not an expectation for your project, it’s more like an amazing practice. Please note that I’m talking about a common pattern: for example, Phoenix supports a custom backend in the Endpoint configuration, as well as a json_library configuration that supports all available JSON libraries (no matter how efficient or old they are).


About the extensions I mentioned … I believe that for the best developer experience an API like this would be amazing:

defmodule MyApp.MarkdownExtensions.Equals do
  @moduledoc "…"

  @behaviour MyLib.MarkdownExtension

  @typedoc "…"
  @type t :: %__MODULE__{content: MyLib.MarkdownTree.t()}
  defstruct [:content]

  @impl true
  def init, do: MyLib.register_extension("=={md}==", md: :inline_markdown)

  @impl true
  def new(opts), do: %__MODULE__{content: opts[:md]}
end

As I said, this is much easier and more efficient than traversing the tree, but I understand it’s easier said than done.

Funny, about 2 hours before you posted here, I found mdex via a web search, as I was searching for an MDX-like implementation in Elixir. Sadly, it doesn’t seem to do that.

Still I will take a closer look after the weekend, it might be useful for another project of mine.

Hi @Eiji

not an efficient solution

I’m assuming you’re talking about performance, right? In that regard there will always be trade-offs. Traversing a tree in Elixir is pretty efficient; it’s used everywhere, in many projects like LiveView and others :slight_smile:
But if that’s still not enough and you need max performance, you still have the option to contribute upstream, for example recently GitHub alerts/admonitions were introduced upstream so MDEx gets that for free. Besides that, comrak is becoming more extensible in the most recent versions so eventually MDEx can leverage that as well. But ultimately the API has to be on the Elixir side so there are some limitations on what it can really use.

“just register” syntax like =={md}==

I guess I’m failing to understand what you mean. Even if you’re able to register an extension, you still need to parse =={md}== and render it somehow. You still have to implement it anyway, so I don’t see how that is “much easier and more efficient than traversing the tree”.
But I agree that manipulating the document should be as easy as possible; that’s why I’ve been working on high-level functions like put_node_in_document_root/3.
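To make the "you still have to implement it anyway" point concrete, here's a hedged sketch of supporting `==text==` via traversal (the `MDEx.Text` and `MDEx.HtmlInline` struct names follow MDEx's docs; the regex rewrite itself is my own illustration, not a built-in feature):

```elixir
# Hypothetical: render ==highlight== as <mark> by rewriting Text nodes
# after parsing, instead of registering new syntax in the parser.
doc = MDEx.parse_document!("This is ==important== text.")

doc =
  MDEx.traverse_and_update(doc, fn
    %MDEx.Text{literal: literal} = node ->
      if String.contains?(literal, "==") do
        %MDEx.HtmlInline{literal: Regex.replace(~r/==(.+?)==/, literal, "<mark>\\1</mark>")}
      else
        node
      end

    node ->
      node
  end)
```

Note that rendering the injected raw HTML would typically also require enabling MDEx's unsafe render option; either way, the parsing and rendering work has to happen somewhere.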

extensions / backend

It’s flexible, but it also brings a lot of complexity, making such a design not viable. Actually, I think the perspective is inverted: you mentioned Phoenix (Plug) and previously ex_doc, and in both cases it’s the consumer (Plug and ex_doc) that defines what to expect from the JSON and Markdown adapters, respectively.

Please correct me if I’m not getting your question right, though.


MDX was actually an inspiration to create MDEx :grinning:

Now that the extension API is done, I’m going to work on EEx and HEEx to support Elixir and Components officially, but most likely in a separate library.

What exactly were you looking for that MDX does and you’d like to see in MDEx?


Oh, simply put: you register said extension, so instead of traversing, the work is done at the parser level. Currently each parser uses some rules to return data in the desired format once, and then we traverse the returned data in each extension. The parser already follows common patterns like **text**, right? Imagine you were able to tell the parser what syntax it has to support. Yeah, that would require a lot of work at the parser level, so it’s not an ideal solution. However, if it were done, it would make your library extremely flexible.

Previously I used two different names: standard and extension. In short, for the parser a standard is just a set of extensions. What’s the difference then? Instead of inventing my own markdown syntax, I want to instruct the parser:

Do it your way, but just as you support **text**, please extend your work to support ==text== as well.

While in standard case it’s:

Forget about other rules and follow only my set of rules.


Yes, that’s exactly what I mean. Of course, if you consider doing so, you would keep your current choice as the default behaviour and then let others optionally change the parser or the markdown standard (basically a set of rules, i.e. parser extensions).

What you do now is support a lot of options in your functions. However, this does not map well onto typical cases. All people care about is:

MyLib.parse(input, standard: MyLib.Markdown.GitHub, extensions: [AdditionalDoubleEqualsSyntax])

This line is especially important in the Elixir ecosystem, where we do all we can to follow the 10x-less-LOC rule.


That said, all this time I did not say anything bad about any of the library’s features; my (literal) “5 cents” are about the preferred API syntax (developer experience).


Also, while writing this reply I was reminded of one more thing from packages I have used in the past. For example, in nimble_parsec you either follow the RFC implementation or define your own parser:

# nimble_csv version:
# NimbleCSV.define(MyParser, separator: "\t", escape: "\"")

# a MyLib module-based solution for rule DSL
defmodule MyApp.Markdown do
  # many `rule` DSL generated here
  use MyLib.Standard.CommonMark

  rule :some_name, "=={md}==" when md: :inline_markdown

  # some kind of "old way" for parsers not supporting custom rules, for example:
  traverse do
    %MDEx.Code{literal: "elixir"} = node -> %{node | literal: "ex"}
  end

  # for sure we may need extra DSL for various formats
  # however we work on data already traversed (by the parser)
  # format :html, :some_name, opts do
  #   "HTML code goes here #{opts[:md]} …"
  # end
end

# no options here
MyApp.Markdown.parse("> markdown **input**")

Of course, CSV files are much simpler than markdown and the above code is basically art for art’s sake, but what can I say? It looks beautiful! :smiling_imp:

Does it make more sense?

Components.
