MDEx - Fast and Extensible Markdown

MDEx is a fast and extensible Markdown parser and formatter.

Fast

Leverages Rust to parse, manipulate, and render documents.

Extensible

A Req-like API to manipulate documents; see for example mdex_mermaid.

Features

  • Convert between formats: Markdown (CommonMark), HTML, JSON, XML
  • GitHub Flavored Markdown
  • Discord and GitLab features
  • Wiki-style links
  • Sigils for Markdown, HTML, JSON, and XML
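To give a quick taste of the conversion feature, here's a hedged sketch (the `MDEx.to_html/2` function and the `extension` option follow MDEx's documented API; double-check against the docs for your version):

```elixir
# Convert CommonMark (with a GFM extension enabled) to HTML.
# `extension: [tasklist: true]` mirrors comrak's extension options.
markdown = """
# Hello

- [x] shipped
- [ ] pending
"""

{:ok, html} = MDEx.to_html(markdown, extension: [tasklist: true])
IO.puts(html)
```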

More on the docs and GitHub.


Very interesting package with lots of features that caught my attention! Here are just a few questions/suggestions:

  1. It would be nice to see a comparison table with existing solutions like Earmark - a short side-by-side summary sometimes does more than a thousand words

  2. Would it be hard to add support for custom markdown rules, or even custom sets of rules (other markdown syntaxes, if any)? For example, see how many interesting features ex_doc adds to the markdown. It’s important to answer whether, using your library, a developer could do the same thing. Would it be harder or simpler?

  3. Would it be hard to add support for Earmark (and other packages)? Imagine a case where there is a critical bug in the Rust parser - for an Elixir app that could mean the entire app going down.

If people prefer your solution then sooner or later they may ask such questions … Maybe this is not the kind of feedback you are interested in, but since writing a parser is not trivial, and especially because Earmark has been in the ecosystem for many, many years, people working in production may ask you these types of questions.

Instead of forcing something new (i.e. something unknown), it’s always better to provide something that can fall back in the worst case. This often helps in migrating a big part of a project.

Very interesting package with lots of features that caught my attention!

Thanks!

  1. It would be nice to see a comparison table.

Done. Here’s a comparison table and a livebook to compare the output of some markdown libraries.

  1. Would it be hard to add support for custom markdown rules or even custom sets of rules

Hard to tell if it’s easier or harder, it depends on your needs, but I’d argue it tends to be easier. For instance, here’s the code to render Mermaid graphs and here are some examples; you’ll notice it’s all about transforming a tree of nodes. MDEx uses structs for a couple of reasons, but it’s not much different from the earmark_parser AST or even the Floki AST.
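To make the "transforming a tree of nodes" point concrete, here's a hedged sketch (it assumes `MDEx.parse_document!/1`, `MDEx.traverse_and_update/2`, and the `MDEx.Heading` node struct as currently documented; verify the names against the docs for your version):

```elixir
# A sketch of tree manipulation based on MDEx's documented Document API.
doc = MDEx.parse_document!("# Hello\n\nSome *emphasis* here.")

updated =
  MDEx.traverse_and_update(doc, fn
    # demote every level-1 heading to level 2
    %MDEx.Heading{level: 1} = node -> %{node | level: 2}
    # leave every other node untouched
    node -> node
  end)

MDEx.to_html!(updated)
```

The shape is the same whether you pattern match on headings, code blocks, or any other node struct.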

For example, see how many interesting features ex_doc adds to the markdown.

Many features in ex_doc are actually implemented on the HTML data structure/AST, either using external libraries to perform operations like syntax highlighting or emoji (both native to MDEx) or to autolink modules/functions. So technically any other Markdown library could implement ExDoc.Markdown and all the rest would just work, although I’ve never tried it :smile:

  1. Would it be hard to add support for Earmark (and other packages)? Imagine a case where there is a critical bug in the Rust parser - for an Elixir app that could mean the entire app going down.

I’m not sure I follow this question. Do you mean as a backend for MDEx? If so, then no, because Earmark doesn’t fully support CommonMark, which was a requirement for MDEx in the first place. And comrak is pretty stable, the author is very responsive (he has even contributed to MDEx already), and it’s used by many libraries in the Rust ecosystem, including, for example, to build the Deno documentation.

writing a parser is not trivial

Nope, it’s not. Markdown is way more complex than it looks, but all the credit for the parser goes to comrak and cmark-gfm.

Maybe this is not the kind of feedback you are interested in

That was super useful, really good topics. Let me know if something is not clear.


Oh, I see … However, you have to agree that’s not an efficient solution and it could be done much better.

For example, if we could “just register” syntax like =={md}== and have it transformed into a custom struct, then it would be much easier to write. Beyond the developer experience, there could also be a big performance win, as there would be no need to traverse the whole tree for each tiny plugin.

Yes, that’s what I asked about. However, what I was thinking of was a bit different …

iex> MyLib.parse(input, backend: Earmark, extensions: […], standard: MyLib.CommonMark)
{:error, %MyLib.NotSupportedError{message: "backend Earmark does not support MyLib.CommonMark standard, the supported standards are: …"}}

This gives lots of flexibility:

  1. Separate extensions and backend - extensions should just register extra markdown rules, while the backend is supposed to parse predefined rules

  2. Each backend may support a different standard and may not support extensions - having a single, consistent API to work with any parser would be a killer feature, even if one backend has many more features than the others

  3. It’s not about which markdown standard you support, but which markdown parsers you support. As above, returning an error message is not a problem. Developers previously using Earmark may be fully aware of its limitations. However, it’s easier to change a dependency while keeping the “old” backend, and only later test how good the “new” backend is, than to completely abandon the old parser in favour of the new one.

Keep in mind I was not talking about supporting every possible markdown feature in each parser, but about baby steps, as such a strategy should convince even the most “conservative” teams to change, drastically affecting adoption.

While this is of course not an expectation for your project, it’s more like an amazing practice. Please note that I’m talking about a common pattern: for example, Phoenix supports a custom backend in the Endpoint configuration, as well as a json_library configuration that supports all available JSON libraries (no matter how efficient or old they are).


About the extensions I mentioned … I believe that for the best developer experience an API like this would be amazing:

defmodule MyApp.MarkdownExtensions.Equals do
  @moduledoc "…"

  @behaviour MyLib.MarkdownExtension

  @typedoc "…"
  @type t :: %__MODULE__{content: MyLib.MarkdownTree.t()}
  defstruct [:content]

  @impl true
  def init, do: MyLib.register_extension("=={md}==", md: :inline_markdown)

  @impl true
  def new(opts), do: %__MODULE__{content: opts[:md]}
end

As I said, this is much easier and more efficient than traversing the tree, but I understand it’s easier said than done.

Funny, about 2 hours before you posted here, I found mdex via a web search, as I was searching for an MDX-like implementation in Elixir. Sadly, it doesn’t seem to do that.

Still I will take a closer look after the weekend, it might be useful for another project of mine.

Hi @Eiji

not an efficient solution

I’m assuming you’re talking about performance, right? In that regard there will always be trade-offs. Traversing a tree in Elixir is pretty efficient; it’s used everywhere, in many projects like LiveView and others :slight_smile:
But if that’s still not enough and you need max performance, you still have the option to contribute upstream, for example recently GitHub alerts/admonitions were introduced upstream so MDEx gets that for free. Besides that, comrak is becoming more extensible in the most recent versions so eventually MDEx can leverage that as well. But ultimately the API has to be on the Elixir side so there are some limitations on what it can really use.

“just register” syntax like =={md}==

I guess I’m failing to understand what you mean. Even if you’re able to register an extension, you still need to parse =={md}== and render it somehow. You still have to implement it anyway, so I don’t see how that is “much easier and more efficient than traversing the tree”.
But I agree that manipulating the document should be as easy as possible; that’s why I’ve been working on high-level functions like put_node_in_document_root/3.
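To make the "you still have to implement it anyway" point concrete, here's a hedged sketch of supporting `==text==` via traversal (the `MDEx.Text` and `MDEx.HtmlInline` struct names follow MDEx's docs; the regex rewrite itself is my own illustration, not a built-in feature):

```elixir
# Hypothetical: render ==highlight== as <mark> by rewriting Text nodes
# after parsing, instead of registering new syntax in the parser.
doc = MDEx.parse_document!("This is ==important== text.")

doc =
  MDEx.traverse_and_update(doc, fn
    %MDEx.Text{literal: literal} = node ->
      if String.contains?(literal, "==") do
        %MDEx.HtmlInline{literal: Regex.replace(~r/==(.+?)==/, literal, "<mark>\\1</mark>")}
      else
        node
      end

    node ->
      node
  end)
```

Note that rendering the injected raw HTML would typically also require enabling MDEx's unsafe render option; either way, the parsing and rendering work has to happen somewhere.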

extensions / backend

It’s flexible, but it also brings a lot of complexity, making such a design not viable. Actually, I think the perspective is inverted: you mentioned Phoenix (Plug) and previously ex_doc, and in both cases it’s the consumer (Plug and ex_doc) that defines what to expect from the JSON and Markdown adapters, respectively.

Please correct me if I’m not getting your question right, though.


MDX was actually an inspiration to create MDEx :grinning:

Now that the extension API is done, I’m going to work on EEx and HEEx to support Elixir and Components officially, but most likely in a separate library.

What exactly were you looking for that MDX does and you’d like to see in MDEx?


Oh, simply put: you register said extension, so instead of traversing, the work is done at the parser level. Currently each parser uses some rules to return data in the desired format once, and then we traverse the returned data in each extension. The parser already follows common patterns like **text**, right? Imagine you were able to tell the parser what syntax it has to support. Yeah, that would require a lot of work at the parser level, so it’s not an ideal solution. However, if it were done, it would make your library extremely flexible.

Previously I used two different names: standard and extension. In short, for the parser a standard is just a set of extensions. What’s the difference then? Instead of inventing my own markdown syntax, I want to instruct the parser:

Do it your way, but just as you support **text**, please extend your work to support ==text== as well.

While in standard case it’s:

Forget about other rules and follow only my set of rules.


Yes, that’s exactly what I mean. Of course, if you consider doing so, you would keep your current choice as the default behaviour and then let others optionally change the parser or the markdown standard (basically a set of rules, i.e. parser extensions).

What you do now is support a lot of options in your functions. However, this does not map well onto typical cases. All people care about is:

MyLib.parse(input, standard: MyLib.Markdown.GitHub, extensions: [AdditionalDoubleEqualsSyntax])

This line is especially important in the Elixir ecosystem, where we do all we can to follow the 10x-less-LOC rule.


That said, all this time I did not say anything bad about any of the library’s features; my (literal) “5 cents” are about the preferred API syntax (developer experience).


Also, while writing this reply I was reminded of one more thing from packages I have used in the past. For example, in nimble_parsec you either follow the RFC implementation or define your own parser:

# nimble_csv version:
# NimbleCSV.define(MyParser, separator: "\t", escape: "\"")

# a MyLib module-based solution for rule DSL
defmodule MyApp.Markdown do
  # many `rule` DSL generated here
  use MyLib.Standard.CommonMark

  rule :some_name, "=={md}==" when md: :inline_markdown

  # some kind of "old way" for parsers not supporting custom rules, for example:
  traverse do
    %MDEx.Code{literal: "elixir"} = node -> %{node | literal: "ex"}
  end

  # for sure we may need extra DSL for various formats
  # however we work on data already traversed (by the parser)
  # format :html, :some_name, opts do
  #   "HTML code goes here #{opts[:md]} …"
  # end
end

# no options here
MyApp.Markdown.parse("> markdown **input**")

Of course, CSV files are much simpler than markdown and the above code is basically art for art’s sake, but what can I say? It looks beautiful! :smiling_imp:

Does it make more sense?

Components.
