Implementing a custom Markdown parser with the Md library

I am trying to use this library to parse my custom markdown, in which I want to find calls to HEEX components and handle them, but I cannot figure out how to get the custom parser invoked.

The markdown:

## TEST

Some text before a card.

<.card image_path="/images/awesome.svg">Some nice card with an image on the left.</.card>

Continuing after the card.

My parser:

defmodule MasWeb.MdParser do

  use Md.Parser

  alias Md.Parser.Syntax.Void

  @default_syntax Map.put(Void.syntax(), :settings, Void.settings())
  @syntax @default_syntax

  @impl true
  def parse(input, state) do
    # copied from the Md.Parser source code:
    %State{ast: ast, path: []} = state = do_parse(input, state)
    {"", %State{state | ast: Enum.reverse(ast)}}
  end
end

The docs for Md.Parser say this:

Custom parsers might be used in syntax declaration when the generic functionality
is not enough.

Let's consider one needs a specific handling of links with titles.

The generic engine does not support it, so one would need to implement a custom parser
and instruct Md.Parser to use it with:

# config/prod.exs

config :md, syntax: %{
  custom: %{
    {"![", MyApp.Parsers.Img},
    ...
  }
}

Once the original parser would meet the "![" binary, it'd call MyApp.Parsers.Img.parse/2.
The latter must proceed until the tag is closed and return the remainder and the updated state
as a tuple.

Adding the configuration to config/prod.exs doesn't make sense to me, so I added it to config.exs:

config :md, syntax: %{
  custom: [
    {"<.", MasWeb.MdParser},
  ]
}

But then I get this error:

Erlang/OTP 25 [erts-13.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

ERROR! the application :md has a different value set for key :syntax during runtime compared to compile time. Since this application environment entry was marked as compile time, this difference can lead to different behaviour than expected:

  * Compile time value was not set
  * Runtime value was set to: %{custom: [{"<.", MasWeb.MdParser}]}

To fix this error, you might:

  * Make the runtime value match the compile time one

  * Recompile your project. If the misconfigured application is a dependency, you may need to run "mix deps.compile md --force"

  * Alternatively, you can disable this check. If you are using releases, you can set :validate_compile_env to false in your release configuration. If you are using Mix to start your system, you can pass the --no-validate-compile-env flag



10:57:41.583 [error] Task #PID<0.252.0> started from #PID<0.107.0> terminating
** (stop) "aborting boot"
    (elixir 1.14.3) Config.Provider.boot/2
Function: &:erlang.apply/2
    Args: [#Function<1.104735216/1 in Mix.Tasks.Compile.All.load_apps/3>, [md: "/home/developer/workspace/_build/dev/lib"]]
** (EXIT from #PID<0.107.0>) an exception was raised:
    ** (ErlangError) Erlang error: "aborting boot"
        (elixir 1.14.3) Config.Provider.boot/2

If I try to recompile it as suggested in the output:

mix deps.compile md
==> md
Compiling 13 files (.ex)

== Compilation error in file lib/md/parser/default.ex ==
** (FunctionClauseError) no function clause matching in :erl_eval."-inside-an-interpreted-fun-"/1    
    
    The following arguments were given to :erl_eval."-inside-an-interpreted-fun-"/1:
    
        # 1
        {"<.", MasWeb.MdParser}
    
    (stdlib 4.3) :erl_eval."-inside-an-interpreted-fun-"/1
    (stdlib 4.3) erl_eval.erl:898: :erl_eval.eval_fun/8
    /home/developer/workspace/deps/md/lib/md/parser/default.ex:1: (file)
    /home/developer/workspace/deps/md/lib/md/parser/default.ex:1: (file)
    (stdlib 4.3) erl_eval.erl:748: :erl_eval.do_apply/7
    (stdlib 4.3) erl_eval.erl:136: :erl_eval.exprs/6
    /home/developer/workspace/deps/md/lib/md/parser/default.ex:1: Md.Engine.__before_compile__/1
could not compile dependency :md, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile md", update it with "mix deps.update md" or clean it with "mix deps.clean md"

I have no idea how to recover from this error, so I commented out the :md entry in my config.exs and tried to configure it through my custom parser instead:

defmodule MasWeb.MdParser do

  use Md.Parser

  alias Md.Parser.Syntax.Void

  @default_syntax Map.put(Void.syntax(), :settings, Void.settings())
  @syntax @default_syntax |> Map.merge(%{
    # I think I am doing something wrong here
    common: {"<.", MasWeb.MdParser}, # example from Md.Parser
  })

  @impl true
  def parse(input, state) do
    IO.inspect(input, label: "MD PARSER INPUT")
    IO.inspect(state, label: "MD PARSER STATE")
    %State{ast: ast, path: []} = state = do_parse(input, state)
    {"", %State{state | ast: Enum.reverse(ast)}}
  end
end

I can now compile, but I never see the IO.inspect output, so my custom parser is not being invoked. I kind of expected that, but I needed to give it a try.

Any guidance on how to get a custom parser configured in my Phoenix app?

Guilty.

The docs of md must be updated and extended widely.

In the first place, you've found a bug in the custom parsers implementation, thanks for that. Fixed in v0.9.7.

Second, you are kind of mixing up custom parsers and syntax. Your change in config.exs was the right idea and would now be handled properly; the correct syntax is:

import Config

config :md, syntax: %{
  custom: [{"<.", {MasWeb.MdParser, %{}}}]
}

The custom parser is an implementation of the Md.Parser behaviour, so copy-pasting from the library source wouldn't help much. Instead, you are supposed to implement the whole parser. A very naïve implementation would look like this:

defmodule MasWeb.MdParser do
  alias Md.Parser.State

  @behaviour Md.Parser

  @impl true
  def parse(input, state) do
    # <.card image_path="/images/awesome.svg">
    #   Some nice card with an image on the left.
    # </.card>
    # Continuing after the card.

    [tag, rest] = String.split(input, " ", parts: 2)
    [content, rest] = String.split(rest, "</.#{tag}>", parts: 2)
    {rest, %State{state | ast: [content | state.ast]}}
  end
end

Since we use <. as a tag, <. is not passed to the handler itself. Hence we get card from the first split and the inner tag content from the second split. We end up with rest, which must be passed back to the "main" parser as a continuation, and content, which you should translate to AST yourself (because it's a custom parser). Here we simply return a text node.
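
To make the two splits concrete, here is a minimal sketch that runs them on the sample input outside of md, using only `String.split/3`. It assumes the handler receives the input with the leading `<.` already consumed, as described above; the shortened card text is only for illustration.

```elixir
# Input as the handler is assumed to receive it: the leading "<." is already gone.
input =
  ~s(card image_path="/images/awesome.svg">Some nice card.</.card>\n\nContinuing after the card.)

# First split: everything up to the first space is the tag name.
[tag, rest] = String.split(input, " ", parts: 2)

# Second split: everything up to the closing tag belongs to the component.
[content, rest] = String.split(rest, "</.#{tag}>", parts: 2)

IO.inspect(tag, label: "tag")         # the tag name, "card"
IO.inspect(content, label: "content") # attributes, ">" and the inner text
IO.inspect(rest, label: "rest")       # what goes back to the main parser
```

Note that `content` still begins with the attribute string and the `>` of the opening tag, so a real handler would need one more split before treating the remainder as inner text.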

Sidenote: the custom name is essential; common would not work, as it's used as a tag type.


Thanks for giving the library a try, and please don't hesitate to ask if anything is unclear. I'd love to make some progress with its docs and tests, but unfortunately it serves our current needs and I am like, ok, later, then :slight_smile:

Wow. I didn't expect this first class support :heart_eyes: :love_letter:

Tonight I will try your latest changes and then I will provide some feedback.

Afterwards I will try to add at least a simple quickstart example to your home page in the docs.

I was under the impression that this is the Elixir community standard.

Package published to md | Hex
(6e45712029ecf10f552c16a7836259f92a132f9ce4235c0937598714cd53f87b)

I gave a thought to your usage example and now I have a question.

md already supports tags via the tag: syntax, but the support is very limited (no attributes whatsoever). I plan to extend it, and that's why I'd love to see a real example of how it could be used.

My proposal would be for attributes to be passed through to the resulting attributes as is; the default parser would do all the work and produce an AST, and the tag itself would then be passed to a (newly introduced, optional) transformer.

Before I file an issue and start working on it, I'd love to know what you expect to get back as the result.

To answer this directly: I don't know yet.

What I am trying to achieve is to embed any HEEX template in a markdown document. This would allow me to reuse the same components used elsewhere in other HEEX templates.

For example:

  • cards - will contain text with or without images, buttons, etc. Text would be written in markdown.
  • forms - e.g. to subscribe to a newsletter, for a poll.
  • tabs - be able to provide content as tabs
  • tables - Allow to add nicely formatted tables

All of these HEEX components would be customisable, in order to pass classes and any other HTML attributes, etc.

The trigger was to be able to write my website pages in markdown and have text and images alternating:

The current webpage is written in markdown, with the content aligned through a lot of CSS trickery and repeated image declarations:

# Hack Yourself First {:.markdown-header}

![hacker_mindset](/images/svg/storyset_mobile-encryption-amico.svg){:.lg:markdown-p-img .md:markdown-p-img .markdown-p-img} One of the key challenges that mobile developers face when it comes to securing their apps and APIs is the ability to think like an attacker. This is because attackers approach mobile app and API security from a different perspective and mindset than developers. They usually combined several techniques and chain the weaknesses and vulnerabilities of the mobile apps and their APIS to succeed on their intents.
Developers are typically focused on building functionality and features that meet user requirements, while also ensuring that the app is performant and easy to use. While security is certainly an important consideration, it is often not the primary focus of developers, who may not have a deep understanding of the various security risks that their app or API may face, and if they do they may not be aware how creative hackers can be on combining such security risks to mount a successful attack.

![hacker_mindset](/images/svg/storyset_hacker-bro.svg){:.markdown-p-img .sm:hidden}In contrast, attackers are motivated by different factors, such as financial gain, political, ideological or social motives, or simply the challenge of exploiting vulnerabilities in the mobile and their APIs. They approach mobile app security and API security from a different perspective, actively seeking out weaknesses and vulnerabilities that they can exploit for their own purposes or who they work for, that can be a criminal organization, a state or just a company trying to get ahead of their competitors. 
To be able to effectively secure a mobile app or API, it is therefore important for developers to be able to think like an attacker. This requires a deep understanding of the various techniques and tools that attackers use to exploit vulnerabilities in mobile apps and APIs, as well as the ability to anticipate potential attack vectors and design security controls that can mitigate these risks.![hacker_mindset](/images/svg/storyset_hacker-bro.svg){:.lg:markdown-p-img .md:markdown-p-img .markdown-p-img .sm:show .hidden}

Writing markdown like this becomes tedious and time consuming. Earmark is being used to parse it.

Then I also came to realise that I could render HEEX templates with Earmark, but not pass custom content to the HEEX templates, and this was when I started to look into your library.

By developing the ability to think like an attacker, mobile developers can become more proactive in identifying and addressing security risks in their apps and APIs. This can help to enhance the overall security of the app or API, and ensure that user data is protected from potential threats. 

<%= cta_newsletter(assigns) %>

That renders to this:

The HEEX template that I am currently trying to use from the markdown to include the card with text and image:

<div>
  <a href="#" class="flex flex-col items-center bg-white border border-gray-200 rounded-lg shadow md:flex-row md:max-w-xl hover:bg-gray-100 dark:border-gray-700 dark:bg-gray-800 dark:hover:bg-gray-700">
      <img class="object-cover w-full rounded-t-lg h-96 md:h-auto md:w-48 md:rounded-none md:rounded-l-lg" src={assigns[:image_path]} alt="">
      <div class="flex flex-col justify-between p-4 leading-normal">
        <p class="mb-3 font-normal text-gray-700 dark:text-white">
          <%= render_slot(assigns[:inner_block]) %> 
        </p>
      </div>
  </a>
</div>

The card HEEX template is only a draft. I still need to add variables to customise the HTML attributes.

Let me know if you have further questions.

Thank you, that helped a lot.

Now I understand it's surely not about tags per se.

The main problem to figure out would be who leads the parsing. EEx allows custom engine implementation and one might call a markdown parser from a custom EEx.Engine.handle_text/3 callback.

We can support eex/heex as is, or introduce delegate, which would be like custom, but instead of being an implementation of Md.Parser it would parse the input until the tag is closed, maybe apply its own format, and then delegate the whole to the external engine.

The question who leads the process remains though. Consider <%= cta_newsletter(assigns) %> where cta_newsletter/1 returns "Subscribe to **the newsletter**" string, or <.card image=...>My _fancy_ text with **markup**</.card>.
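
The "who leads" problem can be seen in a small sketch of the EEx-led variant: a custom engine that post-processes static text in `handle_text/3`. This is only an illustration under assumptions: `MdTextEngine` and `bold/1` are hypothetical names, the regex is a stand-in for a real markdown call (md is not used here), and it assumes an Elixir version where the callback has arity 3, such as the 1.14 used in this thread.

```elixir
defmodule MdTextEngine do
  # Sketch only: an EEx engine that transforms static text nodes.
  use EEx.Engine

  # Called for each chunk of static template text.
  def handle_text(state, meta, text) do
    EEx.Engine.handle_text(state, meta, bold(text))
  end

  # Hypothetical stand-in for a markdown parser: turns **x** into <b>x</b>.
  defp bold(text), do: Regex.replace(~r/\*\*(.+?)\*\*/, text, "<b>\\1</b>")
end

template = ~S(Static **bold**, dynamic <%= "**hello**" %>)

# Static text goes through handle_text/3; the expression result does not.
IO.puts(EEx.eval_string(template, [], engine: MdTextEngine))
# prints: Static <b>bold</b>, dynamic **hello**
```

The static part is converted, but the string produced by the expression is emitted untouched, which is exactly the open question: markup returned from `cta_newsletter/1` would not be seen by the markdown step unless expression output is handled as well.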


Anyway, it needs some time to twiddle the puzzle in hands, but meanwhile I would suggest you try the vice-versa approach, with a HEEx engine ruling the process and md being called on text nodes (no idea how to achieve that, though; HEEx does not seem to be friendly to external parsers).

Thank you very much for the options you gave me to think about. I will take a look at them for sure.

Yesterday night I made it work and it seems that it matches the approach you suggest here:

My code:

defmodule MasWeb.MdParser.Heex do

  alias Md.Parser.State

  @behaviour Md.Parser

  @impl true
  def parse(input, state \\ %State{})
  def parse(input, state) do
    # @TODO handle the case for a tag without attributes:
    #       * <.whatever/>
    #       * <.whatever>content</.whatever>
    [tag, rest] = String.split(input, " ", parts: 2)
    [content, rest] = String.split(rest, "</.#{tag}>", parts: 2)
    [attrs, content] = String.split(content, ">", parts: 2)

    # Parses the markdown inside a HEEX tag to HTML and then rebuilds the HEEX tag
    # so that Phoenix.LiveView.HTMLEngine can parse it as a regular *.heex
    # file, thus keeping all the niceties of LiveView tracking (I still need to
    # double check this assumption).
    html = Md.Parser.generate(content)
    html = "<.#{tag} #{attrs}>#{html}</.#{tag}>"

    {rest, %State{state | ast: [html | state.ast]}}
  end
end

The custom Phoenix Engine:

defmodule MasWeb.PhoenixEngine do

   # @link Inspired by: https://github.com/boydm/phoenix_markdown/blob/ef7b5f76f339babec688021080a70708d9ddf1c1/lib/phoenix_markdown/engine.ex#L22

  @moduledoc """
  a single public function (compile) that Phoenix uses to compile incoming templates. You should not need to call it yourself.
  """

  @behaviour Phoenix.Template.Engine

  @doc """
  Callback implementation for `Phoenix.Template.Engine.compile/2`

  Precompiles the String file_path into a function definition, using the Md and EEx engines

  The compile function is typically called by Phoenix's html engine and isn't something
  you need to call yourself.

  ### Parameters
    * `path` path to the template being compiled
    * `name` name of the template being compiled

  """
  def compile(path, _name) do

    options = [
      engine: Phoenix.LiveView.TagEngine,
      file: path,
      line: 1,
      caller: __ENV__,
      source: "",
      tag_handler: Phoenix.LiveView.HTMLEngine
    ]

    path
    |> File.read!()
    |> Md.generate(Md.Parser.Default, format: :none)
    |> EEx.compile_string(options)
  end
end

The config.exs:

config :phoenix, :template_engines,
  # will handle all markdown files that have the extension *.html.md, e.g. test.html.md
  md: MasWeb.PhoenixEngine

config :md, syntax: %{
  custom: [{"<.", {MasWeb.MdParser.Heex, %{}}}]
}

Add the md extension to Phoenix live reload in config/dev.exs:

config :mas, MasWeb.Endpoint,
live_reload: [
  patterns: [
    ~r"priv/static/.*(js|css|png|jpeg|jpg|gif|svg)$",
    ~r"priv/gettext/.*(po)$",
    ~r"lib/mas_web/(controllers|live|components)/.*(ex|heex|md)$"
  ]
]

The markdown file test.html.md:

## TEST

<.horizontal_left_card class="bg-transparent" image_path="/images/svg/storyset_mobile-encryption-amico.svg">
One of the key challenges that mobile developers face when it comes to securing their apps and APIs is the ability to think like an attacker. This is because attackers approach mobile app and API security from a different perspective and mindset than developers. They usually combined several techniques and chain the weaknesses and vulnerabilities of the mobile apps and their APIS to succeed on their intents.
Developers are typically focused on building functionality and features that meet user requirements, while also ensuring that the app is performant and easy to use. While security is certainly an important consideration, it is often not the primary focus of developers, who may not have a deep understanding of the various security risks that their app or API may face, and if they do they may not be aware how creative hackers can be on combining such security risks to mount a successful attack.
</.horizontal_left_card>

<.horizontal_right_card class="w-full" image_path="/images/svg/storyset_hacker-bro.svg">
In contrast, attackers are motivated by different factors, such as financial gain, political, ideological or social motives, or simply the challenge of exploiting vulnerabilities in the mobile and their APIs. They approach mobile app security and API security from a different perspective, actively seeking out weaknesses and vulnerabilities that they can exploit for their own purposes or who they work for, that can be a criminal organization, a state or just a company trying to get ahead of their competitors.
To be able to effectively secure a mobile app or API, it is therefore important for developers to be able to think like an attacker. This requires a deep understanding of the various techniques and tools that attackers use to exploit vulnerabilities in mobile apps and APIs, as well as the ability to anticipate potential attack vectors and design security controls that can mitigate these risks.
</.horizontal_right_card>

<.cta_newsletter ></.cta_newsletter>

The result:

At the moment my Phoenix 1.7 app isn't using LiveView, but I plan to do so, therefore I need to wait until I can be sure that this also works properly with LiveView tracking.

Would you consider adding support for this in your library? If yes, I can make the PR.

Sidenote: Handling a tag with and without attributes by the same code should be as easy as swapping the splits.
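
A minimal sketch of one way to read "swapping the splits" (an interpretation, not necessarily what was meant): split on `>` first to isolate the opening tag, and only then look for a space inside it. `SplitSketch` and `split_tag/1` are hypothetical names, and the sketch assumes the input arrives without the leading `<.`, that attribute values contain no `>`, and that the tag is not self-closing.

```elixir
defmodule SplitSketch do
  # Returns {tag, attrs, content, rest} for input such as
  # ~s(card class="x">text</.card>after) or "card>text</.card>after".
  def split_tag(input) do
    # Isolate the opening tag first, so a missing space cannot break the match.
    [opening, rest] = String.split(input, ">", parts: 2)

    # Attributes are optional: a tag without a space has none.
    {tag, attrs} =
      case String.split(opening, " ", parts: 2) do
        [tag, attrs] -> {tag, String.trim(attrs)}
        [tag] -> {tag, ""}
      end

    # Everything up to the closing tag is the inner content.
    [content, rest] = String.split(rest, "</.#{tag}>", parts: 2)
    {tag, attrs, content, rest}
  end
end

IO.inspect(SplitSketch.split_tag(~s(card class="x">text</.card>after)))
IO.inspect(SplitSketch.split_tag("whatever>content</.whatever>tail"))
```

Self-closing tags such as `<.whatever/>` would still need their own clause, since there is no closing tag to split on.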

What worries me is generating HTML where it technically should not be generated.

I am not sure it would work properly with nested tags; it seems it would, but it still looks like a kludge. Md.Parser supports attaching a listener, which might modify the result, and we should at least re-pass it to the underlying generate/2. Also, it'd fail on creating deferred links ([foo][1] followed by a [1] link) and lose the context in general when not surrounded by \n\n.

md is a streaming parser, unlike [H]EEx, which makes it less trivial to interoperate with other engines.

Anyway, I will think about how can it be done without leaving AST representation and come back to you.

I'm really interested in rendering function components inside markdown files.

Any advances on this front? :slight_smile:

Honestly, there is not much demand and I did not come up with a generic solution.

If you could share an example of what you need, I could give it another spin.

I think it's basically what @Exadra37 has mentioned:
I'd like to be able to use function components inside markdown.

This, to ease my life when writing blog posts.

Later, I'd like to be able to add live LiveView examples (like using live_component) inside my posts for doing interactive tutorials and stuff.

My blog uses Phoenix + LiveView ATM, with all the content being written in markdown.

So I was planning a post on custom form inputs, explaining how to handle those in LiveView. It'd be fantastic for it to be interactive instead of relying on static pictures and source code.

Do you think this sounds possible or would it require an insane effort to be realized?

If pointed in the right direction, I wouldn't mind investing my own time in improving tooling for blog tech.
(That's something that's been on my mind for some time now).

I do something like that on my blog. See GitHub - LostKobrakai/kobrakai_elixir for how it's implemented.

AFAIU, @Exadra37 managed to handle it by a custom parser, but I can look at how to make it native to md. There is an open issue with handling normal HTML tags with attributes, which I wanted to address, so I might also allow heex tags and pass them as is down the stream, which in theory should make it possible to apply markdown and then heex engine natively.

I am to look into it during the next week. The bad news is that I am very bad at web stuff and I have very limited knowledge of the phoenix/liveview/heex engine, but I think I can manage it.

Stay tuned.

Sorry, but I couldn't find where you are using function components in your markdown files, or where you are custom parsing heex in blog.ex. :expressionless:

Hmm, maybe this was just an example of a markdown blog in phoenix and I misunderstood? :thinking:
(If that's the case, that's what I currently have on my own blog! :slight_smile:)


My bad, I have now checked directly on your live blog and can see the approach you're using.

It's basically the same idea I had: using comments to split the html string, then dynamically insert components in between. Good one!

Looks like that's easy for rendering entire LiveViews.
Have you experimented with rendering live_components or function components?

I tried dynamically rendering the <.button> from core_components.ex and it didn't look pretty haha.
(Well, it doesn't look that bad for components without inner_content.) :v:

Well, splitting the input to conditionally feed one or another processor is not "something like that" :slight_smile: I was under the impression we are talking about interoperation, like <%= "**hello**" %> → <b>hello</b>, aren't we?

Yeah, and something like:

<Lt4Web.CoreComponents.button class="bg-red-500">
  Hello
</Lt4Web.CoreComponents.button>

To whatever HTML that component renders, e.g.:

<button class="bg-red-500">Hello</button>