I created a table of contents using Floki, with header nesting. How can I simplify the logic?

Hey Guys,

I created a table of contents, inspired by TailwindCSS docs.

The header extraction logic runs server side; highlighting and collapse/expand are done with JS.

Table of Contents


Here’s how I extract the headers and nest them within their parent headers. In JavaScript it would have taken fewer lines, but since I am new to Elixir, I don’t know how to reduce the following code any further.


defp extract_headers(function_component) do
    function_component.static
    |> Floki.parse_fragment!()
    |> Floki.find("article")
    |> Enum.at(0)
    |> Floki.children()
    |> Enum.filter(&is_tuple/1)
    |> Enum.filter(fn each -> Tuple.to_list(each) |> Enum.count() == 3 end)
    |> Enum.filter(fn {name, _, _} -> name in ~w(h2 h3 h4) end)
    |> Enum.map(fn
      {header, meta, [title | _rest]} ->
        {
          header,
          meta |> Enum.find(&(elem(&1, 0) == "id")) |> elem(1),
          title |> String.replace(~r"\n\s+", "")
        }
    end)
    |> Enum.reduce([], fn
      {"h2", id, title}, acc ->
        [%{header: "h2", id: id, title: title, children: []} | acc]

      {"h3", id, title}, [head | tail] ->
        [
          %{
            head
            | children: [%{header: "h3", id: id, title: title, children: []} | head.children]
          }
          | tail
        ]

      {"h4", id, title}, [head | tail] ->
        [child_head | child_tail] = head.children

        [
          %{
            head
            | children: [
                %{
                  child_head
                  | children: [
                      %{header: "h4", id: id, title: title, children: nil} | child_head.children
                    ]
                }
                | child_tail
              ]
          }
          | tail
        ]
    end)
    |> Enum.reverse()
end

Here’s how it’s called:


__MODULE__
|> apply(assigns.post.body, [assigns])
|> extract_headers()

This is the list before the big reduce function:


[
  {"h2", "installation", "Installation"},
  {"h3", "linux", "Linux"},
  {"h4", "ubuntu", "Ubuntu"},
  {"h4", "fedora", "Fedora"},
  {"h3", "mac", "Mac"},
  {"h2", "simple-task", "Simple Task"},
  {"h3", "foo", "Foo"},
  {"h3", "bar", "Bar"}
]

And here’s how the reduce changes the above list:


[
  %{
    id: "installation",
    header: "h2",
    title: "Installation",
    children: [
      %{id: "mac", header: "h3", title: "Mac", children: []},
      %{
        id: "linux",
        header: "h3",
        title: "Linux",
        children: [
          %{id: "fedora", header: "h4", title: "Fedora", children: nil},
          %{id: "ubuntu", header: "h4", title: "Ubuntu", children: nil}
        ]
      }
    ]
  },
  %{
    id: "simple-task",
    header: "h2",
    title: "Simple Task",
    children: [
      %{id: "bar", header: "h3", title: "Bar", children: []},
      %{id: "foo", header: "h3", title: "Foo", children: []}
    ]
  }
]

Is there a better way to create nesting?


I would need to put a little more thought into the nesting part, but for now the stuff before it could be cleaned up a bit.

|> Enum.filter(&is_tuple/1)
|> Enum.filter(fn each -> Tuple.to_list(each) |> Enum.count() == 3 end)
|> Enum.filter(fn {name, _, _} -> name in ~w(h2 h3 h4) end)

Whenever you see type checks and size checks in loops, they can often be consolidated with pattern matching. The following should be equivalent:

|> Enum.filter(&match?({name, _, _} when name in ~w(h2 h3 h4), &1))
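To check that equivalence on something concrete, here is a minimal sketch. The children list is made-up sample data shaped like Floki nodes (an assumption, not output from the original post), and tuple_size/1 stands in for the Tuple.to_list/Enum.count size check.

```elixir
# Hypothetical sample data: text nodes are strings, elements are 3-tuples,
# and comments are 2-tuples.
children = [
  "  stray text node  ",
  {"h2", [{"id", "installation"}], ["Installation"]},
  {"p", [], ["Some paragraph"]},
  {:comment, "a two-element tuple"},
  {"h3", [{"id", "linux"}], ["Linux"]}
]

# The original three-step filter (type check, size check, name check)
three_filters =
  children
  |> Enum.filter(&is_tuple/1)
  |> Enum.filter(&(tuple_size(&1) == 3))
  |> Enum.filter(fn {name, _, _} -> name in ~w(h2 h3 h4) end)

# The single match?/2 filter: anything that is not a 3-tuple with a
# matching name simply fails the pattern and is dropped.
one_match = Enum.filter(children, &match?({name, _, _} when name in ~w(h2 h3 h4), &1))

IO.inspect(three_filters == one_match)
```

Both versions keep only the h2 and h3 tuples here; the match?/2 version never has to inspect the shape by hand.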

But I think a for comprehension would do better here, and you can even fold the following map into it! You’d have to break the pipe, though I think that’s a good thing in this situation. This is how I would probably write everything before the big reduce:

[menu] =
  function_component.static
  |> Floki.parse_fragment!()
  |> Floki.find("article")

menu_items = Floki.children(menu)

for {header, meta, [title | _]} <- menu_items,
    header in ~w(h2 h3 h4) do
  {
    header,
    meta |> Enum.find(&(elem(&1, 0) == "id")) |> elem(1), # I can't mentally parse what this is doing atm
    String.trim(title) # Pretty sure `trim` is what you want here
  }
end

Sorry I can’t provide help on the nesting part as I haven’t done anything like that in a bit. I’m curious, though, and hopefully someone else can. I would actually probably write something like that just because it’s simple and explicit (since there will never be more than 3 levels), I would just extract some variables so it’s not so jarring to read.

Good job on the component—it looks really nice!


This is beautiful, thank you. match? is great, it helped in multiple places.

I have already broken the pipe: I use the parsed document to calculate the reading time of a post as well as the table of contents.

So I have separated parsing.

I didn’t reach for a for comprehension, because I wanted to do it the Elixir way. I love pipelines.

You are right, I should have tried trim, before going for Regex. And it removes extra spaces as well as new lines.

meta |> Enum.find(&(elem(&1, 0) == "id")) |> elem(1),

A header has an id, and it may also carry a data attribute or something else, so I just extract the id from the list of attribute tuples.

[{"class", "something-pretty"}, {"id", "header-id"}, {"data-val", "some-val"}]
|> Enum.find(&(elem(&1, 0) == "id"))
|> elem(1)

# header-id (Output)

Based on your suggestion about match?, I changed the above logic to:

[{"class", "something-pretty"}, {"id", "header-id"}, {"data-val", "some-val"}]
|> Enum.find(&match?({attr, _val} when attr == "id", &1))
|> elem(1)

# header-id (Output)

Hope it’s more readable.


Thanks for the compliment. :upside_down_face:


Ah yes, you can simplify further by removing the guard. I’d also use pattern matching instead of the final elem, though that’s very much up to personal preference:

{_, id} = Enum.find(meta, &match?({"id", _}, &1))
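Since the attributes are two-element tuples keyed by name, List.keyfind/3 from the standard library is another option. A minimal sketch, reusing the example attribute list from earlier in the thread (the variable names are mine):

```elixir
meta = [{"class", "something-pretty"}, {"id", "header-id"}, {"data-val", "some-val"}]

# Pattern match on the tuple returned by Enum.find/2
{_, id_by_match} = Enum.find(meta, &match?({"id", _}, &1))

# List.keyfind/3 returns the first tuple whose element at position 0 is "id"
{_, id_by_keyfind} = List.keyfind(meta, "id", 0)

# Both forms raise a MatchError when a header has no id,
# because nil does not match the {_, id} pattern.
IO.inspect({id_by_match, id_by_keyfind})
```

Whether failing loudly on a missing id is desirable depends on how much you trust the markup.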

And of course, put_in could help clean up the nesting part a bit. There is also Pathex if you want to go the library route. I still can’t put much mental energy into that tonight, as it’s crazy late here and I haven’t been able to sleep.


Hehe, I was so excited, I became blind. Thanks for the tip.

I will get used to pattern matching.

Thank you for the great suggestions.

I’m looking into put_in.


put_in didn’t help much:

Before:

[
  %{
    head
    | children: [%{header: "h3", id: id, title: title, children: []} | head.children]
  }
  | tail
]

After:

[
  update_in(
    head,
    [:children],
    &[%{header: "h3", id: id, title: title, children: []} | &1]
  )
  | tail
]
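For what it’s worth, update_in/3 can also reach through the list itself when the path includes Access.at/1, which avoids the manual [head | tail] rebuild. This is only a sketch with made-up sample data, not something taken from the original code:

```elixir
# Hypothetical accumulator: newest h2 first, children newest-first
acc = [
  %{
    header: "h2",
    id: "installation",
    title: "Installation",
    children: [%{header: "h3", id: "linux", title: "Linux", children: []}]
  }
]

h3 = %{header: "h3", id: "mac", title: "Mac", children: []}
h4 = %{header: "h4", id: "ubuntu", title: "Ubuntu", children: []}

# Prepend an h3 to the children of the newest h2
with_h3 = update_in(acc, [Access.at(0), :children], &[h3 | &1])

# Prepend an h4 to the children of the newest h3 under the newest h2
with_h4 = update_in(acc, [Access.at(0), :children, Access.at(0), :children], &[h4 | &1])

IO.inspect(with_h3)
IO.inspect(with_h4)
```

The path grows by one Access.at(0), :children pair per level, much like the nested map updates it replaces.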

@sodapopcan,

Pathex is awesome! I was wrong about it.

This is the reducer, after Pathex treatment!

|> Enum.reduce([], fn
  {"h2", id, title}, acc ->
    [%{header: "h2", id: id, title: title, children: []} | acc]

  {"h3", id, title}, acc ->
    children = path(0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h3", id: id, title: title, children: []} | Pathex.get(acc, children)
    ])

  {"h4", id, title}, acc ->
    children = path(0 / :children / 0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h4", id: id, title: title, children: nil} | Pathex.get(acc, children)
    ])

  {"h5", id, title}, acc ->
    children = path(0 / :children / 0 / :children / 0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h5", id: id, title: title, children: nil} | Pathex.get(acc, children)
    ])
end)

I was able to add h5 as well, with ease!
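Because each clause differs only in how deep the path goes, the same idea can probably be written once for any level. Below is a rough sketch without Pathex; TocSketch, nest/1 and insert/3 are hypothetical names, and every node gets children: [] rather than nil at the deepest level.

```elixir
defmodule TocSketch do
  # Hypothetical helper, not part of the original code.
  # Takes flat {header, id, title} tuples and returns nested maps.
  # Top-level entries come back in document order; children stay
  # newest-first, like the reducers in this thread.
  def nest(headers) do
    headers
    |> Enum.reduce([], fn {"h" <> level = header, id, title}, acc ->
      # h2 is the top level, so h2 -> depth 0, h3 -> depth 1, and so on
      depth = String.to_integer(level) - 2
      insert(acc, depth, %{header: header, id: id, title: title, children: []})
    end)
    |> Enum.reverse()
  end

  # Depth 0: prepend the entry at this level.
  defp insert(entries, 0, entry), do: [entry | entries]

  # Otherwise descend into the most recently added entry.
  # A header with no parent (e.g. an h3 before any h2) raises here.
  defp insert([head | tail], depth, entry) when depth > 0 do
    [%{head | children: insert(head.children, depth - 1, entry)} | tail]
  end
end

headers = [
  {"h2", "installation", "Installation"},
  {"h3", "linux", "Linux"},
  {"h4", "ubuntu", "Ubuntu"},
  {"h4", "fedora", "Fedora"},
  {"h3", "mac", "Mac"},
  {"h2", "simple-task", "Simple Task"}
]

toc = TocSketch.nest(headers)
IO.inspect(toc)
```

Adding h6 would need no new clause, since the depth is derived from the tag name.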



P.S. If anyone wants the code for this Table of Contents implementation, I can paste the whole shebang, here! (The HEEx template, JS Hook and this logic.)


Just some food for thought: Those five operations can be expressed with something like this:

titles =
  for {header, meta, [title | _rest]} when header in ~w(h2 h3 h4) <- children, reduce: [] do
    # Match on plain `acc` here: the accumulator is [] on the first pass,
    # so a [head | tail] pattern in the clause head would not match.
    acc ->
      id = meta |> Enum.find(&(elem(&1, 0) == "id")) |> elem(1)
      title = String.replace(title, ~r"\n\s+", "")

      case {header, acc} do
        {"h2", _} ->
          [%{header: "h2", id: id, title: title, children: []} | acc]

        {"h3", [head | tail]} ->
          [
            %{head | children: [%{header: "h3", id: id, title: title, children: []} | head.children]}
            | tail
          ]

        ...
      end
  end

Not only will it be shorter and easier to reason about, it should also be more performant.


I know for comprehensions can be efficient, and that looks great as well, but I like Elixir’s pipelines.

And this is the current optimized pipeline.

Besides, I don’t know how for can handle multiple function pattern matches like pipelines can! :sweat_smile:

parsed_blog
|> Floki.children()
|> Enum.filter(&match?({name, _, _} when name in ~w(h2 h3 h4 h5), &1))
|> Enum.map(fn
  {header, meta, [{"a", _, [title | _]} | _rest]} ->    # Added this later, for header links.
    {_, id} = Enum.find(meta, &match?({"id", _}, &1))
    {header, id, title |> String.trim()}

  {header, meta, [title | _rest]} ->
    {_, id} = Enum.find(meta, &match?({"id", _}, &1))
    {header, id, title |> String.trim()}
end)
|> Enum.reduce([], fn
  {"h2", id, title}, acc ->
    [%{header: "h2", id: id, title: title, children: []} | acc]

  {"h3", id, title}, acc ->
    children = path(0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h3", id: id, title: title, children: []} | Pathex.get(acc, children)
    ])

  {"h4", id, title}, acc ->
    children = path(0 / :children / 0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h4", id: id, title: title, children: []} | Pathex.get(acc, children)
    ])

  {"h5", id, title}, acc ->
    children = path(0 / :children / 0 / :children / 0 / :children)

    Pathex.set!(acc, children, [
      %{header: "h5", id: id, title: title, children: nil} | Pathex.get(acc, children)
    ])
end)
|> Enum.reverse()
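On the question of several patterns inside for: one option is to keep the generator pattern loose and move the clauses into a small multi-clause function. A sketch under that assumption, with made-up nodes shaped like the linked and plain headers in this thread (title_of is a hypothetical helper):

```elixir
# Hypothetical sample nodes: one header wraps its title in a link, one does not
nodes = [
  {"h2", [{"id", "installation"}], [{"a", [{"href", "#installation"}], ["\n  Installation\n"]}]},
  {"h3", [{"id", "linux"}], ["Linux"]},
  {"p", [], ["ignored"]}
]

# Multi-clause anonymous function: linked title first, plain title second
title_of = fn
  [{"a", _, [title | _]} | _] -> String.trim(title)
  [title | _] -> String.trim(title)
end

headers =
  for {header, meta, children} when header in ~w(h2 h3 h4 h5) <- nodes do
    {_, id} = List.keyfind(meta, "id", 0)
    {header, id, title_of.(children)}
  end

IO.inspect(headers)
```

The comprehension filters by tag name, and the clause selection happens in title_of, so neither style has to give anything up.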

I’m glad your vision came back quickly :sweat_smile:

Of course we’re all allowed to program however we want, but be careful with this! So many new to Elixir seem to become enamoured with the pipe and get addicted to it (the double entendre is too perfect). The problem with longer pipelines is that they can start to get confusing as to what is being represented at each stage to the point where someone new to the code will have to run it with a dbg to make sense of it (ie, hurting “scanability”). Reaching for functions that pull out positional values (List.first, Enum.at, elem, etc) are particularly bad for this as they are so generic and non-descriptive it’s not always immediately obvious what value they’re reducing to (it’s sort of akin to naming black hole arguments like _value instead of _). I usually use these places as an opportunity to extract an explanatory variable via pattern matching.
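To make the point about positional functions concrete, here is a tiny sketch with made-up data (the parsed value is a hypothetical stand-in for what Floki.find would return):

```elixir
# Hypothetical parsed fragment: a single article element with one header child
parsed = [{"article", [], [{"h2", [{"id", "intro"}], ["Intro"]}]}]

# Positional: the reader has to work out what Enum.at(0) and elem(2) yield
children_positional = parsed |> Enum.at(0) |> elem(2)

# Pattern match: the names document each piece, and an unexpected
# shape (no article, or two of them) fails loudly with a MatchError
[{"article", _attrs, children}] = parsed

IO.inspect(children_positional == children)
```

Both lines produce the same list; the second just tells the next reader what it is.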

Just more food for thought, though, and by no means am I trying to tell you how to program! “Readability” is a subjective rabbit hole. I just have a bit of PTSD around this, so I can never stop myself from offering unsolicited advice :see_no_evil:

Also, I would be happy to see the full source for the component! I may need something like this soon.


I’m 100% represented by this statement. When revisiting older code, I find myself moving from pipes to for.


Oh it represents me too, lol! And then I’ve worked with people who would go so far as to add extra overhead just to make pipelines work. Personally, I actually prefer functional pipelines if they are clear, but comprehensions can often read much better, so I don’t shy away from them. I’ve said this recently already, but I really feel that Elixir provides such a nice succinct set of builtin constructs that there is no reason not to get to know all of them and use each where appropriate.


Full Implementation

Table of contents, component:

attr :headers, :list

def table_of_contents(assigns) do
  ~H"""
  <div
    id="table-of-contents"
    class="sticky top-[calc(var(--header-height))] py-1 pr-5"
    data-file={__ENV__.file}
    data-line={__ENV__.line}
    phx-hook={Application.fetch_env!(:derpy_tools, :show_inspector?) && "SourceInspector"}
  >
    <h5
      id="toc"
      class="text-slate-900 font-semibold mb-2 text-sm leading-6 dark:text-slate-100"
      phx-hook="TableOfContents"
    >
      On this page
    </h5>
    <a
      href="#"
      class="block py-1 font-medium hover:text-slate-900 dark:text-slate-400 dark:hover:text-slate-300"
    >
      <i class="hero-chevron-up w-5.5 h-5.5 text-slate-500 dark:text-navy-100" /> Top
    </a>
    <.nested_header
      headers={@headers}
      class="max-h-[calc(100svh-(var(--header-height)))] overflow-auto"
    />
  </div>
  """
end

attr :id, :string, default: nil
attr :headers, :list
attr :class, :string, default: nil
attr :parent, :list, default: []

def nested_header(assigns) do
  ~H"""
  <ul id={@id} class={["space-y-1 font-inter font-medium list-none not-prose", @class]}>
    <li
      :for={%{header: header, id: id, title: title, children: children} <- @headers}
      class="not-prose"
    >
      <a
        id={"#{id}-link"}
        key={id}
        href={"##{id}"}
        tabindex="0"
        data-parent={@parent |> Enum.join(">")}
        class={[
          "block py-1 hover:text-slate-900 dark:text-slate-400 dark:hover:text-slate-300",
          case header do
            "h2" -> "font-semibold"
            "h3" -> "font-medium"
            "h4" -> "font-normal"
            "h5" -> "font-normal text-xs"
          end
        ]}
        phx-click={JS.toggle(to: "##{id}-container")}
      >
        <i :if={header in ~w{h3 h4 h5}} class="hero-chevron-right-mini" />
        <span><%= title %></span>
      </a>

      <.nested_header
        :if={children}
        id={"#{id}-container"}
        headers={children |> Enum.reverse()}
        class="pl-4 hidden"
        parent={[id, @parent |> Enum.join(">")]}
      />
    </li>
  </ul>
  """
end

Using the TOC component in Left Nav

Pass any function component to the parse_blog function, and it’ll work.

attr :post, :map
attr :class, :string, default: nil

def left_nav(assigns) do
  parsed_blog =
    __MODULE__
    |> apply(assigns.post.body, [assigns])
    |> parse_blog()

  assigns =
    assigns
    |> assign(
      headers: extract_headers(parsed_blog),
      reading_time: reading_time(parsed_blog)
    )

  ~H"""
  <aside class={["flex flex-col", @class]}>
    <span><%= @reading_time %></span>
    <.table_of_contents headers={@headers} />
  </aside>
  """
end

Parsing-related code in private functions:

defp parse_blog(function_component) do
  function_component.static
  |> Floki.parse_fragment!()
  |> Floki.find("article")
end

defp extract_headers([parsed_blog]) do
  parsed_blog
  |> Floki.children()
  |> Enum.filter(&match?({name, _, _} when name in ~w(h2 h3 h4 h5), &1))
  |> Enum.map(fn
    {header, meta, [{"a", _, [title | _]} | _rest]} ->
      {_, id} = Enum.find(meta, &match?({"id", _}, &1))
      {header, id, title |> String.trim()}

    {header, meta, [title | _rest]} ->
      {_, id} = Enum.find(meta, &match?({"id", _}, &1))
      {header, id, title |> String.trim()}
  end)
  |> Enum.reduce([], fn
    {"h2", id, title}, acc ->
      [%{header: "h2", id: id, title: title, children: []} | acc]

    {"h3", id, title}, acc ->
      children = path(0 / :children)

      Pathex.set!(acc, children, [
        %{header: "h3", id: id, title: title, children: []} | Pathex.get(acc, children)
      ])

    {"h4", id, title}, acc ->
      children = path(0 / :children / 0 / :children)

      Pathex.set!(acc, children, [
        %{header: "h4", id: id, title: title, children: []} | Pathex.get(acc, children)
      ])

    {"h5", id, title}, acc ->
      children = path(0 / :children / 0 / :children / 0 / :children)

      Pathex.set!(acc, children, [
        %{header: "h5", id: id, title: title, children: nil} | Pathex.get(acc, children)
      ])
  end)
  |> Enum.reverse()
end

defp reading_time(parsed_blog) do
  parsed_blog
  |> Floki.text()
  |> String.replace(~r/@|#|\$|%|&|\^|:|_|!|,/u, " ")
  |> String.split()
  |> Enum.count()
  |> div(@wpm)
  |> Timex.Duration.from_minutes()
  |> Timex.Format.Duration.Formatter.format(:humanized)
end
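One thing to watch in the estimator: div/2 truncates, so a post shorter than @wpm words comes out as zero minutes. A possible variation that rounds up and skips Timex is sketched below; ReadingTimeSketch, minutes/1 and the value of 200 words per minute are all assumptions, since the original @wpm is not shown.

```elixir
defmodule ReadingTimeSketch do
  # Assumed reading speed; the original @wpm value is not shown in the post
  @wpm 200

  # Returns whole minutes, rounded up, and never less than 1
  def minutes(text) do
    words = text |> String.split() |> length()
    max(1, ceil(words / @wpm))
  end
end

IO.inspect(ReadingTimeSketch.minutes("hello world"))
```

In the real function the text argument would be whatever Floki.text/1 returns for the parsed post.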

JS Hook

const TableOfContents = {
  mounted() {
    if (location.hash) {
      const hash = location.hash.replace("#", "");
      const header = document.getElementById(hash);

      header &&
        header.scrollIntoView({
          behavior: "instant",
          block: "start",
          inline: "end",
        });

      highlightNav(
        hash,
        [
          "hover:text-slate-900",
          "dark:text-slate-400",
          "dark:hover:text-slate-300",
        ],
        ["text-sky-500", "dark:text-sky-400"]
      );
    }
    window.addEventListener("hashchange", handleHashChange);
  },
  destroyed() {
    window.removeEventListener("hashchange", handleHashChange);
  },
};

function handleHashChange(event) {
  if (event.oldURL.includes("#")) {
    const hash = event.oldURL.split("#").pop();

    highlightNav(
      hash,
      ["text-sky-500", "dark:text-sky-400"],
      [
        "hover:text-slate-900",
        "dark:text-slate-400",
        "dark:hover:text-slate-300",
      ]
    );
  }

  if (location.hash) {
    const hash = location.hash.replace("#", "");

    highlightNav(
      hash,
      [
        "hover:text-slate-900",
        "dark:text-slate-400",
        "dark:hover:text-slate-300",
      ],
      ["text-sky-500", "dark:text-sky-400"]
    );
  }
}

function highlightNav(hash, remove, add) {
  const nav = document.getElementById(`${hash}-link`);

  if (nav) {
    nav.classList.remove(...remove);
    nav.classList.add(...add);

    const { parent } = nav.dataset;

    if (parent) {
      parent.split(">").forEach((parentId) => {
        const parent = document.getElementById(`${parentId}-link`);

        if (parent) {
          parent.classList.remove(...remove);
          parent.classList.add(...add);
        }

        const parentContainer = document.getElementById(
          `${parentId}-container`
        );

        if (parentContainer) {
          parentContainer.classList.remove("hidden");
          if (location.hash == `#${hash}`) parentContainer.style.display = null;
        }
      });
    }
  }
}

export default TableOfContents;

Example headers, with self linking.

<h2 id="installation" class="group flex whitespace-nowrap not-prose">
    <a href="#installation" class="relative flex items-center">
      Installation
      <span class="absolute -ml-8 opacity-0 group-hover:opacity-100 transition-opacity duration-500 group-focus:opacity-100 flex h-6 w-6 items-center justify-center rounded-md text-slate-400 shadow-sm ring-1 ring-slate-900/5 hover:text-slate-700 hover:shadow hover:ring-slate-900/10 dark:bg-slate-700 dark:text-slate-300 dark:shadow-none dark:ring-0">
        <svg width="12" height="12" fill="none" aria-hidden="true">
          <path
            d="M3.75 1v10M8.25 1v10M1 3.75h10M1 8.25h10"
            stroke="currentColor"
            stroke-width="1.5"
            stroke-linecap="round"
          >
          </path>
        </svg>
      </span>
    </a>
  </h2>
  <h3 id="linux">Linux</h3>
  <h4 id="ubuntu" class="group whitespace-nowrap not-prose">
    <a href="#ubuntu" class="relative flex items-center">
      Ubuntu
      <span class="absolute -ml-8 opacity-0 group-hover:opacity-100 transition-opacity duration-500 group-focus:opacity-100 flex h-6 w-6 items-center justify-center rounded-md text-slate-400 shadow-sm ring-1 ring-slate-900/5 hover:text-slate-700 hover:shadow hover:ring-slate-900/10 dark:bg-slate-700 dark:text-slate-300 dark:shadow-none dark:ring-0">
        <svg width="12" height="12" fill="none" aria-hidden="true">
          <path
            d="M3.75 1v10M8.25 1v10M1 3.75h10M1 8.25h10"
            stroke="currentColor"
            stroke-width="1.5"
            stroke-linecap="round"
          >
          </path>
        </svg>
      </span>
    </a>
  </h4>

Sorry about the wall of text, but I don’t know how to string together a nice description of the code.
Perhaps I’ll write a blog post about it.

Features:

  1. The nav stays selected, even on page refresh.
  2. Only the highlight part is JS; the rest happens on the Elixir side.
  3. Even the parent nav links light up and open when one of their children is selected.

You can figure out the rest of the features and kinks.

Thanks everyone. :upside_down_face:


P.S. I left the reading time estimator in the code, in case someone finds a better way to do that thing. :hatching_chick:
