Code critique: group_after/2 for parsing flat HTML

Slow load (possibly a timeout). I know how you feel. I hope those are not ASP.NET pages with invalid HTML. :smiling_imp:

Anyway, here is my solution:

Mix.install([:floki])

defmodule Example do
  def sample(list, acc \\ [])

  # for empty input after parsing
  def sample([], []), do: []

  # when all p elements have been processed,
  # reverse the last section's texts and wrap them in a list;
  # otherwise those texts would be appended directly to the main list,
  # where each element must itself be a list of section texts
  def sample([], acc), do: [:lists.reverse(acc)]

  # for the first bold text, simply start a new acc
  # with the text as its only element
  # and call the function recursively
  def sample([{"p", _, [{"b", [], [text]}]} | tail], []) when is_binary(text) do
    sample(tail, [text])
  end

  # however, if there is already data in acc,
  # reverse its contents, return them as the list of the
  # previous section's texts, and prepend it to the recursive call's result
  def sample([{"p", _, [{"b", [], [text]}]} | tail], acc) when is_binary(text) do
    [:lists.reverse(acc) | sample(tail, [text])]
  end

  # when we get plain text, simply add it to acc
  # and call the function recursively
  def sample([{"p", _, [text]} | tail], acc) when is_binary(text) do
    sample(tail, [text | acc])
  end
end

"""
<p><b>Section 1, name, and text</b></p>
<p>Section 1 more text</p>
<p>Section 1 more text</p>
<p><b>Section 2, name, and text</b></p>
<p>Section 2 more text</p>
<p><b>Section 3,  name, and text</b></p>
"""
|> Floki.parse_fragment!()
|> Example.sample()
|> dbg()

Pattern matching is the fastest solution. You can take a look at this post to see possible alternative solutions.
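For comparison, here is a minimal sketch of one possible alternative built on Enum.chunk_while/4. It is not taken from the linked post. The module name ChunkSketch and its group/1 function are hypothetical names I made up for illustration, and the input is written by hand in the {tag, attrs, children} shape that Floki returns, so the snippet runs without Floki.

```elixir
defmodule ChunkSketch do
  # Hypothetical helper (not from the original post): groups parsed <p> nodes
  # into sections, where a <p> holding only a <b> starts a new section.
  def group(nodes) do
    Enum.chunk_while(nodes, [], &chunk/2, &flush/1)
  end

  # First bold paragraph: nothing to emit yet, just start a new acc.
  defp chunk({"p", _, [{"b", [], [text]}]}, []) when is_binary(text) do
    {:cont, [text]}
  end

  # Later bold paragraph: emit the previous section and start a new one.
  defp chunk({"p", _, [{"b", [], [text]}]}, acc) when is_binary(text) do
    {:cont, Enum.reverse(acc), [text]}
  end

  # Plain-text paragraph: prepend the text to the current section.
  defp chunk({"p", _, [text]}, acc) when is_binary(text) do
    {:cont, [text | acc]}
  end

  # After the last node, emit whatever is left as the final section.
  defp flush([]), do: {:cont, []}
  defp flush(acc), do: {:cont, Enum.reverse(acc), []}
end

# Hand-written sample input in Floki's tuple format.
nodes = [
  {"p", [], [{"b", [], ["Section 1"]}]},
  {"p", [], ["more text"]},
  {"p", [], [{"b", [], ["Section 2"]}]}
]

ChunkSketch.group(nodes)
# returns [["Section 1", "more text"], ["Section 2"]]
```

The clauses mirror the recursive version one to one, so the choice between the two is mostly a matter of taste. I have not benchmarked this sketch against the pattern-matching version.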
