My problem
When parsing old government webpages, the input often looks like this:
<p><b>Section 1, name, and text</b></p>
<p>Section 1 more text</p>
<p>Section 1 more text</p>
<p><b>Section 2, name, and text</b></p>
<p>Section 2 more text</p>
<p><b>Section 3, name, and text</b></p>
<!-- etc. -->
I’d really like feedback about the approach I came up with last night:
This looks like a take-while/scan kind of problem, but I couldn’t find an `Enum` or `Floki` function that seemed to handle this kind of repeated pattern.
I decided to write a function that generically groups each matching item with the items that follow it, using a predicate. For the HTML above, the predicate would be “does the element contain a `<b>`?” So, abstractly:
input = [1,2,2,2,2,1,1,1,2]
output = [[1,2,2,2,2], [1], [1], [1, 2]]
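For the concrete HTML case, the predicate might look something like this (assuming the page has already been parsed with Floki, which represents elements as `{tag, attributes, children}` tuples; this sketch only checks direct children for simplicity):

```elixir
# Floki represents parsed HTML elements as {tag, attributes, children} tuples.
# The predicate: does this <p> directly wrap a <b> element?
has_bold? = fn {_tag, _attrs, children} ->
  Enum.any?(children, &match?({"b", _, _}, &1))
end

has_bold?.({"p", [], [{"b", [], ["Section 1, name, and text"]}]})
# => true
has_bold?.({"p", [], ["Section 1 more text"]})
# => false
```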
I realized that’s not too hard with `Enum.reduce/3`:
def group_after(list, predicate) do
  list
  |> Enum.reduce([], fn e, acc ->
    if predicate.(e) do
      # start a new group
      [[e] | acc]
    else
      # append to the current (head) group;
      # assumes the first element satisfies the predicate
      [curr | rest] = acc
      [curr ++ [e] | rest]
    end
  end)
  # groups accumulate newest-first, so restore document order
  |> Enum.reverse()
end
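A quick self-contained sanity check against the abstract example (note that because `reduce` prepends each new group, the result needs a final `Enum.reverse/1` to come out in document order):

```elixir
defmodule Grouping do
  def group_after(list, predicate) do
    list
    |> Enum.reduce([], fn e, acc ->
      if predicate.(e) do
        [[e] | acc]
      else
        [curr | rest] = acc
        [curr ++ [e] | rest]
      end
    end)
    # without this reverse, the groups come back newest-first
    |> Enum.reverse()
  end
end

Grouping.group_after([1, 2, 2, 2, 2, 1, 1, 1, 2], &(&1 == 1))
# => [[1, 2, 2, 2, 2], [1], [1], [1, 2]]
```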
It works, although the reduce callback reads as procedural rather than expressive. What do you all think? Is there another approach I’m not considering?
An alternate idea: consider a string like "tfffftttf" as an isomorph of `Enum.map(list, predicate)`. Then use an expressive regex like `~r/tf*/` to group the trues and falses — instead of the procedural `reduce`. Finally, undo the mapping back into the original list elements.
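That idea could be sketched roughly like this (a self-contained sketch, not a polished implementation — the module and step names are mine, and this version simply drops any elements before the first match):

```elixir
defmodule RegexGrouping do
  # String-isomorph sketch: map each element to "t"/"f", group the flag
  # string with a regex, then slice the original list by match lengths.
  def group_after(list, predicate) do
    # Drop leading non-matches so the regex matches tile the whole string.
    list = Enum.drop_while(list, fn e -> not predicate.(e) end)

    # 1. Build the flag string, e.g. [1,2,2,1] -> "tfft".
    flags = Enum.map_join(list, fn e -> if predicate.(e), do: "t", else: "f" end)

    # 2. Each group is one "t" followed by a run of "f"s.
    lengths =
      Regex.scan(~r/tf*/, flags)
      |> Enum.map(fn [match] -> String.length(match) end)

    # 3. Undo the mapping: split the original list by those lengths.
    {groups, []} =
      Enum.map_reduce(lengths, list, fn len, rest -> Enum.split(rest, len) end)

    groups
  end
end

RegexGrouping.group_after([1, 2, 2, 2, 2, 1, 1, 1, 2], &(&1 == 1))
# => [[1, 2, 2, 2, 2], [1], [1], [1, 2]]
```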