A nice way to combine captured functions? (Credo.Check.Refactor.RejectReject)

This code generates a couple of Credo warnings:

    filtered_paragraphs =
      paragraphs
      |> Floki.filter_out("[align=center]")
      |> Enum.reject(&repealed?/1)
      |> Enum.reject(&subchapter_heading?/1)
      |> Enum.reject(&subsubchapter_heading?/1)

One Enum.reject/2 is more efficient than Enum.reject/2 |> Enum.reject/2 (refactor:Credo.Check.Refactor.RejectReject)

Is there some expressive way to combine these three rejects with a logical or?

Is refactoring the best answer, extracting them into a single new named function that I use with reject? Are there any function-level tools for working with captured functions?

Maybe a reject_any? function would be good here.
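A reject_any helper like that is easy to write yourself. The sketch below is hypothetical: ParagraphFilter and reject_any/2 are made-up names, not part of Elixir or any library. It combines the captured predicates with Enum.any?/2, so the list is traversed once and each element is tested against the predicates in order until one matches.

```elixir
defmodule ParagraphFilter do
  # Hypothetical helper (not part of Elixir or any library): rejects every
  # element for which ANY of the given predicates returns a truthy value.
  def reject_any(enumerable, predicates) when is_list(predicates) do
    Enum.reject(enumerable, fn element ->
      # Enum.any?/2 stops at the first predicate that matches, so later
      # (possibly more expensive) checks are skipped for that element.
      Enum.any?(predicates, fn predicate -> predicate.(element) end)
    end)
  end
end

# Usage with the predicates from the question (not defined here):
#
#     paragraphs
#     |> Floki.filter_out("[align=center]")
#     |> ParagraphFilter.reject_any([&repealed?/1, &subchapter_heading?/1, &subsubchapter_heading?/1])
```

Since there is no longer an Enum.reject/2 piped into another Enum.reject/2, the RejectReject check should not fire, though running Credo is the way to confirm that.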

Is this code on a performance-critical path? If not, I would keep the code as it is now. It’s very readable.


If you change it to Stream.reject, does Credo still complain? If you know the input is going to be small, then what you have is clearly no big deal. Personally, I would either go the named-function route or put a chain of ors in the fn body. Or, if I knew the input was always going to be small, I would just disable linting for that line (but guh, that does suck!)
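The named-function route might look like the sketch below. The three predicates here are hypothetical stand-ins that work on plain strings (the real ones would inspect Floki nodes), and skip_paragraph?/1 and the Statutes module are made-up names. The point is the single named predicate that joins the checks with or, so one Enum.reject/2 call does all the filtering.

```elixir
defmodule Statutes do
  # Hypothetical stand-ins for the question's predicates; the real ones
  # inspect Floki HTML nodes rather than plain strings.
  defp repealed?(p), do: String.contains?(p, "[Repealed")
  defp subchapter_heading?(p), do: String.starts_with?(p, "SUBCHAPTER ")
  defp subsubchapter_heading?(p), do: String.starts_with?(p, "(") and String.ends_with?(p, ")")

  # One named predicate joining the three checks with `or`; `or` short-circuits,
  # so the later checks only run when the earlier ones return false.
  defp skip_paragraph?(p) do
    repealed?(p) or subchapter_heading?(p) or subsubchapter_heading?(p)
  end

  # A single Enum.reject/2 call, so there is no reject-into-reject pipeline.
  def filter_paragraphs(paragraphs), do: Enum.reject(paragraphs, &skip_paragraph?/1)
end
```

For the lint-suppression option, Credo supports inline comments such as # credo:disable-for-next-line; check which line Credo reports the issue on before placing one.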


Sort of on a critical path. It’s a batch process that runs over, e.g., 50,000 HTML files (all the laws in Oregon). Each HTML file can have tens or hundreds of HTML paragraphs. So I guess this naive implementation creates a lot of garbage for the collector to clean up.
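Whether the extra passes matter at this scale can be measured rather than guessed. A rough sketch, assuming synthetic integers in place of paragraphs and arbitrary predicates: chained/1 mirrors the three-reject pipeline (each Enum.reject/2 builds a new intermediate list), while combined/1 does a single pass. All names are illustrative, and :timer.tc timings are noisy, so treat them only as a rough indication; a benchmarking library such as Benchee gives more reliable figures.

```elixir
defmodule RejectBench do
  # Arbitrary stand-in predicates; real code would inspect HTML nodes.
  def a?(n), do: rem(n, 7) == 0
  def b?(n), do: rem(n, 11) == 0
  def c?(n), do: rem(n, 13) == 0

  # Three passes: each Enum.reject/2 allocates a new intermediate list.
  def chained(list) do
    list
    |> Enum.reject(&a?/1)
    |> Enum.reject(&b?/1)
    |> Enum.reject(&c?/1)
  end

  # One pass, one output list: the predicates are combined with `or`.
  def combined(list), do: Enum.reject(list, &(a?(&1) or b?(&1) or c?(&1)))

  # Applies `fun` to every batch and returns the elapsed time in microseconds.
  def time(fun, batches) do
    {micros, :ok} = :timer.tc(fn -> Enum.each(batches, fun) end)
    micros
  end
end

# Synthetic workload: 5,000 "files" of 300 "paragraphs" each.
paragraphs = Enum.to_list(1..300)
batches = List.duplicate(paragraphs, 5_000)

IO.puts("chained:  #{RejectBench.time(&RejectBench.chained/1, batches)} microseconds")
IO.puts("combined: #{RejectBench.time(&RejectBench.combined/1, batches)} microseconds")
```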

I’ve thought that changing to the streaming API would be interesting.

Floki keeps things in memory as a list. When dealing with data that is already in memory, the Stream API only brings a performance penalty (it’s slower) with no additional memory savings: the data is already in memory, so there’s no advantage in consuming it lazily.

If you want to use Stream effectively in this use case you will need to generate your paragraphs lazily.
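Generating the paragraphs lazily could look roughly like this sketch. It assumes one file per law and uses a hypothetical split_paragraphs/1 that splits plain text on blank lines in place of real Floki parsing; LazyParagraphs and stream_filtered/2 are also made-up names. Nothing is read until the stream is consumed, and then files are read and split one at a time. Memory only stays bounded if the consumer does not collect everything into one big list (e.g. Enum.count/1, Enum.reduce/3, or Stream.each/2 followed by Stream.run/1 writing results out).

```elixir
defmodule LazyParagraphs do
  # Sketch: `split_paragraphs/1` is a hypothetical stand-in for real HTML
  # parsing (e.g. with Floki); here it just splits text on blank lines.
  def stream_filtered(paths, skip?) do
    paths
    # Read each file only when the stream reaches it.
    |> Stream.map(&File.read!/1)
    # Emit that file's paragraphs one by one.
    |> Stream.flat_map(&split_paragraphs/1)
    # A single combined predicate, so a single reject step.
    |> Stream.reject(skip?)
  end

  defp split_paragraphs(text), do: String.split(text, "\n\n", trim: true)
end
```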


You could do something like this:

    filtered_paragraphs =
      paragraphs
      |> Floki.filter_out("[align=center]")
      |> Enum.reject(&(repealed?(&1) or subchapter_heading?(&1) or subsubchapter_heading?(&1)))

Normally I’d tell you to ignore this (and even instruct Credo to ignore it), but since it seems to be on a performance-critical path, @Marcus’s suggestion is the best one IMO.
