This code generates a couple of Credo warnings:
Enum.reject/2 is more efficient than Enum.reject/2 |> Enum.reject/2 (refactor:Credo.Check.Refactor.RejectReject)
Is there some expressive way to combine these three rejects with a logical or?
Is refactoring the best answer: extracting them into a single new named function that I use with reject? Are there any function-level tools for working with captured functions?
A reject_any? function would be good here.
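As far as I know there is no ready-made combinator for or-ing captured predicates in the standard library, but one takes only a few lines. A minimal sketch, where PredicateSketch, any_of/1 and the two predicates are made-up names for illustration only:

```elixir
defmodule PredicateSketch do
  # Hypothetical helper: builds one predicate that is true when any of the
  # given predicates returns a truthy value for the item.
  def any_of(predicates) when is_list(predicates) do
    fn item -> Enum.any?(predicates, fn pred -> pred.(item) end) end
  end

  # Stand-in predicates, invented for this example.
  def negative?(n), do: n < 0
  def even?(n), do: rem(n, 2) == 0
end

predicates = [&PredicateSketch.negative?/1, &PredicateSketch.even?/1]

# A single pass over the list: an item is dropped if any predicate matches.
result = Enum.reject([-3, -2, 1, 2, 3], PredicateSketch.any_of(predicates))

IO.inspect(result)
```

Enum.any?/2 stops at the first predicate that matches, so this short-circuits the same way a chain of ors would.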
Is this code on a performance critical path? If not, I would keep the code like it is now. It’s very readable.
If you change it to Stream.reject, does it complain? If you know it’s going to be a small input then what you have is clearly no big deal. Personally I would go either the named function route or a string of ors in the fn body, or I would just disable linting for that line if I know the input is always going to be small (but guh, that does suck!)
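For the disable-linting option, Credo reads magic comments in the source. A sketch with placeholder predicates (DisableSketch, blank?/1 and comment?/1 are invented here); the pipeline is written on one line so that the comment sits directly above the line Credo would report:

```elixir
defmodule DisableSketch do
  # Placeholder predicates, invented for this example.
  def blank?(s), do: String.trim(s) == ""
  def comment?(s), do: String.starts_with?(s, "#")

  def clean(lines) do
    # credo:disable-for-next-line Credo.Check.Refactor.RejectReject
    lines |> Enum.reject(&blank?/1) |> Enum.reject(&comment?/1)
  end
end

IO.inspect(DisableSketch.clean(["a", "", "# note", "b"]))
```

The comment only silences the warning; the list is still traversed twice.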
Sort of on a critical path. It’s a batch process that runs over e.g. 50,000 HTML files (all the laws in Oregon). Each HTML file can have a list of tens or hundreds of HTML paragraphs. So I guess this naive implementation will definitely create a lot of garbage for the collector.
I’ve thought it’d be interesting to change to the streaming API.
Floki keeps things in memory as a list. When dealing with stuff that is already stored in memory, the Stream API only brings a performance penalty (it’s slower) with no additional memory savings (stuff is already in memory, so there’s no advantage in consuming it lazily).
If you want to use Stream effectively in this use case you will need to generate your paragraphs lazily.
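To illustrate what generating paragraphs lazily could look like, here is a rough sketch. LazySketch and paragraphs_for/1 are hypothetical stand-ins; a real version would read and parse one HTML file per call instead of returning canned strings:

```elixir
defmodule LazySketch do
  # Hypothetical stand-in for reading and parsing a single file.
  def paragraphs_for(file_id) do
    ["#{file_id}: text", "#{file_id}: [Repealed]"]
  end

  # Stand-in predicate, invented for this example.
  def repealed?(p), do: String.contains?(p, "[Repealed]")

  # Returns a lazy stream: nothing is loaded until it is consumed.
  def kept_paragraphs(file_ids) do
    file_ids
    |> Stream.flat_map(&paragraphs_for/1)
    |> Stream.reject(&repealed?/1)
  end
end

# Only as many "files" as needed are processed to produce two paragraphs.
first_two = LazySketch.kept_paragraphs(1..50_000) |> Enum.take(2)
IO.inspect(first_two)
```

The laziness pays off here because the source itself (the per-file loading) sits inside the stream; wrapping an already-built list in Stream.reject would not save any memory.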
You could do something like this:
|> Enum.reject(&(repealed?(&1) or subchapter_heading?(&1) or subsubchapter_heading(&1)))
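A quick way to convince yourself that the single-pass form behaves like the chained one is a sketch like the following. The predicates here are simplified stand-ins, not the real ones from the original code:

```elixir
defmodule RejectSketch do
  # Stand-in predicates, invented for this example.
  def repealed?(p), do: String.contains?(p, "[Repealed]")
  def heading?(p), do: String.starts_with?(p, "Chapter")

  # Two passes, building an intermediate list between them.
  def chained(ps), do: ps |> Enum.reject(&repealed?/1) |> Enum.reject(&heading?/1)

  # One pass, no intermediate list.
  def combined(ps), do: Enum.reject(ps, &(repealed?(&1) or heading?(&1)))
end

paragraphs = ["Chapter 1", "Some text", "Old text [Repealed]"]

IO.inspect(RejectSketch.chained(paragraphs) == RejectSketch.combined(paragraphs))
```

One caveat: or expects a strict boolean on its left side, so if any predicate can return nil, || is the safer operator.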
Normally I’d tell you to ignore this, including instructing Credo to ignore it, but since it seems to be on a performance-critical path, @Marcus’s suggestion is the best one IMO.