Something better than Enum.any?

I have a list of 20k strings, each around 20 characters long.

I need to check whether a given string S contains any of the 20k elements as a substring.
This check will run regularly.

Is there a better solution than Enum.any?

Maybe a Stream.filter?

stream = Stream.filter([1, 2, 3], fn x -> rem(x, 2) == 0 end)
Enum.to_list(stream)
#=> [2]

Have a look at :binary.match/2 and :binary.compile_pattern/1. If the list changes infrequently (relative to how often you match against it), you'll benefit greatly from a compiled pattern, because the cost of building the pattern is paid once and every later match reuses it.
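To make that concrete, here is a minimal sketch of the compile-once, match-many approach. The module name `SubstringMatcher` and the sample needles are made up for illustration; only `:binary.compile_pattern/1` and `:binary.match/2` are real library calls.

```elixir
defmodule SubstringMatcher do
  # Hypothetical wrapper module, not part of any library.

  # Compile once, whenever the list of needles changes.
  # Note: :binary.compile_pattern/1 rejects an empty list.
  def compile(needles) when is_list(needles) and needles != [] do
    :binary.compile_pattern(needles)
  end

  # Reuse the compiled pattern for every incoming string.
  # :binary.match/2 returns {position, length} or :nomatch.
  def contains_any?(subject, pattern) when is_binary(subject) do
    :binary.match(subject, pattern) != :nomatch
  end
end

# Illustrative data; in the question this would be the 20k-element list.
needles = ["needle", "thread", "thimble"]
pattern = SubstringMatcher.compile(needles)

SubstringMatcher.contains_any?("a needle in a haystack", pattern)
SubstringMatcher.contains_any?("nothing to see here", pattern)
```

The compiled pattern can be kept wherever is convenient (a GenServer state or a module attribute, for example) and rebuilt only when the list changes. `String.contains?/2` also accepts a compiled pattern as its second argument, if you prefer to stay in the `String` module.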

There was some discussion in various places about VM optimizations recently, e.g.:


For such a huge list I would even suggest using the Flow library, which has an API very similar to Enum and Stream.

Instead of storing those 20k strings in a list, build a trie. It will give you much better complexity: the cost of a lookup then depends on the length of S and of the patterns, not on how many patterns there are. Or you can build one big finite-state automaton, as BurntSushi's fst Rust library does; there is a detailed blog post about it. Using Flow or Stream will only hide the complexity instead of solving it.
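A rough sketch of the trie idea, assuming non-empty patterns and byte-wise matching. The module `SubstringTrie` and its functions are hypothetical names for illustration, and this is a naive version that restarts the walk at every offset of S; an Aho-Corasick automaton would avoid those restarts.

```elixir
defmodule SubstringTrie do
  # Hypothetical module: a trie stored as nested maps keyed by byte.
  # The :terminal key marks a node where a stored pattern ends.

  def new(patterns) do
    Enum.reduce(patterns, %{}, fn pattern, trie -> insert(trie, pattern) end)
  end

  defp insert(node, <<>>), do: Map.put(node, :terminal, true)

  defp insert(node, <<byte, rest::binary>>) do
    child = Map.get(node, byte, %{})
    Map.put(node, byte, insert(child, rest))
  end

  # True if any stored pattern occurs somewhere in `subject`.
  # Tries to match a pattern starting at every byte offset.
  def contains_any?(_trie, <<>>), do: false

  def contains_any?(trie, <<_byte, rest::binary>> = subject) do
    prefix_match?(trie, subject) or contains_any?(trie, rest)
  end

  # Walks the trie along `subject`; succeeds as soon as a pattern ends.
  defp prefix_match?(%{terminal: true}, _subject), do: true

  defp prefix_match?(node, <<byte, rest::binary>>) do
    case node do
      %{^byte => child} -> prefix_match?(child, rest)
      _ -> false
    end
  end

  defp prefix_match?(_node, <<>>), do: false
end

# Illustrative data only.
trie = SubstringTrie.new(["needle", "thread"])
SubstringTrie.contains_any?(trie, "a needle in a haystack")
```

Each walk from a given offset touches at most as many nodes as the longest pattern has bytes (about 20 here), so a check costs roughly the length of S times 20 map lookups, regardless of whether the list holds 20 or 20k patterns.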


Depending on your exact use case, a Bloom filter could be worth considering too.
