How to check if a list is a sublist(out of order list) of another list

Ajaxdone · July 7, 2020, 8:22pm

l1 = ["1","2","5","9"]
l2 = ["1","9"]
List.contains?(l1,l2)

Anything like this? to check if l2 is a sublist of l1.

I know can use Enum.each and check one by one, just want to know if anything like that. thanks

emoragaf · July 7, 2020, 8:32pm

maybe using MapSet.subset?/2

NobbZ · July 7, 2020, 8:35pm

l2 is not a sublist of l1.

But in general, sublist checking is quite easy to do.

def sublist?([], _), do: false
def sublist?(l1 = [_|t], l2) do
  List.starts_with?(l1, l2) or sublist?(t, l2)
end

hauleth · July 7, 2020, 8:36pm

def subset(_, []), do: true
def subset([], _), do: false
def subset([h | t1], [h | t2]), do: subset(t1, t2)
def subset([_ | t], list), do: subset(t, list)

Ajaxdone · July 7, 2020, 8:44pm

Maybe I need to clarify a bit, actually I don’t mean sublist, I mean a list’s all items in another list, don’t care order.

NobbZ · July 7, 2020, 8:45pm

Then mapset might indeed be the most efficient way to do so.

Ajaxdone · July 7, 2020, 8:48pm

I tested your solutions, in my previous dataset is works, but if change to this, not working:

l1 = ["9","2","5","1"]
l2 = ["1","9"]

NobbZ · July 7, 2020, 8:49pm

Yes, @hauleths approach is for pre-ordered input only.

Ajaxdone · July 7, 2020, 8:55pm

According to @NobbZ 's suggestion as well, I tried this one works:

big_list = ["9", "2", "5", "1"]
sub_out_of_order_list = ["1", "9"]
MapSet.subset?(MapSet.new(sub_out_of_order_list), MapSet.new(big_list))

Thank you everyone.

gregvaughn · July 7, 2020, 9:18pm

Depending on the relative sizes of your lists, this could also be a viable option:

Enum.all?(sub_out_of_order_list, &Enum.member?(big_list, &1))

Ajaxdone · July 8, 2020, 1:00am

I am only using to check roles, so relative small lists for both.

If really want to check in very large list, then better sort the sub one first, and then search in :orddict or Map.

But no matter what I think MapSet still very fast on large list, as when convert to Set, using hashcode to search should be O(1).

julismz · July 14, 2022, 2:54pm

Giving this 2 lists:

[1, 3], [1, 2, 3]

In this case, MapSet.subset? returns true, but [1,3] is not a subset of [1,2,3]…

iex(1)> MapSet.subset?(MapSet.new([1, 3]), MapSet.new([1, 2, 3]))                          
true

benwilson512 · July 14, 2022, 3:07pm

It is. If you say “A is a subset of B” it means " every item in A is an item in B". 1 is in [1,2,3] and 3 is in [1,2,3].

julismz · July 14, 2022, 3:14pm

So I might be confused with sublists… Chatting on Discord I figured out that what I’m looking for is subsequence not subset

julismz · July 15, 2022, 11:53am

In case anyone is looking for sublists (ordered lists and not subsets) here is a function to do that:

@spec includes_sublist?(list, nonempty_list) :: boolean
def includes_sublist?(list, sublist) when is_list(list) and is_list(sublist) and length(sublist) > 0  do
    Enum.chunk_every(list, length(sublist), 1, :discard)
    |> Enum.member?(sublist)
  end

hauleth · July 15, 2022, 4:18pm

I would say that this one can be painfully slow for larger lists. Additionally I do not know if Erlang will be able to optimise that duplicated length/1 calls, which mean that it can be even slower for larger lists. A slightly better solution could be:

def includes_sublist?(list, sublist) when length(list) > length(sublist) do
  do_includes?(list, sublist, {list, sublist})
end

defp do_includes?(_, [], _), do: true
defp do_includes?([], _, _), do: false
defp do_includes?([a | xs], [a | ys], ret), do: do_includes?(xs, ys, ret)
defp do_includes?(_, _, {[_ | list], sublist}), do: do_includes?(list, sublist, {list, sublist})

This should work in the case when we match empty sublist (each list contains empty sublist, we assume that we always get the proper lists). It still has suboptimal performance (can be quadratic) but it is much more optimal for the memory (as it will not create duplicated lists in memory for includes_sublist?(List.duplicate(1, 1000), [2]). It could probably be improved by checking for least popular segment first, and then doing check, or checking for multiple elements at once, or other classical “string searching” magic, but I think that as a baseline it would perform a little bit better, especially when sublist is much shorter than list.

julismz · July 15, 2022, 4:34pm

Very nice improvement.

By Jose’s suggestion, I’m trying to apply the bad character rule for the Boyer–Moore algorithm to these recursive options and then do some benchmarks to see the magic happens.

julismz · July 22, 2022, 10:45pm

After a lot of tests and research, the naive approach is infinite better than Boyer-Pattern to work within lists. In case of pure strings, naive can’t nothing to do against Boyer-Moore.

@hauleth solution still being the best for this case.