How to check if a list is a sublist(out of order list) of another list

l1 = ["1","2","5","9"]
l2 = ["1","9"]
List.contains?(l1,l2)

Anything like this? to check if l2 is a sublist of l1.

I know can use Enum.each and check one by one, just want to know if anything like that. thanks

3 Likes

maybe using MapSet.subset?/2

6 Likes

l2 is not a sublist of l1.

But in general, sublist checking is quite easy to do.

def sublist?([], _), do: false
def sublist?(l1 = [_|t], l2) do
  List.starts_with?(l1, l2) or sublist?(t, l2)
end
5 Likes
def subset(_, []), do: true
def subset([], _), do: false
def subset([h | t1], [h | t2]), do: subset(t1, t2)
def subset([_ | t], list), do: subset(t, list)
3 Likes

Maybe I need to clarify a bit, actually I don’t mean sublist, I mean a list’s all items in another list, don’t care order.

2 Likes

Then mapset might indeed be the most efficient way to do so.

2 Likes

I tested your solutions, in my previous dataset is works, but if change to this, not working:

l1 = ["9","2","5","1"]
l2 = ["1","9"]

Yes, @hauleths approach is for pre-ordered input only.

2 Likes

According to @NobbZ 's suggestion as well, I tried this one works:

big_list = ["9", "2", "5", "1"]
sub_out_of_order_list = ["1", "9"]
MapSet.subset?(MapSet.new(sub_out_of_order_list), MapSet.new(big_list))

Thank you everyone.

3 Likes

Depending on the relative sizes of your lists, this could also be a viable option:

Enum.all?(sub_out_of_order_list, &Enum.member?(big_list, &1))

3 Likes

I am only using to check roles, so relative small lists for both.

If really want to check in very large list, then better sort the sub one first, and then search in :orddict or Map.

But no matter what I think MapSet still very fast on large list, as when convert to Set, using hashcode to search should be O(1).

Giving this 2 lists:

[1, 3], [1, 2, 3]

In this case, MapSet.subset? returns true, but [1,3] is not a subset of [1,2,3]…

iex(1)> MapSet.subset?(MapSet.new([1, 3]), MapSet.new([1, 2, 3]))                          
true

It is. If you say “A is a subset of B” it means " every item in A is an item in B". 1 is in [1,2,3] and 3 is in [1,2,3].

2 Likes

So I might be confused with sublists… Chatting on Discord I figured out that what I’m looking for is subsequence not subset :slight_smile:

2 Likes

In case anyone is looking for sublists (ordered lists and not subsets) here is a function to do that:

@spec includes_sublist?(list, nonempty_list) :: boolean
def includes_sublist?(list, sublist) when is_list(list) and is_list(sublist) and length(sublist) > 0  do
    Enum.chunk_every(list, length(sublist), 1, :discard)
    |> Enum.member?(sublist)
  end

I would say that this one can be painfully slow for larger lists. Additionally I do not know if Erlang will be able to optimise that duplicated length/1 calls, which mean that it can be even slower for larger lists. A slightly better solution could be:

def includes_sublist?(list, sublist) when length(list) > length(sublist) do
  do_includes?(list, sublist, {list, sublist})
end

defp do_includes?(_, [], _), do: true
defp do_includes?([], _, _), do: false
defp do_includes?([a | xs], [a | ys], ret), do: do_includes?(xs, ys, ret)
defp do_includes?(_, _, {[_ | list], sublist}), do: do_includes?(list, sublist, {list, sublist})

This should work in the case when we match empty sublist (each list contains empty sublist, we assume that we always get the proper lists). It still has suboptimal performance (can be quadratic) but it is much more optimal for the memory (as it will not create duplicated lists in memory for includes_sublist?(List.duplicate(1, 1000), [2]). It could probably be improved by checking for least popular segment first, and then doing check, or checking for multiple elements at once, or other classical “string searching” magic, but I think that as a baseline it would perform a little bit better, especially when sublist is much shorter than list.

2 Likes

Very nice improvement.

By Jose’s suggestion, I’m trying to apply the bad character rule for the Boyer–Moore algorithm to these recursive options and then do some benchmarks to see the magic happens.

After a lot of tests and research, the naive approach is infinite better than Boyer-Pattern to work within lists. In case of pure strings, naive can’t nothing to do against Boyer-Moore.

@hauleth solution still being the best for this case.

1 Like