List of Sets: Find Intersecting Sets

This took me way too long and I think its right/working, but … am I missing a better solution?

defmodule IntersectingSets do
  def meld([], [], [], result) do
    result
  end

  def meld([item|rest], [], [], result) do
    meld(rest, [item], result, [])
  end

  def meld(rest, [item], [], result) do
    meld(rest, [], [], [item|result])
  end

  def meld(rest, [item], [head|tail], result) do
    cond do
      MapSet.intersection(item, head) |> MapSet.size == 0 ->
        meld(rest, [item], tail, [head|result])
      true ->
        meld(rest, [MapSet.union(item, head)], tail, result)
    end
  end

  def meld([item|rest]) do
    meld(rest, [], [], [item])
  end
end

Why so complicate?

iex> m1 = MapSet.new([1, 2, 3])
#MapSet<[1, 2, 3]>
iex> m2 = MapSet.new([2, 3, 4])
#MapSet<[2, 3, 4]>
iex> MapSet.intersection m1, m2
#MapSet<[2, 3]>

… or maybe I didn’t get what You want

UPDATE: Well, I didn’t get what You want to do… better use data and expected result in that case.
Also, I would not write it like this… first the main entry function, then the rest private…

defmodule Koko do
  def meld(list) do
    ....
  end

  defp do_meld(a, b, c, d) do
    ....
  end
end

With your code, You need to start to read from the bottom.

1 Like

I’m also unclear what the end result is at first glance. Additionally of the actual code, if/else is generally preferred over cond and you can use Enum.empty?/1 instead of the MapSet.size == 0

FWIW, you’ll get much better responses if you describe what this code is supposed to do.

Tracing through an example by hand, it will return the whole input list if every given MapSet intersects with every other one. Non-intersecting sets will be combined.

Some general notes on structure:

  • the second argument is always patterned-matched against either [] or [item]. Consider removing the useless list wrapping and use nil vs item instead

  • +1 to using if over a cond with one non-default branch

  • consider using MapSet.disjoint?/2 - there’s probably a tiny performance benefit to not constructing the whole result of MapSet.intersection, but more importantly it’s clearer what the intent of the check is

2 Likes

This is much more idiomatic, agreed. If it’s part of a very hot loop in the code with tight performance tolerances, the specialized/simple version may still perform better, so it would be a good idea to measure with Benchee or similar.

You might use Enum.reduce/3 in the following way (untested, but it should work):

Enum.reduce(sets, [], fn set, sets ->
  case Enum.split_with(sets, &MapSet.disjoint?(&1, set)) do
    {_, []} ->  # not a single join, totally alien, appending
      [set | sets]
    {disjoined, joined} ->  # ⇓⇓⇓ here is a trick ⇓⇓⇓
      [Enum.reduce(joined, set, &MapSet.union/2) | disjoined]
  end
end)

The only interesting thing here is that once we discovered all the sets joined the tested one, all of them are to be joined.

1 Like

thx everyone! working a bit too much but once i do some rewrite based on comments here ill post again!

(cant figure out how to add some description to the original post?!! i must be a dummy! :slight_smile:)

addressing replies:

Goal: Given a list of sets, return a list of the union of intersecting sets. Can think of this as a finding all the components in a network graph. https://en.wikipedia.org/wiki/Component_(graph_theory)

Algo: Reduce the set, and on each reduction, reduce the found intersecting sets and merge (union) the current set (from the 1st/outside reduce) into the appropriate intersecting set.

  def meld2(list_of_sets) do
    list_of_sets
    |> Enum.reduce([],
      fn
        next_set, [] ->     # seems necessary for a list of size 1?
          [next_set]
        next_set, disjointed_sets ->
          disjointed_sets
          |> Enum.reduce({next_set, []},
             fn disjointed_set, {melded_set, new_disjointed_set} ->
               if MapSet.disjoint?(melded_set, disjointed_set) do
                 {melded_set, [disjointed_set|new_disjointed_set]}
               else
                 {MapSet.union(melded_set, disjointed_set), new_disjointed_set}
               end
            end
          )
          |> (fn {melded, others} -> [melded|others] end).()    # people dont like anon funcs, something better?
      end
    )
  end
1 Like

This is AWESOME!

Testing:

$ elixir /c/Temp/group_tests.exs
list of sets: [#MapSet<[:a]>, #MapSet<[:a, :b]>, #MapSet<[:c]>, #MapSet<[:b, :d]>]
meld: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]
meld2: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]
mudasobwa's meld: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]


list of sets: [#MapSet<[15, 20]>, #MapSet<[13, 24]>, #MapSet<[2, 7]>, #MapSet<[14, 24]>, #MapSet<[8, 18]>, #MapSet<[10, 21]>, #MapSet<[3, 9]>, #MapSet<[3, 25]>, #MapSet<[9, 15]>, #MapSet<[2, 19]>]
meld: [#MapSet<[2, 7, 19]>, #MapSet<[10, 21]>, #MapSet<[13, 14, 24]>, #MapSet<[8, 18]>, #MapSet<[3, 9, 15, 20, 25]>]
meld2: [#MapSet<[3, 9, 15, 20, 25]>, #MapSet<[2, 7, 19]>, #MapSet<[8, 18]>, #MapSet<[10, 21]>, #MapSet<[13, 14, 24]>]
mudasobwa's meld: [#MapSet<[3, 9, 15, 20, 25]>, #MapSet<[13, 14, 24]>, #MapSet<[10, 21]>, #MapSet<[8, 18]>, #MapSet<[2, 7, 19]>]

Actually, it might be even simplified. As @saverio-kantox pointed out to me, case is not needed at all. The second clause perfectly does in both cases:

Enum.reduce(sets, [], fn set, sets ->
  {disjoined, joined} =
    Enum.split_with(sets, &MapSet.disjoint?(&1, set))
  [Enum.reduce(joined, set, &MapSet.union/2) | disjoined]
end)

For empty joined, reduce/3 would immediately return the initial accumulator.

2 Likes

tested:

$ elixir find_component.exs
list of sets: [#MapSet<[:a]>, #MapSet<[:a, :b]>, #MapSet<[:c]>, #MapSet<[:b, :d]>]
meld: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]
meld2: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]
mudasobwa's meld: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]
@saverio_kantox optimization: [#MapSet<[:a, :b, :d]>, #MapSet<[:c]>]


list of sets: [#MapSet<[8, 25]>, #MapSet<[5, 7]>, #MapSet<[8, 11]>, #MapSet<[7, 24]>, #MapSet<[9, 10]>, #MapSet<[18, 24]>, #MapSet<[2, 11]>, #MapSet<[8, 20]>, #MapSet<[11, 18]>, #MapSet<[4, 23]>]
meld: [#MapSet<[4, 23]>, #MapSet<[9, 10]>, #MapSet<[2, 5, 7, 8, 11, 18, 20, 24, 25]>]
meld2: [#MapSet<[2, 5, 7, 8, 11, 18, 20, 24, 25]>, #MapSet<[9, 10]>, #MapSet<[4, 23]>]
mudasobwa's meld: [#MapSet<[2, 5, 7, 8, 11, 18, 20, 24, 25]>, #MapSet<[9, 10]>, #MapSet<[4, 23]>]
@saverio_kantox optimization: [#MapSet<[4, 23]>, #MapSet<[2, 5, 7, 8, 11, 18, 20, 24, 25]>, #MapSet<[9, 10]>]