Union Find algorithm help - am having to iterate over all the “parents” a second time

I’m trying to do a very basic union-find (not doing any path compression by rank or size).

    def find(dsu, x) do
      case Map.get(dsu, x) do
        nil ->
            {Map.put(dsu, x, x), x}
        ^x -> {dsu, x}
        y -> 
            {dsu2, p} = find(dsu, y)
            {Map.put(dsu2, x, p), p}
      end
    end

  def union(%{} = dsu, x, y) do 
    {dsu, a} = find(dsu, x)
    {dsu, b} = find(dsu, y)
    if a == b do 
      dsu
    else
      [a,b] = Enum.sort([a,b])
      {dsu, c} = find(dsu, a)
      Map.put(dsu, b, c)
    end    
  end

The issue I’m having is that after iterating over a set of nodes I have to iterate over all the “parents” a second time because the find function isn’t recursing on those.

@spec num_islands(grid :: [[char]]) :: integer
  def num_islands(grid) do
    map = LandMap.new(grid) 
    m = grid |> length() |> Kernel.-(1)
    n = hd(grid) |> length() |> Kernel.-(1)
    
    
    intermediate = 
    0..m
    |> Enum.reduce(%{}, fn x, acc -> 
      0..n
      |> Enum.filter(fn y -> MapSet.member?(map, {x,y}) end)
      |> Enum.reduce(acc, fn y, acc2 ->
        acc3 = DSU.union(acc2, {x,y}, {x,y})
        LandMap.neighbors({x,y}, m, n)
        |> Enum.filter(fn coord -> MapSet.member?(map, coord) end)
        |> Enum.reduce(acc3, fn neighbor, acc4 -> DSU.union(acc4, neighbor, {x,y}) end)
      end)
    end)
    
    intermediate
    |> Map.values()
    |> Enum.reduce(intermediate, fn x, islands -> 
      case DSU.find(islands, x) do 
        {^islands, ^x} -> islands
        {islands, y} -> 
          islands
          |> Enum.map(fn {k, v} = curr -> 
            if v == x do 
              {k, y}
            else
              curr
            end
          end)
          |> Map.new()
        true -> islands
      end
    end)
    |> Map.values()
    |> Enum.uniq()
    |> Enum.count()
  end

If I don’t do the second pass a node that was initially it’s own parent but later linked to others will get updated, but any nodes pointing to it as parent would not be updated. Do I need to track a “parents” map and a “children” map so that I can update any children when a “parent” is updated?

3 Likes

Could you please describe original task and what “union find algorithm” is for?

1 Like

Leetcode Number of Islands problem: LeetCode - The World's Leading Online Programming Learning Platform

Also in case you meant what “union find” means, a lot of places refer to it as “disjoint set” I think.

1 Like

Generally speaking, you need traverse all the map point by point and do BFS for every new piece of land you encounter. This will be O(N) in time and memory solution where N is the size of the map. I don’t know anything about Union Find algorithm you’re referring to

Thanks. I’m not really trying to solve the problem so much as use this problem to learn about implementing disjoint set and union find concepts. Disjoint-set data structure - Wikipedia

I will look into it after work. It’s surprising how hard on memory disjoint set is despite being a simple algorithm.

I looked into number of island and I think using :digraph and connected component could help?

Back to your DS algorithm I’m interested to trace it but if it helps let me send you an implementation I did: https://github.com/code-shoily/ex_algo/blob/main/lib/ex_algo/set/disjoint_set.ex

Thank you for motivating me to look into that again, I’ll be back soon

1 Like

haha, I’ve had your code in an open tab the whole time I’ve been working on this and the Most Stones Removed problem. I had no chance getting it right until I read your code.

1 Like

Do you have any “intuitive” feel towards disjoint set? I seem to lose my understanding of it after some time of not using it. I looked at my own code and was like, wtf was I thinking?

1 Like

So I spun up a livebook and tried your code, see if I do these:

{ds, x} = UnionFind.find(%{}, 1)
{ds, y} = UnionFind.find(ds, 2) 
{ds, z} = UnionFind.find(ds, 4)

ds
|> UnionFind.union(2, 4)
|> UnionFind.union(2, 1)

Then I end up with %{1 => 1, 2 => 1, 4 => 2} - which is not wrong, 4 has the parent 2 which has the parent 1. So 1, 2 and 4 belong to the same club. However, you’d need to flatten it up with an extra run which would collapse the 4 -> 2 -> 1 path. I guess that’s why compressions optimize, since when you have a clash and you use the size/rank etc to be the determinant of who becomes the parent, so in that case, when 1 met 2, 2 already had a follower (that being 4), so it deserved to be 1-s parent, so there is no ambiguity.

Now, with union by rank compression enabled, look into the sequence below:

set = DisjointSet.new(10)

set = 
  set
  |> DisjointSet.union(2, 3)
  |> DisjointSet.union(1, 2)
  |> DisjointSet.union(8, 9)
  |> DisjointSet.union(7, 8)
  |> DisjointSet.union(6, 9)
  |> DisjointSet.union(5, 6)
  |> DisjointSet.union(4, 5)
  |> DisjointSet.union(1, 9)

If you now look into the parent/rank:

%DisjointSet{
  parents: %{0 => 0, 1 => 2, 2 => 2, 3 => 2, 4 => 8, 5 => 8, 6 => 8, 7 => 8, 8 => 2, 9 => 8},
  ranks: %{0 => 1, 1 => 1, 2 => 3, 3 => 1, 4 => 1, 5 => 1, 6 => 1, 7 => 1, 8 => 2, 9 => 1}
}

I tried creating two heavily connected group and then join them, so that there are two sets of parents.

See find path is shorter, it will be ? -> 8 -> 2 in the worst case, however, we could introduce more techniques to make it even better. So ideally, the island counting algorithm should really be running find on each points, making it a (O(LandVertex * PathCompressionDS)). Also this assumes some kungfu to optimize parent length and updating of data structure per find run.

So, to find the number of islands, I think a way would be to find each point in the processed disjoint set and adding the parent to a set. So the more optimized find is, the more optimized the collection would be. (But I’d still prefer connected components approach to solving the island problem, and I get nervous around 2D grid type problems with Elixir :frowning: )

1 Like

I should revisit my implementation, it might improve with some optimization and might help me learn a few things, especially around some intuitive understanding of how the compressions, splitting, halving etc work on the parent data and friends.

1 Like
defmodule DisjoinSets do
  def add(disjoint_sets, entry) do
    Map.put(disjoint_sets, entry, {:root, 0})
  end

  def find(disjoint_sets, entry) do
    with {:ok, root, _rank} <- do_find(disjoint_sets, entry) do
      {:ok, root}
    end
  end

  defp do_find(disjoint_sets, entry) do
    case disjoint_sets do
      %{^entry => {:root, rank}} ->
        {:ok, entry, rank}

      %{^entry => {:parent, parent}} ->
        do_find(disjoint_sets, parent)

      %{} ->
        {:error, :not_present}
    end
  end

  def union(disjoint_sets, left, right) do
    with(
      {:ok, left_parent, left_rank} <- do_find(disjoint_sets, left),
      {:ok, right_parent, right_rank} <- do_find(disjoint_sets, right)
    ) do
      cond do
        left_rank < right_rank ->
          {:ok, Map.put(disjoint_sets, left_parent, {:parent, right_parent})}

        left_rank > right_rank ->
          {:ok, Map.put(disjoint_sets, right_parent, {:parent, left_parent})}

        left_rank == right_rank ->
          disjoint_sets =
            disjoint_sets
            |> Map.put(right_parent, {:parent, left_parent})
            |> Map.put(left_parent, {:root, left_rank + 1})

          {:ok, disjoint_sets}
      end
    end
  end
end

I wrote this simple rank-based disjoint set implementation. I chose verbosity and ease of reading over performance

2 Likes

Very readable and clean. Thank you for sharing.

My basic understanding is the disjoint sets are useful when you need to know how many unique groups you have in a data set as defined by some connection criteria. So for this problem data points are connected if they are “land” (represented by digit 1) and they share a border. In the “Most Stones Removed” problem the connection is any shared column OR row.
My sense of why it can outperform DFS or BFS in certain situations has to do with not needing to track complete paths and the fact that paths containing cycles are no problem in disjoint sets.

I think I understand this, thanks. (And I also feel a bit frustrated with 2D grids in Elixir b/c so many of the default solutions rely on mutable data stores and access by index. Comes up a lot in Advent of Code.)

@hst337 – Thanks for the great example implementation. Really nice.

1 Like

2D grids can be represented as :array or just a tuple

For this exact task both DisjointSets and BFS solutions will have the same exact complexity in space and time. DFS solution is not suitable here, since island is a graph, which contains a lot of different trees, thus it can have cycles in it.

1 Like

Yes that is often what I do, but it’s not really ergonomic.

In advent of code most of my grids are a map where the key is {x, y} or {x, y, z} and it is very simple to work with.