Union-Find (Disjoint-Set)

Union-Find (also called Disjoint-Set) is a data structure that keeps track of a set of elements partitioned into disjoint subsets. It supports two main operations:

  • Union: merges two subsets into a single one.
  • Find: determines which subset a particular element belongs to.

Implementation

The standard implementation of the Disjoint-Set data structure uses a tree-based approach where each node represents an element in a set. Each set is represented by a tree, where the root node is the representative element.

  1. Initially, each element is in its own set, represented by a singleton tree.
  2. During the Union operation, two trees are merged by making the root of one tree the child of the root of the other tree. The representative element of the resulting merged set is the root of the final tree.
  3. The Find operation is implemented by following the parent pointers from the element to the root node, which is the representative element of the set.

The path compression technique can be used to optimize the Find operation by making the nodes on the path from the element to the root point directly to the root node. This flattens the tree and makes future Find operations more efficient.

Example

import kotlin.math.absoluteValue

fun main() {
    val unionFind = UnionFind(noOfVertices = 4)
    listOf(
            Pair(0, 1),
            Pair(0, 3),
            Pair(1, 3),
            Pair(1, 2),
    ).forEach { edge ->
        if (unionFind.addEdge(from = edge.first, to = edge.second).not()) {
            println("edge $edge belongs to same set")
        }
    }
    println(unionFind)
}

//sampleStart
/**
 * Weighted union with path compression
 */
class UnionFind(noOfVertices: Int) {
    // click + to see full script
    /**
     * Set of all the vertices. Index represents the vertex
     * and value store the parent of vertex in the tree.
     *
     * This is weighted union, initially all vertices have parent
     * -1, which means that every vertex is in its own set. The rank
     * is stored as -ive value. This won't work if vertices have -ive
     * values. Other data structures can be used in that case.
     *
     * Later on when different vertices are combine to one set i.e.
     * union is performed, the vertex value will contain the no. of
     * child vertices to which this vertex is parent i.e. all vertices
     * all in same set.
     */
    private val vertexSets = Array(noOfVertices) { -1 }

    /**
     * Return false if both vertices belong to
     * same set, true otherwise
     */
    fun addEdge(from: Int, to: Int): Boolean {
        val parentFrom = find(from)
        val parentTo = find(to)
        if (parentFrom == parentTo) {
            return false
        }
        union(parentFrom, parentTo)
        return true
    }

    private fun find(vertex: Int): Int {
        if (vertexSets[vertex] <= -1) {
            return vertex
        }
        return find(vertexSets[vertex])
    }

    /**
     * Based on ranking, merge parents. This also assigns
     * root parent to other vertices instead of direct parent which
     * is also called path compression.
     */
    private fun union(parentA: Int, parentB: Int) {
        val rankA = vertexSets[parentA].absoluteValue
        val rankB = vertexSets[parentB].absoluteValue
        /**
         * Compare absolute value to pick greater rank parent
         * as update the rank again as -ive value
         */
        if (rankA >= rankB) {
            vertexSets[parentB] = parentA
            vertexSets[parentA] = rankA.plus(rankB).times(-1)
        } else {
            vertexSets[parentA] = parentB
            vertexSets[parentB] = rankA.plus(rankB).times(-1)
        }
    }
   
    override fun toString(): String {
       return "Parents: ${vertexSets.contentToString()}\nVertices: ${vertexSets.indices.toList()}"
    }
}

//sampleEnd

Usage

Union-Find is often used for solving problems that involve partitioning a set of elements into different groups or clusters, and for efficiently testing whether two elements belong to the same group. It can be used in various algorithms and problems, such as Kruskal’s algorithm for finding the minimum spanning tree of a graph, finding connected components of an undirected graph and more.

top