Finding the lowest common ancestor (LCA) using Euler tour and sparse table

Algorithms #algorithms #trees #lca

Euler tour

During an Euler tour, each node is added to the tour as it is visited (either moving from parent to child or returning from child to parent). The tour starts at the root node and reaches back to it after visiting all the vertices.

Euler tour requires 2*N-1 vertices to store the tour.

Lowest common ancestor (LCA)

The lowest common ancestor (LCA) of two nodes in a tree is the lowest (i.e., deepest) node that is a common ancestor of both nodes.

In other words, it is the node that is furthest from the root of the tree and still has both nodes as descendants.

One common approach to solving this problem is to use the Euler tour of the tree. By storing the indices of the first and last occurrence of each node in the Euler tour, it is possible to determine whether a given node is an ancestor of another node and to find the LCA of two nodes in O(1) time.

Algorithm

The LCA problem can be reduced to Range Minimum Query (RMQ) problem. With RMQ, LCA can be found in constant time O(1) after doing some preprocessing (creating sparse table) which takes O(n lon n) time.

Preprocessing time can be improved to O(n) with Farach-Colton and Bender optimization.

Steps

The algorithm can be split into two steps

Step 1

Ensure all nodes are uniquely indexed e.g. nodes can be assigned id from 0 to N-1
Start by creating euler tour of the tree
While creating euler tour, maintain a separate array/list to keep track of the depth of each node (on every visit) in the tour

Step 2

Create sparse table from depth array for RMQ. It should return the index of minimum value
Pick indices of the nodes from euler tour whose LCA needs to be found
To get the LCA of the nodes, find the minimum value in the depth array between these indices. The returned value is the index of the LCA in the Euler tour

The steps are illustrated below

Illustration Step 1. Find euler tour and track depth for each node in the tour

Illustration Step 2. Find LCA of two nodes

We want to find LCA of (3, 2) in the tree give above. The euler tour and depth is already tracked in the table above.

Example

Based on the tree given above, this script attempts to find the LCA of (12, 14), (6, 8) & (12, 2)

Check out the following articles for details on RMQ using sparse tables and array representation of a binary tree.

import kotlin.math.floor
import kotlin.math.log
import kotlin.math.max
import kotlin.math.min

//sampleStart
fun main() {
    // click + to see full script

    // a binary tree represented as an array
    val tree = intArrayOf(4, 9, 2, 10, 0, 6, 3, 13, 12, 5, 14, 7, 1, 8, 11).toTypedArray()

    val (tourNodesDepth, eulerTour, lastIndexOfEulerNode) = createEulerTourAndDepthArray(tree)
    val spareTable = SpareTable(tourNodesDepth)

    // get lowest common ancestor
    fun getLca(node1: Int, node2: Int): Int {
        val index = spareTable.queryMinIndex(
            lastIndexOfEulerNode[node1]!!,
            lastIndexOfEulerNode[node2]!!
        )
        return eulerTour[index]
    }

    println("lca 12,14 = ${getLca(12, 14)}")
    println("lca 6,8 = ${getLca(6, 8)}")
    println("lca 12,2 = ${getLca(0, 6)}")
}
//sampleEnd

data class Result(
    val tourNodesDepth: Array<Int>,
    val eulerTour: ArrayList<Int>,
    val lastIndexOfEulerNode: Map<Int, Int>
) {
    // todo: override equals and hashcode
}

fun createEulerTourAndDepthArray(tree: Array<Int>): Result {

    var depthIndex = 0
    var eulerIndex = 0

    // no. of nodes in euler tour = (2 * no. of nodes) - 1
    val noOfEulerNodes = tree.size.shl(1).minus(1)
    val tourNodesDepth = Array(noOfEulerNodes) { 0 }
    val eulerTour = ArrayList<Int>(noOfEulerNodes)

    /**
     * keep track of last index of a node appeared in euler tour. e.g. if
     * tour produces [0, 1, 2, 0, 1], the index of node 1 stored in this
     * map will be 4.
     */
    val lastIndexOfEulerNode = mutableMapOf<Int, Int>()

    fun dfs(startNodeIndex: Int, level: Int) {

        // leftChildIndex = (startNodeIndex * 2) + 1
        val leftChildIndex = startNodeIndex.shl(1).plus(1)

        // rightChildIndex = (startNodeIndex * 2) + 2
        val rightChildIndex = startNodeIndex.shl(1).plus(2)

        tourNodesDepth[depthIndex++] = level
        eulerTour.add(tree[startNodeIndex])
        lastIndexOfEulerNode[tree[startNodeIndex]] = eulerIndex++

        if (tree.getOrNull(leftChildIndex) != null) {
            dfs(leftChildIndex, level + 1)
            eulerTour.add(tree[startNodeIndex])
            tourNodesDepth[depthIndex++] = level
            lastIndexOfEulerNode[tree[startNodeIndex]] = eulerIndex++
        }

        if (tree.getOrNull(rightChildIndex) != null) {
            dfs(rightChildIndex, level + 1)
            eulerTour.add(tree[startNodeIndex])
            tourNodesDepth[depthIndex++] = level
            lastIndexOfEulerNode[tree[startNodeIndex]] = eulerIndex++
        }
    }

    dfs(0, 0)
    return Result(
        tourNodesDepth = tourNodesDepth,
        eulerTour = eulerTour,
        lastIndexOfEulerNode = lastIndexOfEulerNode
    )
}

class SpareTable(private val arr: Array<Int>) {

    private val table: Array<Array<Int>>
    private val indexTable: Array<Array<Int>>

    init {
        val size = arr.size
        val rows = floor(log(size.toDouble(), 2.0)).toInt().plus(1)
        table = Array(rows) { Array(arr.size) { 0 } }
        indexTable = Array(rows) { Array(arr.size) { it } }
        System.arraycopy(arr, 0, table[0], 0, arr.size)
        for (row in 1 until rows) {
            var colum = 0
            while (colum.plus(1.shl(row)) <= size) {
                val left = table[row - 1][colum]
                val right = table[row - 1][colum.plus(1.shl(row - 1))]
                table[row][colum] = min(left, right)
                if (table[row][colum] == left) {
                    indexTable[row][colum] = indexTable[row - 1][colum]
                } else {
                    indexTable[row][colum] = indexTable[row - 1][colum.plus(1.shl(row - 1))]
                }
                colum++
            }
        }
    }

    fun queryMinIndex(a: Int, b: Int): Int {
        val left = min(a, b)
        val right = max(a, b)
        val i = floor(log((right.minus(left).plus(1)).toDouble(), 2.0)).toInt()
        val rightIndex = right.minus(1.shl(i)).plus(1)
        return if (table[i][left] < table[i][rightIndex]) {
            indexTable[i][left]
        } else {
            indexTable[i][rightIndex]
        }
    }
}

Conclusion

There are also several other algorithms for finding the LCA of two nodes in a tree, such as the binary lifting algorithm and the Tarjan's off-line lowest common ancestors algorithm. These algorithms have different time and space complexities, but they generally have a time complexity of O(log n) or O(1) after a pre-processing step.

Usage

The LCA problem is a fundamental problem in graph theory and has many applications, such as in computing the distance between nodes in a tree or in constructing efficient data structures for storing and querying trees.

One example of a data structure that uses the LCA problem is the binary indexed tree (BIT), also known as the Fenwick tree. A BIT is a data structure that allows efficient updates and queries of a dynamic array. The basic idea is to represent the array as a tree, where each node represents the sum of a range of values in the array. The LCA problem is used to compute the range of values that need to be updated or queried at each step, allowing the BIT to perform updates and queries in O(log n) time, where n is the size of the array.

Another example of a data structure that uses the LCA problem is the lowest common ancestor binary search tree (LCA-BST), also known as the level ancestor data structure. An LCA-BST is a binary search tree that is augmented with additional information to allow efficient LCA queries. Specifically, each node in the tree stores a pointer to its kth ancestor, where k is a power of two. The LCA problem is used to compute the kth ancestor of a node, allowing the LCA-BST to perform LCA queries in O(log n) time and space, where n is the number of nodes in the tree.

multixlab