Learning Disjoint Set Union (Union Find)

Learning Disjoint Set Union (Union Find)
Learning Disjoint Set Union (Union Find)

Disjoint Set Union (DSU) is one such data structure. It is also referred to as Union Find because of the functionalities it provides.

The efficiency of a data structure depends on efficiently it handles the query of some problem statement. A good choice of data structure can reduce the execution time and that is what comes handy in real-life problems. We want our queries to be resolved quickly. The basic functionalities of this set include:

  • union_sets (a,b) – Merge sets in which elements a,b are present
  • find_set (a) – Return the set to which element a belongs

One question that is very obvious is that how do we store sets? Well, we store sets in the form of trees. Each Tree will correspond to one set and the root of the tree will be the representative/leader of the set.

Let us consider we have 4 elements numbered from 1 to 4. At the beginning every element starts as its own set; therefore each vertex is its own tree. Now we start combining or start doing union of these elements under some condition and we get the following two sets.

We have combined elements 1 ad 2 into one set and elements 3 and 4 into another set. This also tells us about the implementation idea that we have to maintain a parent that stores reference to its immediate ancestor in tree. Let us first look at the naïve implementation and then we further optimise the algorithm by introducing path compression and union by rank techniques.

For the Naïve implementation, we will implement three functions namely make_set, find_set, union_sets. As already mentioned all the information about elements will be kept in an array name parent and at the beginning, each element is its own set. To combine two sets we first find the representative of the set in which element A is located and the representative of the set in which element B is located. If they belong to the same set then it means the sets are already merged.

Else, we can simply specify that one of the elements is the parent of the other element thereby combining the two trees. Finally the implementation of the find function: we recursively call for the parent of the current node until the parent and the node becomes same.

This implementation of these functions gives us a time complexity of O (LogN) on average and O(N) in the worst case (case when the tree is skewed and can become a linked list). Now we will consider a further optimised approach which will improve the complexity even further.

Union by rank and size

In the Naïve implementation the second tree was always attached to the first one. In this implementation we merge the set which is greater in size with the set smaller in size if the union_sets function is called. Most popular approaches for doing this are union by rank and union by size. Both of these are similar in implementation and time complexity. In Union by rank the idea is to attach smaller depth tree under the root of the deeper tree. Instead of rank size can also be used.

Path Compression

In this technique for some vertex V we find the representative P for all vertices between path P and V. In doing so we actually reduce the path by making parent of all such vertices as P. Let us observe this technique from the following image.

In the above image we see that when find_set (7) is called we connect all such nodes that come in the path between node 7 and node 1 to node 1.We make the parent of all such nodes equal to parent of node 7.

If we combine both of these optimisations techniques the time complexity will reduced to nearly O (1).But this is amortised time complexity (total time per operation). DSU with union by size/rank but without path compression works in O (LogN) complexity per query. Also when using path compression rank word is used instead of height as rank is not always equal to height.

Let us look at the implementation of the above methods:

Applications of DSU:

  • Finding Connected Components in an undirected  graph
  • Finding Cycle in an Undirected Graph
  • Store Additional Information for each set
  • Offline Range Minimum Query (RMQ)

Let us find Cycle in an Undirected Graph. For this purpose, we will define a class graph and set the number of vertices in the constructor and for edges, we will consider a list of pairs. Here we are considering an undirected graph with no self-loops. For each edge, make subsets using both the vertices of the edge. If both the vertices are in the same subset, a cycle is formed.

The time complexity of this code will be O(N).

To read more about Data Structures, click here and to explore programming courses, visit Coding Ninjas.

By Mridul Kumar