Explained: Disjoint-Set Data Structure

Disjoint-Set Data Structure
Disjoint-Set Data Structure

We will discuss important mathematics concept sets first to represent equivalence sets of elements where the order doesn’t matter. The Disjoint sets ADT is the most used data structure for graph problems.

Let us first discuss about sets and their equivalence relations:

 A set can be simply implemented using an array and the equivalence of their subsets can be found. Let us assume we have a set S of various elements. Now for every element a, b ϵ S, we define some relations over these elements. We say an R b is true then a is related to b and vice versa. So in such cases, we say a relation is equivalent if it follows three rules:

  • Reflexive: If a ϵ S then for every element a, an R a is true
  • Symmetric: If an R b then b R a where a, b ϵ S
  • Transitive: If an R b and b R c then an R c where a, b, c ϵ S

The equivalence class of any element a ϵ S creates a partition of S in which each equivalent class intersection is a null set hence these are referred to as disjoint sets. There are few operations of this ADT –

  • Creating an equivalence class
  • Finding the respective equivalence class (Find)
  • Combining the equivalence classes (Union)

Disjoint Set Implementation

This involves three methods:

  • MAKESET(X) (creates a set with element X)
  • FIND(X) (find the name of the equivalence set which contains X)
  • UNION(X,Y) (Union the two sets X,Y to create a new set containing all the elements X,Y and deleting the previous sets of X,Y

We know that for any two sets Si & Sj, Si ꓵ Sj = Ф, hence disjoint. We apply the Union operations on the elements of these sets may be a, b and if we FIND a and b and if their equivalence sets are different then they are not related.

Fast FIND implementation (Quick FIND)

In this method, we use an array for the representation of the elements and their set names. As there can be many elements, we take the set names as integers itself. Example let us assume an array A[n] where A[i] denotes the set name where I belong to. This method will take O (1) time for accessing the array elements.

But using this as the FIND function, the UNION function will take O(n2) time. For two elements a,b we first FIND(a) and FIND(b) i.e. their parents. Now if their parents are different then we have to change either set a’s elements to point to b’s parents or vice versa hence for every different parent we get we have to change all the elements in their sets which will take O(n) time and hence the overall complexity will become O(n2).

So what’s the solution to this problem?

We optimise the UNION function at the cost of FIND.

Fast UNION implementation (Slow FIND)

The idea is to use a parent array along with the root/parent of a set which points to itself. Let us define the functions we have discussed earlier:

  • MAKESET(X): Creates a new set containing the single element X and, in the array, update the parent of X as X(root)
  • UNION (X, Y): Replaces the two sets containing X and Y by a single set which contains both of them
  • FIND(X): returns the name of the set that contains X. We keep on searching the X’s set until we find the root of that set

Initially we will create n set containing n single elements whose parents are themselves.

Here Union operation is just change the parent of one set to point to the root of another set hence the complexity is O(1). The FIND operation has the complexity as the depth of X in the tree.

Fast UNION implementation (Quick FIND)

In the previous method, we are getting to FIND complexity as O(n) in the worst case in a skew tree. So we need to optimise that method. For this, we use two ways:

  • UNION by size(Union by weight) making the smaller tree a subtree of a larger tree.
  • UNION by height(Union by rank) make the tree with less height a subtree of the tree with more height.

UNION by size refers to the same procedure as previous only this time we will mark every element’s parents as -1 initially, -1 will mean they are their own parent initially. We will increment the size of the parent once after the UNION of two sets. Let’s take an example:

Initial Arrangement:

Now let suppose 1 and 2 are to be UNION then parent[1] = -2, parent[2] = 1, so the root element will contain the size of the set too.

Again for input of UNION 2 and 3:

Hence, we can see that we are directly getting the root along with their size i.e. the parent of a set and its size.

UNION by height (UNION by rank)

We store negative of height of the tree (that means, if the height of the tree is 2 then we store -2 in parent array). The height of a tree with 1 element is 1.

UNION by height implementation

For FIND operation no change in implementation.

UNION FIND using Path Compression

FIND operation traverses a list of nodes on the way to the root. We can make later FIND operations efficient by making each of these vertices point directly to root. This is called path compression. Lets see an example of this form.

Before FIND (X)

After FIND (X)

With path compression the only change to the function id that parent[X] is made equal to the value returned by FIND. That means after the root of the set is found recursively, X is made to point directly to it. This happen recursively to every node on the path to root.
FIND with path compression implementation

Path compression is compatible with UNION by size but not with UNION by height as there is no efficient way to change the height of the tree.

Now let us look at a easy to use version of UNION & FIND operation:
int FIND(int a){
if(a == parent[a])
return a;
return FIND(parent[a]);
}
Void UNION(int a,int b){
If(FIND(a) == parent(b))
Return;
Parent[a] = b;
}
Applications of UNION FIND algorithm
• Cycle Detection
• Job Sequencing
Kruskal’s Minimum Spanning Tree Algorithm
• In game theory algorithms
• Graph Algorithms
The above code is a simpler version of UNION finds to help understand the basic concept. Hope this is helpful to any aspiring.

To read more about data structures, read here.

By Aniruddha Guin