Strongly Connected Components (Tarjan’s Algorithm)

Posted: 26 Dec, 2020
Difficulty: Hard

PROBLEM STATEMENT

Try Problem

You are given an unweighted directed graph of 'V' vertices and 'E' edges. Your task is to print all the strongly connected components (SCCs) present in the graph.

Input format:
The very first line of input contains an integer ‘T’ denoting the number of test cases.

The first line of every test case contains two space-separated integers ‘V’ and ‘E’ denoting the number of vertices and the number of edges present in the graph. 

The next ‘E’ lines contain two space-separated integers ‘a’ and ‘b’ denoting a directed edge from vertex ‘a’ to ‘b’.
Note:
Use zero-based indexing for the vertices.
Output format:
For each test case, print space-separated vertices present in the strongly connected components of the graph, print the output for one SCC on each line.

The order of sequence does not matter.
Note:
You do not need to print anything, it has already been taken care of. Just implement the given function.
Constraints:
1 <= 'T' <= 300 
1 <= 'V' <= 2000
'V' - 1 <= 'E' <= 2000
0 <= 'a, b' < 'V'

Time Limit: 1 sec
Approach 1
  • Tarjan’s algorithm can determine the SCCs using just a single DFS. Before understanding Tarjan’s algorithm we must understand the meaning of low-link value and discovery time of a node.
  • The discovery time of a node is the time at which the node was discovered during the DFS. This can be represented by an integer value which also specifies the order in which the nodes are visited during the DFS.
  • The low-link value of a node is the smallest (lowest) discovery time reachable from that node during the DFS, including the node itself.
  • For a SCC in a graph, every node which is a part of the SCC will have the same low-link value. But this may not be true all the time.
  • As the discovery time of a node depends on the order in which the graph is traversed in a DFS, hence it may be possible that multiple nodes may have the same low-link value even though they are a part of different SCC.
  • This is where Tarjan’s SCC algorithm comes in. The algorithm maintains an invariant which prevents the low-link values of two SCCs from interfering with each other.
  • The invariant used is a stack, which is different from the stack being used for DFS. The stack stores the nodes forming a subtree from which the SCC will be found. Nodes are placed in the stack in the order in which they are visited. But the nodes are removed only when a complete SCC is found.

The algorithm for the approach is as follows:

findSCC(vector<vector<int>> graph):

  1. Create two arrays - low and disc, to store the low-link value and discovery time of the nodes, respectively.
  2. Initialise all the entries of low and disc to -1. As no node has been visited initially.
  3. Create an empty stack, which will be used as an invariant (as explained in the approach).
  4. Create a global variable time, which will be used to mark the discovery time as the nodes are visited in DFS.
  5. Call the DFS procedure for all the unvisited nodes of the graph.
  6. Return all the strongly connected components found.

DFS(node u, low[ ], disc[ ], stack):

  1. Initialize the low-link value and discovery time of the current node with time. Also, increment the time.
  2. Push the node u in the (invariant) stack.
  3. Loop: For each node v, which is adjacent to the node u:
    • Case 1: Node v has not been explored till now. Therefore edge u->v is a tree edge in the DFS tree. So,
      • Recursively call DFS on node v.
      • Then update the low-link value of node u as: low[u] = min(low[u], low[v]).
    • Case 2: Node v has already been visited using the DFS. Therefore edge u->v is a back edge in the DFS tree. So,
      • Check whether the node v is present in the (invariant) stack or not.
      • If node v is present, then update the low-link value of node u as: low[u] = min(low[u], disc[v]).
  4. Check whether the current node u is the head/root of a SCC, i.e. If low[u] = disc[u]:
    • We have found a new SCC.
    • Pop all the elements from the stack until the node u gets popped.
    • Add every element to a list.
    • Add this list to the list of SCCs.

 

Let us understand this by an example:

  • Consider a graph as shown below. Initially, our (invariant) stack is empty. The green node marks the current node in the DFS traversal.
  • Starting the DFS traversal from the previously marked green node. We move along the edges marked by orange colour. As we traverse the graph, nodes are pushed into the stack in the order in which they are visited. The discovery time and the low-link value of any node u is shown in the image as <disc[u], low[u]>.
  • During the exploration of node 4, we find that the node has a back edge to node 3, Hence we update the low-link value of node 4 as low[4] = min(low[4], disc[3]) = min(4, 3) = 3.
  • Now the node 4 is completely explored (hence it is now marked orange), but it is not removed from the stack. We move back to node 3.
  • The exploration of node 3 is completed as all its adjacent nodes have been visited (so its marked orange).
  • According to step 4 in the DFS algorithm. We find that low[3] = disc[3], this indicates that node 3 is head of a SCC.
  • So we pop all the elements from the stack until node 3 gets popped. These form a SCC of the graph. This SCC is highlighted using a box as shown in the image.
  • We move back to node 2 to explore its remaining neighbours.
  • There is only one unvisited neighbour of node 2, its discovery time is set to 5. As the node 5 has a back edge to node 1, also node 1 is present in the stack. Hence we update the low-link value of node 5 as low[5] = min(low[5], disc[1]) = min(5, 1) = 1.
  • Now node 5 have been explored, but it is not removed from the stack. Now we move back to node 2.
  • As the neighbour node 5 is completely explored, we can now update the low value of node 2 (according to step 3, CASE 1) as low[2] = min(low[2], low[5]) = min(2, 1) = 1.
  • Now the exploration of node 2 is completed, but it is not removed from the stack.
  • We move back to node 1. According to step 4 in the DFS algorithm. We find that low[1] = disc[1], this indicates that node 1 is head of a SCC.
  • So, we pop all the elements from the stack until node 1 gets popped. These form a SCC of the graph. This SCC is highlighted using a blue box as shown in the image.
  • The DFS of the graph completes.
  • There are two SCCs in the graph: {1, 2, 5} and {3, 4}.
Try Problem