Directed Graph or Strongly Connected Components

Directed Graph or Strongly Connected Components
Directed Graph or Strongly Connected Components

A directed Graph is said to be strongly connected if there is a path between all pairs of vertices in some subset of vertices of the graph.

In simple words, it is based on the idea that if one vertex u is reachable from vertex v then vice versa must also hold in a directed graph. This strong connectivity is applicable for directed graphs only.

A naive approach of checking this property would be to look for whether we can traverse all vertices from every other vertex or not and such a method would lead to an tine complexity of O(V3), where V is the number of vertices. There are two efficient ways of finding strongly connected components in a graph in linear time complexity.

Kosaraju’s Algorithm

Kosaraju’s algorithm is just a DFS approach based technique with a linear time complexity of O(V+E). But before jumping straight into the algorithm let us first define Condensed Component Graph as a graph with <=V nodes and <=E edges and in which every node is Strongly Connected Component. It can be proved that a condensed component graph is directed acyclic graph. Let’s prove it by contradiction and assume that it is not directed acyclic.

Now let’s observe that on the cycle, every strongly connected component can reach every other strongly connected component via a directed path, which in turn means that every node on the cycle can reach every other node in the cycle, because in a strongly connected component every node can be reached from any other node of the component. So if there is a cycle, the cycle can be replaced with a single node because all the Strongly Connected Components on that cycle will form one Strongly Connected Component. Therefore, the Condensed Component Graph will be a DAG.

Observe that if a DFS is done from any node in the Sink, only nodes in the Strongly Connected Component of Sink are visited. Now, removing the sink also results in a DAG, with maybe another sink.

So the above process can be repeated until all Strongly Connected Component’s are discovered. So at each step any node of Sink should be known. This should be done efficiently. Thus algorithm can be pointed out as follows:

  • Do a DFS traversal of original graph and keep track of finish time of every node. Sort the nodes according to finishing time in descending order. To perform this we can use stack. This way the node with highest finishing time will be on top of the stack.
  • Reverse the original graph.
  • Start another DFS traversal with the source vertex being the vertex on top of the stack. Pop vertices from top of the stack until a valid unvisited vertex are found. These steps are repeated until all nodes are visited. When this DFS call finishes all nodes that are a part of strongly connected component will be visited.

A C++ snippet of the code of the three steps described above is given below:

The following method should be called in main before step 3.

Tarjan’s Algorithm

Tarjan’s Algorithm is another linear time algorithm to find Strongly Connected Components (SCC) .It is based on the fact that a DFS search produces a DFS tree and SCC are just sub trees of the DFS tree. If we are able to find the head of such subtree we can print all nodes that lie in that subtree. Let us introduce DFS tree and the concept of discovery and low time before jumping into the implementation. DFS algorithm on a graph will give us the set of nodes with no cycles and all such nodes are part of DFS tree.

We can start from any vertex and find the DFS tree of any graph. We keep a timer variable to keep track of time for each node denoting when the node was visited for the first time. This is known as discovery time. Low time or Low value of a node denotes topmost reachable ancestor via the subtree rooted at this node. So for any node low time is initially equal to its discovery time as every node is its own ancestor.

Then we look into the subtree and check if there is some node that can take us to any of its ancestor and edge that connects node to one of its ancestor is called a back edge. Hence when we traverse the graph we do the following for forward and backward edges.

Let us start a DFS call for node u and visit its children v. Two cases arise as mentioned below.

  • If node v is not visited then after DFS for node v is completed we update low[u] = min(low[u] , low[v]) à Case of Forward Edge
  • If node v is already visited then low[u] = min(low[u], discovery[v]).à Case of Back edge.

This concept of low time, discovery time, DFS tree is also useful in finding bridges or articulations points in a graph. To track subtree rooted at head we can use stack. When head is found pop all nodes from stack until head is popped out.

It makes sense to study strong connectivity in graph models where directed paths and reachability play important roles. A very simple example of such an application would be a city graph model where vertices are street intersections and edges are (possibly one-way) streets.

A C++ snippet of code is given below:

To read on more topics, click here.

By Mridul Kumar