Maximum sum of distances of a node to every other node

Shreya Deep
Last Updated: May 13, 2022

Introduction

In graph theory, the distance between two nodes is the shortest path from one node to the other.

In this article, we will learn how to find the maximum sum of distances of a node to every other node.

Problem Statement

You are given an undirected and connected tree, containing n nodes, from 0 to n-1, along with its edges, and your task is to find the maximum sum of distances of a node to every other node in the given tree.

For example:

Input

n = 6, edges = {{0,1}, {0,2}, {2,3}, {2,4}, {2,5}}

Output

12

Explanation:

The tree represented by the above edges is:

The Sum of distances of node 0 from all the other nodes is: (1 from 1, 1 from 2, 3 from 3, 3 from 4, and 3 from 5) 1+1+3+3+3 = 11.
Sum of distances of node 1, from all the other nodes are: (1 from 0, 2 from 2, 3 from 3,4, and 5) 1+2+3+3+3 = 12
Sum of distances of node 2, from all the other nodes are: (1 from 0,3,4, and 5, 2 from 1) 1+1+1+1+2 = 6
Sum of distances of node 3, from all the other nodes are: (1 from 2, 2 from 4,5, and 0, 3 from 1) 1+2+2+2+3 = 10
Sum of distances of node 4, from all the other nodes are: (1 from 2, 2 from 3,5, and 0, and 3 from 1) 1+2+2+2+3 = 10
Sum of distances of node 5, from all the other nodes are: (1 from 2, 2 from 3,4, and 0, and 3 from 1) 1+2+2+2+3 = 10. 

Thus, we can surely say that the maximum value of the sum of distances of a node to every other node is 12, which is the sum of distances of node two from all other nodes.

Note: In the inputs, the edges are given in a 2-D array called "edges," There is an edge between the nodes: edge[i][0] and edge[i][1]. 

Solution Approach

Approach 1: Using DFS

The simplest approach to solve this problem is to determine the sum of distances of each node from all the other nodes using DFS. So, run a DFS from a node and find the distance of all the other nodes from this one. You can find the distances by noticing that, in the DFS, whenever you go to a new depth level, the distance increases by 1. Also, keep adding the distances to the current sum, and in the end, update the answer by the maximum of the current value of the answer and the current sum value. 

Steps of implementation are:

  • Input the value of n and edges
     
  • Create the adjacency vector by pushing the nodes' edges into each other adjacency vectors
     
  • Declare a variable called "answer," which contains the final answer, and initialize it with -1
     
  • Run a for loop and keep calling dfs for each node
     
  • For each node, declare and initialize a variable "ans," which stores the sum of distances of the current node from all the other nodes.
     
  • Call the function dfs for each node. Initially, the level parameter value will be one as the node's direct children are on level 1 from the node.|
     
  • In the function dfs():
    • Loop through all the children of the current node, and for each child: The distance of a node from the root is equal to the level of the node from the root, so add the level into the answer. For each node, the level of its children from the root is one greater than its level from the root. so, for each child, call the dfs function after incrementing the value of the level by 1
       
  • Update the answer variable with the max of (answer, ans) as the answer variable needs to have a maximum value always
     
  • Print the answer

C++ implementation

#include<bits/stdc++.h>
using namespace std;


void dfs(int node,vector<vector<int>>&adj, int level, int parent, int& ans){
    for(auto child: adj[node]){
        if(child!=parent){
            /* 
                The distance of a node from the root is equal to the level of the node from the root, so add the level into the answer.
            */
            ans+=level;
            /*
                For each node, the level of its children from the root is one greater than its level from the root. so, for each child, call the dfs function after incrementing the value of the level by 1
            */
            dfs(child,adj,level+1,node,ans);
        }
    }
}


int main()
{
    int n;
    /* 
        Input the value of n and edges
    */
    n = 6; 
    vector<vector<int>>edges = {{0,1}, {0,2}, {2,3}, {2,4}, {2,5}};
    /*
        Create the adjacency vector by pushing the nodes' edges into each other adjacency vectors
    */
    vector<vector<int>>adj(n);
    for(auto x:edges){
        adj[x[0]].push_back(x[1]);
        adj[x[1]].push_back(x[0]);
    }
    /*
        Declare a variable called to answer, which contains the final answer, and initialize it with INT_MIN
    */
    int answer = INT_MIN;
    /* 
        Run a for loop and keep calling dfs for each node
    */
    for(int i=0;i<n;i++){ 
        /*
            For each node, declare and initialize a variable "ans," which stores the sum of distances of the current node from all the other nodes.
        */
        int ans=0;
        /*
            Call the function dfs for each node. Initially, the level parameter value will be one as the node's direct children are on level 1 from the node.
        */
        dfs(i,adj,1,-1,ans);
        /* update the answer variable with max of (answer, ans) as the answer variable needs to the maximum value always
        */
        answer = max(answer,ans);
    }
    /*
        Print the answer
    */
    cout<<answer<<endl;
    return 0;
}

Output

12

Complexities

Time complexity

O(n^2), where n is the number of nodes

Reason: Calculating the sum of distances of all the other nodes from the current node, take O(n) time because of DFS, and we're doing it for all the nodes one by one. So, since there are n nodes, the time complexity is O(n^2).

Space complexity

O(n), where n is the number of nodes

Reason: The only polynomial space taken here is by the recursion stack of the DFS function, and that space is O(n). Therefore, the total space complexity is O(n).

Approach 2: Using the re-rooting technique

For understanding this approach, let's try to solve a simpler problem first. The problem is that you only need to find the sum of distances for only the root node.

Suppose our tree is this.

Suppose we fix the root to 0. Let's store the answer of the nodes in an array, ans[n], where ans[i] denotes the distance between node i and all the nodes in its subtree. Now, how to find the value of ans[i]?

You can see that, ans[2] = 2 and ans[1] =1

Also, for root 0, you can say that ans[0] = 8. You can also see ans[0] as, ans[0] = ans[1]+2+ans[2]+3, i.e., ans[0] = ans[1]+ number of nodes in the subtree of 1(including 1)+ ans[2] + number of nodes in the subtree of 2(including 2). 

The question that must have arisen in your mind is why are we even writing it in this form

To answer this, think of how the ans[i] depends on the children of i. You can notice that the distance of a grandchild of the root is one greater than its distance from the child (i.e., the parent of the grandchild). So, if you've got the distance of all the nodes in the subtree of the child from a child, you can find the distances of that node from the parent of the child by just adding 1 to the earlier found distance. Also, notice that you'll have to add 1 to this answer because the child's distance is exactly one from its parent. So, we can say that, ans[parent] = ans[child] + 1* (number of nodes in the subtree of child - 1) +1 = ans[child]+size of subtree[child]

So, for solving this problem, you need to maintain two arrays, ans[i] and subtree_size[i]. And, the pseudo code for the above discussed solution will be: 

void dfs(int node, int parent = -1) {

    for(auto child : adj[node]) {

        if(child == parent) continue;

        dfs(child, node);

        subtree_size[node] += subtree_size[child];

            ans[node] += ans[child] + subtree_size[child];

    }

    subtree_size[i] += 1;

}

Now, let's come back to our original problem. You can now notice that the original problem requires you to do the above task by making each node the root of the tree one by one. But doing this will again give us O(n^2) time complexity. So, how do we optimize it? Let's see.

To optimize it, we will use a "re-rooting technique" technique. 

Now, for the above tree example, once we've calculated the distances from the root 0, let's shift the root to node 1. 

What changes do we observe?

We see that, all the nodes in the subtree of 1, goes one unit closer to the new root, and all the other nodes go 1 unit farther from the new root. So, we can say, ans[1] = ans[0] - subtree_size[1] + n - subtree_size[1]. So, the problem is solved! We have a relation as, ans[child] = ans[parent] - subtree_size[child] + n - subtree_size[child].

To summarize the solution in two lines,

  • First, fix a root, run a DFS, and find the sum of the distances of each node from this root. For this, use the relation, ans[parent] = (for all children) ans[child]+size of subtree[child]
  • Now, run another DFS, and find the sum of distances of a node from all the other nodes using the relation, ans[child] = ans[parent] - subtree_size[child] + n - subtree_size[child]

Steps of implementation are:

  • Declare three vectors, adj, subtree_size, and ans, which store the adjacency vector, subtree sizes, and answer for each node, respectively.
     
  • Input the value of n and edges
     
  • Resize the vectors adj, ans, and subtree size to have a size = n
     
  • Create the adjacency vector by pushing the node’s neighbours into their adjacency vectors
     
  • Declare a variable called answer, which contains the final answer, and initialize it with INT_MIN
     
  • Call the functions dfs in which we fix node 0 as the tree's root, calculate the answer for root, and calculate the subtree sizes of all the nodes.
     
  • In the function dfs:
    • For each child of a node, call the dfs function with the current child as the node and the current node as the parent. After you've found out the subtree size and ans of the child, update the value of ans and subtree size of the current node
       
  • And then call dfs2, which finds the answers of the ith node in the vector "ans”.
     
  • In the function dfs2:
    • For each child of a node, update the value of ans[child] using the ans[node] and the subtree size calculated earlier. Call the function dfs2 with the current child as the node and the current node as the parent.
       
  • Iterate over all the ans[i] values for i=0 to i=n and update the answer value to the maximum of ans[i] and answer.
     
  • Print the answer

C++ implementation

#include<bits/stdc++.h>
using namespace std;


vector<vector<int>>adj;
vector<int>subtree_size;
vector<int>ans;


void dfs(int node, int parent) {
    /* 
        For each child of a node
    */
    for(auto child : adj[node]) {
        if(child == parent) continue;
        /* 
            Call the dfs function with the current child and the current node as the parent.
        */
        dfs(child, node);
        /* 
            After you've found out the subtree size and ans of the child, update the value of ans and subtree size of the current node.
        */
        subtree_size[node] += subtree_size[child];
        ans[node] += ans[child] + subtree_size[child];
    }
    subtree_size[node] += 1;
}


void dfs2(int node,int n, int parent){
    /* 
        For each child of a node
    */
    for(auto child: adj[node]){
        if(child!=parent){
        /* 
            Update the value of ans[child] using the ans[node] and the subtree size calculated earlier
        */
            ans[child] = ans[node]-subtree_size[child]+n-subtree_size[child];
        /* 
            call the function dfs2 with the current child as the node and the current node as the parent
        */
            dfs2(child,n,node);
        }
    }
}


int main()
{
    int n;
    /* 
        Input the value of n and edges
    */
    n = 6; 
    vector<vector<int>>edges = {{0,1}, {0,2}, {2,3}, {2,4}, {2,5}};
    /* 
        Resize the vectors adj, ans, and subtree size to size = n
    */
    adj.resize(n);
    ans.resize(n);
    subtree_size.resize(n);
    /*
        Create the adjacency vector by pushing the nodes' edges into each other adjacency vectors
    */
    for(auto x:edges){
        adj[x[0]].push_back(x[1]);
        adj[x[1]].push_back(x[0]);
    }
    /*
        Declare a variable called "answer," which contains the final answer and initialize it with INT_MIN
    */
    int answer = INT_MIN;
    /*
        Call the functions dfs in which we fix node 0 as the root of the tree, calculate the answer for root, and calculate the subtree sizes of all the nodes.
        And then call dfs2, which fills the answers of the ith node in the vector "ans."
    */
    
    dfs(0,-1);
    dfs2(0,n,-1);
    /*
        Iterate all the ans[i] values for i from 0 to n and keep updating the answer values to the maximum of ans[i] and answer.
    */
    for(int i=0;i<n;i++){
        answer =max(answer,ans[i]);
    }
    /*
        Print the answer
    */
    cout<<answer<<endl;
    return 0;
}

Output

12

Complexities

Time complexity

O(n), where n is the number of nodes.

Reason: We are running two DFSs separately, and they take O(n) time. Thus, the time complexity is O(n).

Space complexity

O(n), where n is the number of nodes.

Reason: The space taken up by the ans[i] vector is O(n) as it is of n size. All the other spaces are constant. Thus, the space complexity is O(n). 

Frequently asked questions

  1. What is a tree?
    A tree is a hierarchical data structure, where each node has some other nodes connected to it called its children through edges. In a tree, there is no cycle.
     
  2. What is the difference between a graph and a tree?
    The main difference between a graph and a tree is that a graph contains cycles, whereas a tree doesn't.
     
  3. What is Depth First Search?
    DFS, or Depth-first Search, is a traversal algorithm in which we travel a given tree or graph depth-wise.
     
  4. What is the re-rooting technique?
    In the re-rooting technique, we try to solve a given problem by changing the root one by one with any tree node.

Key Takeaways

In this article, we discussed the problem of finding the maximum sum of distances of a node to every other node. An essential algorithm used here was the DFS algorithm. This is a very important algorithm and is used to solve a lot of problems. I suggest you solve more problems on this topic. Some of these are roots of the tree having minimum heightcolor the graphjumping numberslargest cycle, and possible bipartition

Are you planning to ace the interviews of reputed product-based companies like Amazon, Google, Microsoft, and more?

Attempt our Online Mock Test Series on CodeStudio now!

Happy Coding!

Was this article helpful ?
0 upvotes