Optimal BST
Given a sorted array of keys of BST and an array of frequency counts of each key in the same order as in the given inorder traversal. The frequency of an element is the number of searches made to that element. Construct a binary search tree from the given keys such that the total cost of all the searches is minimum.
Cost of searching a key is its frequency multiplied by its level number in the BST.
A binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree whose internal nodes each store a key greater than all the keys in the node's left subtree and less than those in its right subtree.
For example:
Input keys = [ 1, 3, 5 ]
Frequency = [ 3, 10, 7 ]
All the unique BST possible from the given keys are:
Among all these possible BST, the minimum cost is obtained from the below BST:
Cost = 1 * (freq of 3) + 2 * (freq of 1) + 2 * (freq of 5) = 30
where 1 is the level of node 3, 2 is the level of the node 1 and node 5.
Input Format
The first line of input contains an integer T, the number of test cases.
The first line of each test case contains an element ‘N’ denoting the number of elements in the BST.
The second line of each test case contains ‘N’ space separated integers, sorted in ascending order.
The third line of each test case contains ‘N’ space separated integers denoting the frequency of each element of the BST.
Output Format:
For every test case, print the minimum total cost of constructing the BST.
The output of each test case is printed in a separate line.
Note
1. Given BST will not contain duplicates.
2. You don’t have to print anything, it has already been taken care of. Just implement the function.
Constraints:
1 <= T <= 5
2 <= N <= 50
0 <= data <= 10^4
Where ‘T’ is the number of test cases, ‘N’ is the number of nodes in the given tree and ‘data’ denotes the value of the nodes in the given tree.
This problem can be solved by solving its subproblems and then combining the solutions of the solved subproblems to solve the original problem. We will do this using recursion.
The idea is very simple. We will consider each key as the root and find an optimal solution by recursively finding the cost of left and right subtree. Then we will add the cost of the left and right subtree to the current’s node cost.
Cost of a node is the product of its frequency and its level.
Algorithm:
- Create a helper function ‘optimalCostHelper’ to find the minimum cost to construct a bst.
- Call this helper function by passing the keys, frequencies, starting and the ending index of the keys in consideration, and level/depth of the current root.
Algorithm for optimalCostHelper(keys, frequencies, startIdx, endIdx, curLevel):
- If we have checked for all the nodes, return 0.
- Initialise an integer variable ‘minimumCost’ by the maximum positive value.
- Run a loop from startIdx to endIdx:
- Consider the current node as the root node.
- Recursively find the optimal cost of left and right subtree. The level of a child node is 1 + the level of the parent node i.e 1 + curLevel.
- Now, current’s node cost is the sum of product of its frequency and level, optimal cost for left subtree and optimal cost for right subtree.
- If this cost is less than the ‘minimumCost’, update ‘minimumCost’.
- Return the minimum cost to construct the binary search tree.
We are solving this problem by solving its subproblems and then combining the solutions of those subproblems. If we analyze carefully, we will see that we are solving the same subproblems multiple times. Thus, this problem exhibits overlapping subproblems. Thus, in this approach, we will eliminate the need for solving the same subproblems again and again.
Lets understand by an example:
Suppose we have keys = [ 1, 2, 3, 4 ] and frequencies = [ 2, 5, 1, 7 ]
The below figure shows some of the recursive calls made for the given problem.
As we can see, some subproblems are called multiple times.
Subproblems that are called again are shown in the box.
That’s why we are storing the solved subproblems in a lookup table so that we don’t have to solve them again. Whenever a call is made to the solved subproblem, we will directly return the answer of that subproblem stored in the lookup table.
Algorithm:
- Create a helper function ‘optimalCostHelper’ to find the minimum cost to construct a bst.
- Use a 3 - D matrix ‘cost’ of size (N + 1) * (N + 1) * (N + 1), to store the answer of subproblems. Initialise the matrix by -1.
- The value cost[i][j][k] denotes the minimum cost required to construct BST from keys between index i to j and the root node lies at level ‘k’.
- Call this helper function by passing the keys, frequencies, starting and the ending index of the keys in consideration, level/depth of the current root, and the ‘cost’ matrix.
Algorithm for optimalCostHelper(keys, frequencies, startIdx, endIdx, curLevel, cost):
- If we have checked for all the nodes, return 0.
- If the current subproblem has already been solved, return the answer stored in the lookup table ‘cost’.
Means, if cost[startIdx][endIdx][curLevel] != -1, return its value.
- Initialise an integer variable ‘minimumCost’ by the maximum positive value.
- Run a loop from startIdx to endIdx:
- Consider the current node as the root node.
- Recursively find the optimal cost of left and right subtree. The level of a child node is 1 + the level of the parent node i.e 1 + curLevel.
- Now, current’s node cost is the sum of product of its frequency and level, optimal cost for left subtree and optimal cost for right subtree.
- If this cost is less than the ‘minimumCost’, update ‘minimumCost’.
- Store this minimumCost in the lookup table ‘cost’ and return it.
Let’s look at the recursive formula for this problem. The total optimal cost for searching in the BST formed with the given keys can be shown as:
According to the above formula, we will try all nodes as the root node(root varies from i to j in second term). When we make the kth node as the root, we will recursively calculate optimal cost for the BST formed from the keys[i to k-1] and keys[k+1 to j].
We add the sum of frequencies from i to j (the first term in the above formula), this is added because every search will go through root and one comparison will be done for every search.
Let’s walk through the steps:
- The idea is to create a 2-D array ‘cost’ of size (N + 1) * (N + 1).
- Initially, all the elements of the ‘cost’ matrix will be the maximum positive number.
- Now, the value cost[i][j] denotes the minimum cost required to construct BST from keys between index i to j.
- If we have only one key, cost will be equal to the frequency of that key.
- The detailed algorithm to fill the ‘cost’ matrix is as follows:
- L1 : Run a loop from size = 2 to N to try all the possible lengths of the given keys. The current length of the sequence is min( N - 1, N - size + 1).
- L2 : Run a loop from 0 to N - size + 1 to select the starting index of the sequence.
- L3 : Run a loop to consider each key as the root node of the BST.
- L4 : Run a loop to find the cost of the current node.
- Add the optimal cost of the left subtree and the right subtree to the cost of the current node.
- Update the ‘cost’ matrix if needed.
Note that L1, L2, L3, and L4 will be four nested loops.
- Return cost[0][n-1], the final minimum cost of constructing BST from keys between index 0 to n - 1 (including).
In the previous approach, we were calculating the sum of frequencies for every current node. This step resulted in an additional time consumption of the order O(N), resulting in the time complexity of O((N ^ 3) * N) = O(N ^ 4).
The final time complexity can be reduced from O(N ^ 4) to O(N ^ 3), by precomputing the sum of frequencies.