Maximum Disjoint Intervals

aniket verma
Last Updated: May 13, 2022

Introduction

In this blog, we will discuss how to find the Maximum Disjoint Intervals. It is an important problem to get a strong grip over greedy algorithms and has also been asked in many interviews.

A common prerequisite is having a basic knowledge of sorting, dp, and how comparators are written. Without knowing this, it is recommended to learn sorting, dp, and comparators. Nevertheless, we’ll try to cover each point in-depth that is required to find the Maximum Disjoint Intervals.

 

What do we mean by the Disjoint Intervals?

Well, it’s a trivial question, but still, for the sake of clarity, we’ll define that let

Two intervals [a, b], and  [c, d] where a <= b and c <= d are said to be disjoint if they do not have any point in common. In the mathematical form, we can write as:

Two intervals [a, b] and [c, d] are said to be disjoint if,,→ a <= b < c <= d  Or, →  [a, b] ⋂ [c, d] = 𝟇


Let’s look at the problem statement:

You are given a set of intervals [ai, biof length N, where i 𝞊 { 1, 2, ….. N }. Now you have to find the maximum length of the set of disjoint intervals.

Let’s understand the problem using an example.


Given a set of intervals,

Input:  N and a  set of intervals of length N, 

           Intervals = {[1, 8], [2, 5], [7, 8]}

Output:  2

Explanation: The maximum set of disjoint intervals is 2 as we see that [2, 5], [7, 8] form the maximum set of disjoint intervals.

Approach

First, let’s look at the naive approach that would probably strike anybody’s mind before jumping to an optimal approach.

So, the basic approach considers all subsets of the given set of intervals and then finds the maximum length of the subset of disjoint intervals using the definition mentioned earlier. 

So the naive/basic approach could be formulated as:

  1. For each subset in the given set of intervals, check if the subset contains only disjoint intervals.
  2. If the subset contains only disjoint intervals then maximize the answer.

 

Now let’s look at the PseudoCode.

PseudoCode

Algorithm

___________________________________________________________________
procedure MaximumDisjointIntervals(intervals):
___________________________________________________________________
1.    subsets ← computeSubsets(intervals) #compute all subsets of intervals
2.    maxLength ← 0                       # initially set to 0 in case the intervals size is 0
3.    for subset in subsets do      
4.    if containsDisjointIntervals(subset) == True do      
5.     maxLength ← max(maxLength, subset.size())
6.     end if
7.    return maxLength 
8.    end procedure
___________________________________________________________________

CODE IN C++

//C++ program to find the maximum length of disjoint set

#include <bits/stdc++.h>

using namespace std;

 

vector<vector<vector<int>>> computeSubsets(vector<vector<int>> & intervals, int N){

    vector<vector<vector<int>>> subsets;

    // we know that subsets can be represented using an integer

    // and there are at most 1<<N subsets

    // hence iterating over all integers from 0 to 1<<N

    for(int i=0; i<(1<<N); ++i){

        vector<vector<int>> v; // current subset

        int j = 0; // variable to check if the bit at a position is set in i

        while((1<<j)<=i){    

            if(((1<<j)&i)>0){ // if bit at current position is set in both i and j

                v.push_back(intervals[j]); // push it in the current subset

            }

            j++;      // shift to the next bit

        }

        if(v.size()>0)      // if the subset is not empty push it

            subsets.push_back(v); // in the subsets vector

    }

    // return the subsets

    return subsets;

}

 

// boolean function that finds whether the set 

// of intervals contain the set of disjoint intervals 

bool containsDisjointIntervals(vector<vector<int>> subset){

    sort(subset.begin(), subset.end());

    for(int i=0;i<subset.size()-1;++i){

        // if intersection is not null

        if(subset[i][1]>=subset[i+1][0] or subset[i][1]>=subset[i+1][0]){

            return false; // this subset doesn't contain disjoint intervals

        }

    }

    return true;

}

 

// finds the maximum length of disjoint interval set

int maximumDisjointIntervals(vector<vector<int>> & intervals, int N){

    int maxLength = 0; // maximum length of the disjoint intervals set

    

    // subsets

    vector<vector<vector<int>>> subsets = computeSubsets(intervals, N);

    // iterate over all subsets 

    for(vector<vector<int>> subset : subsets){

        // if it contains joint Intervals

        if(containsDisjointIntervals(subset)){

            //maximise the length

            maxLength = max(maxLength, (int)subset.size());

        }

    }

    // return maxi_length

    return maxLength;

}

 

int main() {

    

    // number of intervals

    int N = 3;

 

    vector<vector<int>> v{{1, 8}, {7, 8}, {2, 5}};

    

    cout<<"The maximum length of disjoint interval set is: ";

    cout<<maximumDisjointIntervals(v, N);

    return 0;

}

Output

The maximum length of disjoint interval set is: 2

Time Complexity: O((N*logN) *2N)

This is because we are computing all the subsets, and for each subset, we are checking if it contains all disjointed intervals. So to compute the number of subsets, it takes O(2N) time. The maximum length can be N. Sorting the N length subset takes O(NlogN) time. Hence in total, it takes O((NlogN) *2N) time.

Space complexity: O(2N) at the worst case as we are computing all the subsets.

The above algorithm has some loopholes that we need to tackle.

The foremost problem is the exponential time complexity the algorithm is taking, and then, the space complexity is also worse, i.e., exponential.

This gives us the motivation to improve our algorithm.

So let’s think about an efficient approach to solve the problem.

Approach(Efficient)

If we want to build up an efficient solution, we will need to find out if there is any redundancy or any repetition that we are doing.

So the redundancy in our above approach is that we are finding all possible subsets and then checking each subset if it’s a disjoint interval subset or not and updating our answer. So we aim to avoid computing all subsets. Hence, we need to make some observations using the definition of disjoint intervals or from the repetitive checks in the previous algorithm and infer some important points to simplify our approach. 

If we notice the redundancy in our previous code also lies when we are sorting it every time for each subset. Hence it makes sense to sort it first so that we don’t have to sort it later repeatedly.

Now you may choose to sort by the second element of the intervals or the first element. If we remove the computation of all subsets from our algorithm,  we will achieve our goal, to make our algorithm time and space-efficient.   

Let’s assume we sort the set of intervals by their endpoints.

The intuition behind Sorting by endpoints of an interval?

The intuition behind sorting by endpoints of an interval is that finding the intersection between 2 intervals becomes easy as we just need to compare endpoint of ith interval with jth interval where j>i to find their intersection and sorting also helps us reduce the number of comparisons because if the jth and ith intervals are disjoint then, we don’t need to separately check for all intervals at an index greater than j.


Claim:

If we take a greedy path here, i.e., sorting the interval set by its endpoints and progressively find the minimum number of intervals to be removed such that it becomes a maximal disjoint set, we can claim that the size of the remaining subset is the maximum length of the disjoint interval set. 

(Tip of Advice: get rid of a greedy path as early as possible. If it works and you can think of an intuitive proof then, great; otherwise, you won’t spend time, later on, thinking on a greedy path).

Explanation of the above claim is :

Let the interval set be {[a1, b1], [a2, b2], [a3, b3], ……….., [an, bn]}.

And let’s say that we removed the second interval and the last interval, then the remaining interval set {[a1, b1], [a3, b3], ……….., [an-1, bn-1]} is the maximal disjoint set. Hence we need to minimize the number of removals to maximize the length of the disjoint set. This can be stated mathematically also, 

max{Length of disjoint interval set}  = max{size - #intervals to be removed} → max{Length of disjoint interval set} ~ min{#intervals to be removed}


Informal Proof of the above claim:

(Proof by contradiction)

Let’s assume that minimizing the number of intervals to remove from the sorted interval set does not find the optimal answer, i.e., the maximum length of the disjoint set. If we don’t minimize the number of removals, it means that at least k + 1 intervals must be removed, where k is the minimum number of intervals removed by our algorithm.

But we claimed that minimizing the number of intervals to remove from the sorted interval set finds the maximum length of the disjoint set. And, the maximum length of the disjoint set found using the assumption is less than the length of the disjoint set found by our claim. Hence the assumption is wrong.

This proves the correctness of our algorithm. 

Now the question is how to minimize the number of intervals to be removed?

So if we start iterating from the starting interval and include an interval in our disjoint set, if it’s disjoint. But if an interval isn’t disjoint with the previous interval, we can ignore the interval with a higher endpoint and resume the process from the next interval.

It can be proved by contradiction that it minimizes the number of intervals to be removed or maximizes the length of the disjoint interval set.

(The proof is left as an exercise for you).


Let’s formulate our approach.

  • Sort the interval set by its endpoints
  • Start traversing the interval set and increment the count of intervals to be removed by 1 if we find that the 2 intervals intersect. Ignore the interval with the higher endpoint.
  • Repeat the same process from the next interval.
  • Return the answer = size of interval set - the minimum number of intervals to be removed.  


Let’s look at the Code.

CODE IN C++(Efficient)

//C++ program to find the maximum length of disjoint set

#include <bits/stdc++.h>

using namespace std;

 

// comparator function to sort by endpoints

bool comp(vector<int>& a, vector<int> &b){

    if(a[1]==b[1]) return a[0]<b[0];

    return a[1] < b[1];

}

 

// finds the maximum length of disjoint interval set

int maximumDisjointIntervals(vector<vector<int>> & intervals, int N){

    int maxLength = 0; // maximum length of the disjoint intervals set

    

    // sort the intervals

    sort(intervals.begin(), intervals.end(), comp);

    

    // variable to store the min_intervals_removed

    int min_intervals_removed = 0;

    

    // reference vector using which we compare the next intervals

    // if it has to be included in the interval set 

    vector<int> referenceInterval = intervals[0]; 

    

    // iterate over all intervals

    for(int i=1;i<intervals.size(); ++i){

        

        // if intersection is not empty

        if(referenceInterval[1] > intervals[i][0]){

            min_intervals_removed++; // increment removed intervals count

        }else{

            referenceInterval = intervals[i]; // update the reference Interval

        }   

    }

    

    //return the answer

    return intervals.size() - min_intervals_removed;

}

 

int main() {

    

    // number of intervals

    int N = 4;

    

    // vector of intervals

    vector<vector<int>> v = { { 1, 4 },{ 2, 3 }, { 4, 6 },{ 8, 9 } };

 

    // compute the result

    cout<<"The maximum length of disjoint interval set is: ";

    cout<<maximumDisjointIntervals(v, N);

    return 0;

}

Output

The maximum length of disjoint interval set is: 3

Time Complexity: O(N*LogN)

Since we are sorting the set of intervals of length N, which takes most of the time, it takes O(N*LogN) time.

Space complexity: O(N) at the worst case, as the auxiliary space used by the sorting algorithm is O(N).

Hence we reached an efficient solution from an exponential solution. Note: The problem could be solved using DP also, but it would still take O(NLogN) time because sorting is still required to solve the problem (You can try implementing the DP solution yourself as an exercise. For more insights, refer here.)

Frequently Asked Questions

Does a greedy algorithm always work?

Answer)  No, a greedy algorithm does not always work. To solve a problem via a greedy algorithm, you need to have intuitive proof in your mind, at least to lead to the correct solution. To show when it doesn’t work, you can think of a counter-example that will help rule out the greedy algorithm.

Why is sorting important?

Answer) Sorting is a very useful technique because we can reduce a significant number of comparisons. With reference to the question discussed in this blog, we moved from an exponential solution to an O(N*LogN) solution. This is sufficient to explain the power of sorting.

What is a  comparator?

Answer) Comparator is an important concept. It is a function object that helps to achieve custom sorting depending on the problem requirements. 

Key Takeaways

This article taught us how to find the Maximum Disjoint Intervals by approaching the problem using a naive approach followed by an efficient solution. We discussed an iterative implementation using examples, pseudocode, proof of our solution, and proper code in detail.

We hope you could take away critical techniques like analyzing problems by walking over the execution of the examples and finding out how we can apply sorting to simplify our task and make fewer comparisons, and how you should approach a greedy solution before writing the code. Testing a greedy algorithm as early as possible is a good practice as there is no guarantee that it will always work. Hence, if it does not work, you could rule it out and not follow a similar approach to solve the problem.

Now, we recommend you practice problem sets based on the concepts of Maximum Disjoint Intervals to master your fundamentals. You can get a wide range of questions similar to the problem Maximum Disjoint Intervals on CodeStudio

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think