Update appNew update is available. Click here to update.

Longest Subarray not having more than K Distinct Elements

Teesha Goyal
Last Updated: Jul 19, 2022
MEDIUM

Introduction

This article will discuss the problem of finding the longest subarray not having more than K Distinct Elements. We will also discuss the implementation and complexities of the solution to the problem.

Problem Statement

You are given an array consisting of N elements. You are also given an integer K. You need to find the longest subarray of the given array not having more than K distinct elements. A subarray of an array is a part or section of an array. It has to be continuous. For example if you are given an array arr[] = {1, 2, 3, 4, 5, 6, 7} then some of the subarrays of this array are {2, 3, 4}, {7}, {5, 6}. But {1, 3, 5} is not a subarray of the given array as the elements do not occur consecutively in the given array.

Example 1

Input 

given_array = {1, 2, 3, 6, 3, 3, 2, 1 ,4, 5} 
K = 3

Output

2 3 6 3 3 2

Explanation

There are 6 distinct elements in the given array, and we need to find the longest subarray that does not have more than 3 distinct elements. It means that the output array can have at most 3 distinct elements. Hence the output 2 3 6 3 3 2.

Example 2

Input 

given_array = {1, 2, 3, 6, 3, 3, 2, 1 ,4, 5} 
K = 7

Output

1 2 3 6 3 3 2 1 4 5

Explanation

The given array has 6 distinct elements. The given value of k is 7. So the number of distinct elements is less than K. Hence, the output is the complete array.

Solution Approach

The brute force algorithm for the given problem is to traverse the complete array using two loops and check the number of distinct elements for each subarray and then output the longest subarray.

The efficient solution is to use hashing and two-pointers. The two pointers are used to track the subarray we are considering at any moment. We will keep track of the frequency of the occurrence of items using hashing. Hashing will help us remove all the occurrences of the element from the subarray to ensure that there are at most K distinct elements in the subarray at any time. 

Algorithm

  • Initialize two pointers first and last that mark the subarray's starting and ending point being considered. Also, create a dictionary for hashing. This dictionary will store the frequency of elements occurring between first and last. Initialize a counter to zero to store the number of distinct elements between first and last.
     
  • Loop over the given array considering each element at a time. Increment the frequency of the current element. If it is the first time the element has occurred, then increment the counter.
     
  • Continue until the counter becomes greater than K. If the counter becomes greater than K, remove the element from the left side. If the occurrence of any element becomes zero, decrement the counter.
     
  • Print the longest such subarray.

Python Implementation

''' Collections library is used to create a dictionary to store the frequency of elements in the subarray being considered'''
import collections

''' Function to find the longest subarray having not more than K distinct elements '''
def LongestSubarray(arr, K):
   
    size = len(arr)
    frequency = collections.defaultdict(int)
    first = 0
    last = 0
    counter = 0
    temp =0
   
    for idx in range(size):
 
        ''' Increment the frequency of arr[idx] by 1'''
        frequency[arr[idx]] += 1
 
        ''' If it is the first time arr[idx] is occuring in the considered subarray then increment the counter '''
        if frequency[arr[idx]] == 1:
            counter += 1
 
        ''' Remove elements from the subarray until the counter becomes less than or equal to K'''
        while counter > K:

            frequency[arr[temp]] -= 1
 
            ''' If no other occurences remain for arr[temp] then decrement the counter'''
            if frequency[arr[temp]] == 0:
                counter -= 1
 
            temp += 1
 
        ''' Update first and last for longest subarray'''
        if (idx - temp + 1 >= last - first + 1):
            last = idx
            first = temp
 
    ''' Print the Longest subarray not having more than K distinct element'''
    print(*arr[first: last+1])
 
 
''' Driver Function '''
def main():
 
    arr = [1, 2, 3, 6, 3, 3, 2, 1 ,4, 5]
    K = 3
    LongestSubarray(arr, K)
   
''' Executing Main '''
main()

Output

2 3 6 3 3 2

Complexities

Time Complexity

The time complexity of the given solution is O(N), where N is the number of elements in the given array.  

Reason: We are traversing the entire array, hence the complexity of O(N).

Auxiliary space complexity 

The Auxiliary space complexity of the given solution is O(N), where N is the number of elements in the given array.

Reason: We are creating a dictionary to store the frequency of occurrence of elements. Hence, the complexity of the given solution is O(N).

Frequently Asked Questions

What is an array?

An array is a container to store the elements in non-contiguous memory locations. In C++, an array can store only homogeneous elements, but in python, it is implemented using a list, and a list can store heterogeneous elements in python.

What is the subarray of an array?

A subarray of an array is a part or section of an array. It has to be continuous. For example if you are given an array arr[] = {1, 2, 3, 4, 5, 6, 7} then some of the subarrays of this array are {2, 3, 4}, {7}, {5, 6}. But {1, 3, 5} is not a subarray of the given array as the elements do not occur consecutively in the given array. 

What is the significance of complexity calculation?

There are two main factors on which the performance depends time and space. If we have two solutions to a problem, then to determine the better solution, we can compare the time and space complexity of the two solutions.

What is the time complexity of a program?

The time complexity is the amount of run time of the program in terms of input data. While calculating the time complexity, the constant time is ignored, and the input data value is assumed to be huge.

What is the auxiliary space complexity of a program?

Auxiliary Space Complexity is the amount of extra space required by a program to solve a problem. All the constant amount of value is ignored. It is the space occupied by the program other than the input and output data.

Conclusion

This article discussed the problem of finding the longest subarray not having more than K Distinct Elements. We also discussed the implementation and complexities of the solution to the problem using hashing.

I hope you would have gained a better understanding of these topics now!

Are you planning to ace the interviews of reputed product-based companies like AmazonGoogleMicrosoft, and more? 

Attempt our Online Mock Test Series on CodeStudio now!

Thank You

Happy Coding!

Was this article helpful ?
0 upvotes