Median of Stream of Integers Problem

Arun Nawani
Last Updated: May 13, 2022

Introduction

In this Median of Stream of Integers Problem. We will solve the most famous and interesting coding problem based on heap data structure.

Okay !!!!

 

Problem Statement

 

The median of the stream of integers, or running integers median problem, is that we need to find the effective median of all the incoming integers in the array. For example, we have an array of integers 8, 9, 2, 4, 5.

 

We need to find the effective median after every addition of a new element in the array. 

 

It goes like 

 

Median after appending the 1st element- 8: 8

Median after appending the 2nd element - 8, 9: 8.5

Median after appending the 3rd element - 8, 9, 2: 8

And so on…

 

We’re asked to print all such effective medians. 

 

Approaching the Problem

 

There can be multiple ways to handle the problem. The first method that might strike you immediately after reading and understanding the problem statement would be to use a sorting function after every new element added to the array. But this approach would have a very high time complexity. There’s a much more optimized approach that we could take. Let’s see - 

 

Hope you know how Heaps data structure works. If not, you should check out our curated foundation courses in Java, Python, and C++ expertly compiled by industry experts from Stanford University and Microsoft. 

 

We need to maintain two heaps, a max-heap and a min-heap.

  1. If the size of both the heaps is equal, the effective median would be the average of the top elements of both heaps. 
  2. If the size of any of the heaps is greater than the other (The absolute difference between the size of the two heaps can’t exceed 1), the top element of the heap with more number of elements would be the effective median.

 

Incoming elements smaller than the current median would go into max-heap, and elements greater than the current median would go into min-heap. 

 

If, at any point, the absolute difference between the size of the heap is greater than 1, We need to shift the top element from the bigger heap to the smaller one and continue the process. 

 

Algorithm

 

  1. Check if the input array length is >1. If not, return the first element of the array if the array length is 1. Else return None if the size of the input array is 0.

 

2. Initiate a min-heap and a max-heap with the first two values in the array. Push the smaller value in the max-heap and the bigger value in the min-heap. Initiate curr_med as the average of top elements of both the heaps.

 

3. If the incoming element is greater than the curr_med, push it in the min-heap. Otherwise, in max-heap

 

4. If the sizes of both the heaps are equal, store the median value as the average of the top elements of both the heaps.

 

5. If the sizes of both the heaps differ by 1, store the median value as the top element of the bigger heap.

 

6. If the sizes differ by more than one, pop the top element from the larger heap and push it into the smaller heap to balance the heaps. Store the median value as the average of the top elements of the heaps. 

 

7. Repeat steps 3-7 until the end of the input array.

 

Code

 

import heapq
def medianOfInt(arr,size):
  #creating a min-heap and max-heap as well as a list medians to store all the
   #effective medians. 
   Minheap=[]      
  Maxheap=[]  
  medians = []

  if size==0:
      return None      

  if size==1:
      return [arr[0]]     #need atleast one element in the input array.
  
  
  med = arr[0]
  medians.append(med)     #append the medians in the median array. 
  heapq.heappush(Maxheap, -med)    

  for i in range(1, size):
    x = arr[i]

    if len(Maxheap) > len(Minheap):
      if x < med:
        heapq.heappush(Minheap, -(heapq.heappop(Maxheap)))
        heapq.heappush(Maxheap, -x)      

      else:
        heapq.heappush(Minheap, x)

      med = ((-Maxheap[0]) + Minheap[0]) / 2.0
    
    elif len(Maxheap) < len(Minheap):
      if x > med:
        heapq.heappush(Maxheap, -(heapq.heapop(Minheap)))
        heapq.heappush(Minheap, x)

      else:
        heapq.heappush(Maxheap, -x)

      med = ((-Maxheap[0]) + Minheap[0]) / 2.0

    else:
      if x < med:
        heapq.heappush(Maxheap, -x)
        med = int(-Maxheap[0])

      else:
        heapq.heappush(Minheap, x)
        med = int(Minheap[0])
    
    medians.append(med)

  return medians

arr=[int(x) for x in input().split(" ")]
size=len(arr)
ans=medianOfInt(arr,len(arr))
print(ans)

 

 

Input : 9 8 7 3 2 1
Output : [9 , 8.5 , 8 , 7.5 , 7 , 5.0]

Time And Space Complexity: 

 

Time Complexity: The heap operation and reading from the stream together makes the time complexity of the algorithm O(N logN).

 

Space Complexity: Since we used heaps to store the data from the stream of integers, the auxiliary space needed was of the order O(N).

 

FAQs 

  1. What is the complexity analysis of the above-mentioned algorithm?

The Time Complexity of the algorithm is O(N logN) as we devised earlier. And the auxiliary space needed was of the order O(N).

 

2. What is the difference between a min-heap and a max-heap?

The element at the root in a min-heap is the smallest element of the heap. While in a max-heap, the root node holds the largest element in the heap. Complexity analysis of operations in both the heaps is identical. 

 

3. What is the time complexity of insertion operation in a heap?

If the heap is a binary heap, the inserted element is first appended at the bottom of the heap and then it moves up the height of the heap. The Height of the binary heap is equal to logN. So the insertion operation in a binary heap is of the order O(logN). 

 

4. What is the time complexity of retrieval of the minimum element in a min-heap?

The minimum element is always at the root level in a binary min-heap. So the retrieval is of the order O(1).

 

Key Takeaways

 

Practicing this problem a couple of times would give you quite a lot of clarity about how heaps operate and when you can make use of them to optimize your solution. The operations in a heap are fairly cheap, and that makes it come in handy when we are constantly looking for minimum or maximum elements in an array. Heaps are vastly used in problems where we need to operate after every addition of a new element in an array. 

Heaps are also one of the hot topics in a product-based company interview. Don’t just stop here. There are a plethora of questions straight from interviews on our coding platform Code Studio

 

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think