A quick look at Kadane’s Algorithm

A quick look at Kadane’s Algorithm
A quick look at Kadane’s Algorithm

Kadane’s algorithm is used for “Maximum subarray sum” in any given array of integers. Some of you may think it’ll be a sum of all elements of an array. There’ll be negative elements in the array which will decrease the sum of the whole array.

An algorithm is finding the contiguous sub-array within a one-dimensional numeric array which has the largest sum. Most of the CS students nowadays mostly start learning hard things without even knowing the basics. i.e. start learning segment tree without knowing Kadane’s algorithms. It is an iterative dynamic programming algorithm. It is very common to optimise iterative DP algorithms to remove one dimension of the DP matrix along the major axis of the algorithm’s progression. The usual ‘longest common subsequence’ algorithm, for example, is usually described with a 2D matrix, but if the algorithm progresses from left to right, then you really only need space for two columns.

Kadane’s algorithm is a similar optimisation applied to a 1D problem, so the whole DP array disappears. The DP code in your question has a 2D matrix for some reason. I don’t know why — it doesn’t really make sense.

In other words, problem statement: Given Array Of N integers a1, a2, a3, ….. aN, find the maximum sum of sub-array of a given array.

Constrains:
1 <= N <= 2 * 10^5
-10^9 <= ai <= 10^9 (ai = ith element i = 1 to N )

Explanation:
A simple idea of the Kadane’s algorithm is to look for all positive contiguous segments of the array (max_ending_here is used for this). And keep track of the maximum sum contiguous segment among all positive segments (max_so_far is used for this). Each time we get a positive-sum compare it with max_so_far and update max_so_far if it is greater than max_so_far.

Kadane’s algorithm works by maintaining the start position of a subarray and repeatedly looking at the next element in the array and deciding to either.

  • Extend the subarray by appending that element, or
  • Discarding the subarray and starting a new subarray after that element.

If you explicitly keep track of the start point of the current subarray (initially it’s before the first element of the array, and it resets every time that the total drops below zero), it shouldn’t be too hard to find the optimal subarray. Every time you update the maximum subarray found, you either.

  • Have just appended a new element to the existing maximum subarray (so just append to the best thing you’ve found so far), or
  • Have found a new subarray that starts at a different position than the previous best subarray, so throw out the old max and replace it with this new one.

You can implement this by tracking not just the maximum subarray so far, but also its start position. You’re in case (1) if the max subarray starts at the same position as your current array and case (2) otherwise.

Proof of Kadane’s algorithm to a beginner with no dynamic programming experience:-

The problem we are trying to solve is the Maximum subarray problem, which asks for the largest sum of a contiguous range a[x] + a[x+1] + … + a[y-1] + a[y], from some input array. Here’s an implementation of Kadane’s algorithm that does so, modified from the version given in the link above:

(While Python supports large integers, it doesn’t have a negative infinity for large integers, so we have to use a float to get started— we could use None and add an extra check in the loop, instead, or handle the first element of “numbers” separately).

The question we should ask ourselves is, what does the largest subarray look like? Often we start by reasoning about what property it must have and then writing an algorithm that finds that property. Let’s put it the largest subarray in context:
a[x-1], a[x], a[x+1], …, a[y-1], a[y], a[y+1]
Since x through y is the largest, it must be that adding a[x-1] to it makes the sum smaller! And similarly for a[y+1]. How about going one step further back?

x[x-2], a[x-1], a[x], a[x+1], …
If x is the start of the largest subarray, then adding a[x-2] and a[x-1] must result in a smaller number. The same is true if we add lots of additional elements:
x[x-k], …, x[x-2], a[x-1], a[x], a[x+1], …
if a[x-k] through a[x-1] are not part of the largest subarray, it must be because the sum of x[x-k] through a[x] is less than a[x]. Otherwise, we could get a larger subarray by adding all those extra elements.
So, for example, if we have
1000, -1, -2, …, more negative elements, …, -3, 1, 2000, …
then the largest subarray will only start at 1000 if the sum of all the intermediate results is greater than -1000. If the sum is < 1, we should start with 1.

Now we know what the (potential) starting point of a maximum sub-array looks like: it’s a value that is greater than the sum all previous elements.

current_sum = max(x, current_sum + x)

does. It compares starting at x, with starting someplace in the past and including all the elements from that point forward. Let me try to state that a different way, for emphasis:

Each time through the loop we’re comparing starting at a[i] (the current element) with starting at some previous point. If we get any value at all from starting further back, then we do so. Otherwise we try to make a sub-array starting at a[i].

Once we have that insight in place, finding the end is easy: just keep extending the sequence until we find a new starting point, and compare all the possible ending points we find along the way.

So, if we understand the logic behind the choice in line 5, then a formal proof is almost just a matter of writing down the invariants:

current_sum[i] = sum of elements from the best starting point so far, to i-1

best_sum[i] = largest subsequence using the elements up to i-1

If a[i] is greater than current_sum[i] + a[i], then we’re strictly better off starting at a[i] than any previous point. So current_sum[i] becomes that strictly better choice. If the sum from the best starting point, through a[i], is better than any subarray we have found so far, then remember it as our best sum.

Question: Given an array, find the maximum subarray sum.
Eg: For given array : [-2,1,-3,4,-1,2,1,-5,4]
output : 6 for subarray [4,-1,2,1]
Brute force: O(n^2)
Brute force solution would be to go generate all possible subarray and find the maximum subarray.

Now let’s observe and find patterns that might help us optimise our solution. For an Array A let’s consider the following observations
If for subarray Sum(A[i,….,j-1]) < A[j], then we know that there’s no need to calculate Sum(A[i+1….,j-1]) again since it will definitely be less than A[i].
So based on this, if we come across a situation where the current element is greater than the sum of previous elements, then we shall start a new subarray from the current subarray.

Let’s understand this:

So as you can see here,
1 > we maintain two containers, sum and maxSum, we keep on adding elements to sum and compare it with maxSum and change maxSum only if sum>maxSum.
2 > we change sum when the current element is greater than the sum, this approach improves our time from O(n^2) to O(n). The Maximum Subarray problem gives us an array of numbers and asks us to return the sum of the longest contiguous subarray.

Input: [-1,5,2,-2,1,3,-4,2,-5,6]
Output: 9

Dynamic programming is a lot about optimization, so what about the brute force solution is suboptimal? It doesn’t retain any information from any of the calculations it makes, except for the current max value. The function winds up iterating over the same stretch of values over and over because it has no memory of what the outcome was the last time it ran through.


We don’t need to store the sum of each subarray in memory, though. That wouldn’t exactly help us. But if we could design an approach where we stored only the information we needed based on the previous work done, we could reduce the repetitive iteration of our function down to effectively zero. Looking at the example input again, [-1,5,2,-2,1,3,-4,2,-5,6], we can start by imagining what would happen if we walked through the array just one time and added each value together, replacing each value as we went with the total up to and including that index. That would produce an array like this, [-1,4,6,4,5,8,4,6,1,7].

If you’re like me when I was first trying to solve this, you might see the run from -1 to 8 and see a hint of a solution. If this array were to be graphed, we would see the lowest trough and the highest peak, and would be able to identify the biggest difference between troughs and peaks, given that they come in that order. In this example it almost seems like that would work.

But we don’t really need to keep counting that -1 at the front, as we keep walking through our array and adding everything up, do we? That -1 is not going to be part of the resulting maximum. What we want is that when we hit 4 and see that 4 > -1, or more specifically that 4 + -1 is > -1, we can just disregard the sum up to the previous index and start counting anew from 4.

What this brings us to is a simple comparison. At each index, we want to compare the value at that index (nums[n]), with not just the total at the index before, but that plus the value at the current index (nums[n] + nums [n — 1]).

Code:

That’s the whole solution, this function transforms the nums array [-1,5,2,-2,1,3,-4,2,-5,6] to an array of the greatest possible sum up to that point, [-1,5,7,5,6,9,5,7,2,8]. All that’s left is to return the greatest value in the array, 9.

#include<iostream>

using namespace std;
// Function to find contiguous sub-array with the largest sum
// in given set of integers
int kadane(int arr[], int n)
{
// stores maximum sum sub-array found so far
int max_so_far = 0;
// stores maximum sum of sub-array ending at current position
int max_ending_here = 0;
// traverse the given array
for (int i = 0; i < n; i++)
{
// update maximum sum of sub-array “ending” at index i (by adding
// current element to maximum sum ending at previous index i-1)
max_ending_here = max_ending_here + arr[i];

    // if maximum sum is negative, set it to 0 (which represents
    // an empty sub-array)
    max_ending_here = max(max_ending_here, 0);

    // update result if current sub-array sum is found to be greater
    max_so_far = max(max_so_far, max_ending_here);
}
return max_so_far;

}
int main()
{
int arr[] = { -2, 1, -3, 4, -1, 2, 1, -5, 4 };
int n = sizeof(arr)/sizeof(arr[0]);
cout << “The sum of contiguous sub-array with the largest sum is ” <<
kadane(arr, n);
return 0;
}

It seems like the solution shouldn’t be that simple, but that’s sort of the point and the beauty of it. The solution is so specifically optimised around collecting only the information we need to know, there’s no need to collect lots of repetitive and extraneous data about each possible sub-array. For me, knowing such a solution was possible helped me look for more linear solutions to other problems. It didn’t make them easy, but it opened up the possibility.

To learn more about algorithms, click here.

By Yogesh Kumar