Monday, January 31, 2011

Binary Indexed Tree.

Let us discuss an interesting problem today related with Cumulative frequency. Consider a hotel database which stores timings and number of phone-calls arrived during the interval.

Time => Number of calls

10 - 12 => 30

12 - 14 => 5

14 - 16 => 12

16 - 18 => 25

...

In this example, cumulative sum of calls made before 18 means Number of calls made in the interval 16-18 + (14-16) ... (10-12). It could be considered as a Range-sum. Understanding of this problem is very essential to realize the need for Binary indexed Tree.

Let us assume that this information had been stored in an array, which has interval mapped as index, and number of calls as values. Mapping from time interval to array index is not very important for this discussion.

Now the problem is to find a suitable data structure which could make updates, insertions and queries on this information in an efficient way. Let us think of naive solutions first.

Recursion Approach:

If we have to find out number of calls made before 18, then we will have to add up the values prior to that interval.

Mathematically, CumulativeSumTill(n) = valueAt(n) + CumulativeSumTill(n-1), a recursive relation. We know this could be implemented with recursion, with complexity O(n), as it needs traversal of all elements occur before the given element. Finding cumulative sum at an index is not efficient in this way.

If we have to add up a new interval information, we could easily add in source array in constant time as addition would generally happen at the end of the array. So with updating any interval's info also. So, updates happen at constant time, very efficient.

Iterative or Dynamic Programming Approach:

With Dynamic programming, iterative approach, if we could memoize the intermediate results in a separate table, we could make complexity of Find or Query, O(1), a constant. In the below given code, we had memoized the cumulative sum results in a separate table. So, getting a CS at an index is very efficient. But think of an update at an index '0', which would render all the stored results invalid. Every updates would be of complexity O(n), very inefficient.

void MemoizeCumulativeSum()
{
CumulativeSum[0] = SourceArray[0];

for( int i = 1; i < lengthOfSource; i++)
{
CumulativeSum[i] = CumulativeSum[i-1] + SourceArray[i];
}
}

Summary of Naive Solution:
Dynamic programming approach or storing intermediate results approach makes query fast, but updates slow. On the fly calculation, recursion based approach, makes updates fast, but query slow. This is a perfect instance of Space-Time trade-off. So, in order to have both query and update happen in an efficient way, we need to think of a different data-structure which combines above-said two approaches in a perfect way.

Binary Indexed Tree:
We are going to discuss a data structure which helps to query and update in logarithmic time.

Binary indexed tree or Fenwick tree, could be realized using an array which holds cumulative sum or sub-cumulative sum as its values. Every index of this array would hold partial cumulative sum or complete cumulative sum according to the BIT algorithm. This array makes query and update happen in logarithm time as described below.

Basics:

Any number could be expressed in 2's powers. So, number of digits of any number in 2's power form would be log n of base 2. Presto, this is what exactly the property being made use in BIT.

Initially let us assume an auxiliary table as in iterative approach, contains all zeroes. While updating some value in source array, we need to make sure all the indices in the auxiliary table which contains partial or cumulative sum with that value, updated. This may look like naive iterative approach; but the beauty of this data-structure is that it doesn't need to update all the values as in naive memoization. So, this could be considered as an educated or intelligent memoization. On the average, it may need log n updates in auxiliary table which makes update efficient. Same way, query needs sum of certain values in auxiliary table unlike naive recursion approach, which needs to process all the values in source array. BIT needs to process only log n values for a query. So, query and updates are efficient in this data-structure.

BIT Core Algorithm:

A complete implementation of BIT which supports update and query, has been given at the end of this blog.

While the start of the algorithm, prepare an auxiliary array of source array's size and initialize all the elements with zero. Now loop through all the elements in the source array and update the auxiliary table according to below code. Below algorithm calculates the indices to be updated, as specified by the logic, nextIndex += (nextIndex & -nextIndex). ANDing a value and its negated isolates a non-zero-least significant bit. If an index is "xxx1000" then, index & ~index would result in "0001000". Now the new index is "xxx1000" + "0001000" would unset the least significant non-zero bit in "xxx1000". Hence this algorithm, in the worst case would update log2 n indices in the auxiliary table.


Below algorithm would be used to reflect any modifications made in source array; if value in some index had been incremented in the source array, this must be called in order to keep the BIT updated. The same argument with regards to complexity of initialization holds good for updates also. If you couldn't get a hold of algorithm, no worries. We will discuss the algorithm intuitively later.


Same way, query algorithm is as follows:



Intuitive Analysis:

Actually, after all updates, BIT or auxiliary table will contain either partial or complete cumulative frequency depends on the index.

Indices which are powers of two contain complete cumulative frequency of that index. Others would contain partial.

'6' can be written as sum of powers of two as 4 + 2. Now take the value at 4, value at index 2 starting from 4 and add up to get Cumulative Frequency of 6.

Same way '7' can be written as 4 + 2 + 1. Take value at 4, value at index 4+2 and value at index 4+2+1 and add up those. Intuitively, this is what exactly happens in Query algorithm.

Same could be applied for Update algorithm also. Only complexity of this algorithm is to understand the procedure. :)

BIT Class implementation:

class BinaryIndexedTree
{
public:
BinaryIndexedTree(vector<int> inputArray)
{
internalArray = NULL;
InitializeBIT(inputArray);
}
void IncrementValue(int value, int index)
{
int indexToModify = index + 1;

while(indexToModify < arraySize)
{
internalArray[indexToModify] += value;
indexToModify += (indexToModify & -indexToModify);
}
}
int Query(int index)
{
int indexToRetrieve = index + 1;
int retValue = 0;
while(indexToRetrieve)
{
retValue += internalArray[indexToRetrieve];
indexToRetrieve -= (indexToRetrieve & -indexToRetrieve);
}
return retValue;
}
private:
void InitializeBIT(vector<int> inputArray)
{
// Initialize internal array;
// Zeroth index is Sentinel.
arraySize = inputArray.size() + 1;
internalArray = new int[arraySize];
for(int i = 0; i < arraySize; i++)
internalArray[i] = 0;
for(int i = 1; i < arraySize; i++)
{
int valueToBeAdded = inputArray[i - 1];
internalArray[i] += valueToBeAdded;
int k = i;
while( k < arraySize)
{
k += (k & -k);
internalArray[k] += valueToBeAdded;
}
}
}
int *internalArray;
int arraySize;
};

Some more links for this topic



Thanks for Reading.

9 comments:

  1. Hi Karthik, Intuitive analysis helped me great deal thank you:)

    ReplyDelete
  2. I've been reading a lot of tutorials about BIT.
    This is the most intuitive one.
    You are good at making difficult things simpler.
    Please keep writing. :)

    ReplyDelete
  3. So much thanks...
    Your post almost cleared all my doubts and helped me understand BIT very well. :)

    ReplyDelete
  4. Very nice tutorial :)
    I have a doubt in the update(a,b,v) which adds v to all elements from index a to index b.Suppose N=8 i.e there are total 8 elements. Now update(3,6,10) will update(add 10 to) elements from index 3 to index 6. update(3,6,10) can be performed as update(3,10) and update(7,-10). update(3,10) adds 10 to indexes 3,4 and 8. Similarly update(7,-10) adds -10 to indexes 7 and 8.Now to find sum(6) which is the sum of all elements from index 1 to 6 , will be sum(5...6)+sum(1.....4) but index 5 and 6 were not updated. So I want to ask how is it giving the correct result?

    ReplyDelete
  5. Sourabh, Thanks for your comment.

    If I understand your question properly, you want to update the values for the range of indices, 3 to 6. If so, you have to call update function 4 times, like update(3, v), update(4,v), update(5,v), update(6,v). If not, can you just elaborate your question further?

    ReplyDelete
    Replies
    1. Very nice tutorial :)
      I have a doubt in the update(a,b,v) which adds v to all elements from index a to index b.Suppose N=8 i.e there are total 8 elements. Now update(3,6,10) will update(add 10 to) elements from index 3 to index 6. update(3,6,10) can be performed as update(3,10) and update(7,-10). update(3,10) adds 10 to indexes 3,4 and 8. Similarly update(7,-10) adds -10 to indexes 7 and 8.Now to find sum(6) which is the sum of all elements from index 1 to 6 , will be sum(5...6)+sum(1.....4) but index 5 and 6 were not updated. So I want to ask how is it giving the correct result?

      Delete
  6. Karthik...

    I don't think your InitializeBIT function is right. First off, it results in an out of bounds error, if, for instance, you have an input array of [3, 6, 5], it will try to access internalArray[4] when dealing with the first element. And I'm not sure about what you're expecting the input array to have, but if you're trying to split counts up into "sub-counts", wouldn't you, when adding the new input value, need to subtract the bits from valueToBeAdded until you get to 0?

    ReplyDelete
  7. Very nice tutorial :)
    I have a doubt in the update(a,b,v) which adds v to all elements from index a to index b.Suppose N=8 i.e there are total 8 elements. Now update(3,6,10) will update(add 10 to) elements from index 3 to index 6. update(3,6,10) can be performed as update(3,10) and update(7,-10). update(3,10) adds 10 to indexes 3,4 and 8. Similarly update(7,-10) adds -10 to indexes 7 and 8.Now to find sum(6) which is the sum of all elements from index 1 to 6 , will be sum(5...6)+sum(1.....4) but index 5 and 6 were not updated. So I want to ask how is it giving the correct result?

    ReplyDelete
  8. hey i don't get concept of range update. for adding v to every number in original array in range a,b, do we not have top call point update function on binary index tree for every index in range a to b,. can't we think it as point updates for every index in a to b. thus how using only two point update Update(a, v); Update(b+1, -v) does the job of all point updates in index a to b

    ReplyDelete