Skip to main content

Running Median Using Heap in Java

Running Median Using Heap (Java):
Our goal is to find median of the elements as they come. So first we should know what is a median.

Median:
Sort the elements and
1. middle element is the median in case number of elements is odd
2. Or (middle element + (middle + 1)th element)/2 is median

Example 1:
Input: 10,2,3,4,5
Sort :  2,3,4,5,10
Because number of elements are odd 5/2=2

Example 2:
Input: 1,10,2,3,4,5
Sort :  1,2,3,4,5,10
Because number of elements are even, so  (n/2th element + n/2+1th element)/2 = (3+4)/2 = 3.5



So now we will focus on the problem:

The input to our problem is some numbers and as the numbers are coming we have to print the median of the numbers which we got till now.

So input could we as follows:
Input:
12
4
5
3
8
7
Output:
12.0
8.0
5.0
4.5
5.0
6.5

Explanation:
At first we have only 12 so just print 12.0,
Next 4 came so sort 12, 4 that is 4, 12 and the median is (12+4)/2 = 8.0
Next 5 came so numbers are now 4,5,12 so median will be 5 as the number of elements is odd.

Then how do we solve this problem using heap data structure:

Algorithm:
  1. Take one min heap and one max heap.
  2. For the first two numbers do the following:
  3. add maximum of a,b to min heap and 
  4. add minimum of a,b to max heap
  5. Print the median manually for first two numbers
  6. Now follow the below steps:
    • Check new_elemt < min_heap_head
      • if yes then add it to min_heap
      • else add it to max_heap
    • Check if |size(min_heap)-size(max_heap)| > 1
      • if yes then delete min from greater one and add it to lesser one
    • If size of both heaps after above all operation is equal
      • If yes print the median as (pop one-2 elements form each heap take sum)/2
      • else print the top element from the greater size heap
The code in java is implemented using Priority Queue:
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class RunningMedian {
    public static void main(String args[] ) throws Exception {
        /* Enter your code here. Read input from STDIN. Print output to STDOUT */
        Scanner in = new Scanner(System.in);
        int s = in.nextInt();
        // min Heap
        PriorityQueue pq = new PriorityQueue();
        // max heap
        PriorityQueue pq1 = new PriorityQueue(
            new Comparator (){
                public int compare(Integer a, Integer b){
                    return b-a;
                }
            }
        );
         
        // for first two elements median is trivial
        int a = in.nextInt();
        System.out.println((double)(a));
        int b = in.nextInt();
        System.out.println((double)(a+b)/2);
        pq.add(Math.max(a,b)); pq1.add(Math.min(a,b));
        
        for(int i=2;i < s; i++){

            int n=in.nextInt(); // get next element
            //checking with max heap
            if(n < pq1.peek()){
                pq1.add(n);  // add it to max heap
            }
            else{
                pq.add(n); // add it to min heap
            }

            // Checking whether size of heaps balanced or not ?
            if(Math.abs(pq1.size() - pq.size()) > 1){
                if(pq1.size() > pq.size())
                    pq.add(pq1.poll());
                else
                    pq1.add(pq.poll());
            }

            // processing the median
            if(pq1.size() > pq.size())
                System.out.println((double)(pq1.peek()));
            else if(pq1.size() < pq.size())
                System.out.println((double)(pq.peek()));
            else
                System.out.println((double)(pq.peek()+pq1.peek())/2);

        }// for close
    }
}

Complexity:
The Complexity of the program is O(n*logn) as the for loop is running n times and the operations inside for loop, insert into heap, poll, are all logn time operation.

Comments

Popular posts from this blog

Goodness of Fit Test for normal and poisson distribution

Meaning of Goodness of fit test: We find out which distribution fits the sample data the most. And this is achieved using chi-square distribution (Snedecor and Cochran, 1989). How to apply: There are 4 steps to follow: State the hypothesis:  Data follows a distribution or not Criteria to reject null hypothesis: if Χ 2  > Χ 2 (k,1-α) then reject null hypothesis. Analyze sample data: Compute the chi-square value using below formula: ∑(Oi- Ei) 2 /Ei        : Oi is observed frequency and Ei is expected frequency Interpret the results: Declare the results after comparing the values of Χ 2 and Χ 2 (k,1-α), where k is degree of freedom and  α is significance level. Degree of Freedom: It is  =  n - 1 - m m: number of parameter in the distribution. So in case of normal distribution m is 2 ( μ ,α) and in case of poisson dist. m is = 1 ( λ). Example 1: Goodness of fit test for Normal Distribution Year wise data is given about number of car accidents, find out w

classification of database indexing

Dense Index: For every records we are having one entry in the index file. Sparse Index: Only one record for each block we will have it in the index file. Primary Index ( Primary Key + Ordered Data )  A primary index is an ordered file whose records are fixed length size with 2 fields. First field is same as primary key and the second field is pointer to the data block.  Here index entry is created for first record of each block, called ‘block anchor’. Number of index entries thus are equal to number of blocks Average number of block accesses are = logB (worst case) + 1 (best case),  So on average it will be O(logB) The type of index is called sparse index because index is not created for all the records, only the first records of every block the entry is made into the index file. Example on Primary Index Number of Records = 30000, Block Size = 1024 Bytes, Strategy = Unspanned, Record Size = 100 Bytes, Key Size = 6 Bytes, Pointer Size = 9 Bytes, Then find avera

Red Black Tree Insertion in Java

Red Black Tree is a binary search tree. With extra property with each node having color either RED or BLACK. It has following properties: Every node is either RED or BLACK Root element will have BLACK color. Every RED node has BLACK children. Every path from root to leaf will have same number of BLACK nodes. We have few theorems over RB tree: Note: Height of RB tree is θ(logn). Insertion: Insertion in RB tree is little tricky as compared to BST, we have following cases to insert a node in RB tree: When tree is null just insert the node and it will be root, color will be BLACK. To insert an element first find its position: if key <= node.key then move to left else move to right Once we find the correct place to insert, we insert the node with color RED. But if the parent of the node just inserted has color RED then it will violate 3 property. So we have to fix this issue. We call it double RED problem, we resolve it using following few cases. DOUBLE R