
Goodness of Fit Test for Normal and Poisson Distributions

Meaning of Goodness of Fit Test:

A goodness of fit test checks whether sample data is consistent with a hypothesized distribution. The test is carried out using the chi-square distribution (Snedecor and Cochran, 1989).

How to apply:
There are 4 steps to follow:
  1. State the hypotheses: the null hypothesis is that the data follows the given distribution; the alternative is that it does not.
  2. Criterion to reject the null hypothesis: if Χ2 > Χ2(k,1-α), reject the null hypothesis.
  3. Analyze the sample data: compute the chi-square value using the formula below:
    • Χ2 = ∑(Oi - Ei)^2/Ei, where Oi is the observed frequency and Ei is the expected frequency
  4. Interpret the results: state the conclusion after comparing Χ2 with Χ2(k,1-α), where k is the degrees of freedom and α is the significance level.
Degrees of Freedom:
k = n - 1 - m
n: number of categories (rows) in the frequency table.
m: number of parameters of the distribution. So for the normal distribution m is 2 (μ, σ), and for the Poisson distribution m is 1 (λ).
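
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration: the helper names `chi_square_statistic`, `degrees_of_freedom` and `reject_null` are hypothetical (they do not come from any library), the toy data is made up, and the critical value is assumed to be read from a chi-square table by the caller.

```python
def chi_square_statistic(observed, expected):
    """Step 3: sum of (Oi - Ei)^2 / Ei over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_categories, n_parameters):
    """k = n - 1 - m."""
    return n_categories - 1 - n_parameters


def reject_null(chi_square, critical_value):
    """Step 2: reject H0 when the statistic exceeds the table value."""
    return chi_square > critical_value


# Toy data, made up for illustration: 3 categories, 1 parameter.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

chi2 = chi_square_statistic(observed, expected)
k = degrees_of_freedom(len(observed), 1)
print(chi2, k)
```

The critical value is passed in rather than computed because the Python standard library has no chi-square quantile function; in practice it comes from the printed table.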



Example 1: Goodness of fit test for Normal Distribution

Yearly data on the number of car accidents is given. Test whether the data follows a normal distribution at α = 5%. (In the question, only the first two columns of the table will be given.) Sample size = 12

Answer:

Step 1: Stating Hypothesis

Null Hypothesis(H0): Data follows normal distribution
Alternative Hypothesis(Ha): Data do not follow normal distribution

Step 2: Criteria to reject null hypothesis:
if Χ2 > Χ2(k,1-α) then reject null hypothesis.

Step 3: Analyze sample data: 
Compute the last 4 columns of the given table.


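| YEAR | Oi | Ei | Oi - Ei | (Oi - Ei)^2 | (Oi - Ei)^2/Ei |
|------|-----|-------|---------|-------------|----------------|
| 1978 | 164 | 146.4 | 17.6 | 309.76 | 2.116 |
| 1979 | 142 | 146.4 | -4.4 | 19.36 | 0.132 |
| 1980 | 153 | 146.4 | 6.6 | 43.56 | 0.298 |
| 1981 | 171 | 146.4 | 24.6 | 605.16 | 4.134 |
| 1982 | 171 | 146.4 | 24.6 | 605.16 | 4.134 |
| 1983 | 148 | 146.4 | 1.6 | 2.56 | 0.017 |
| 1984 | 136 | 146.4 | -10.4 | 108.16 | 0.739 |
| 1985 | 133 | 146.4 | -13.4 | 179.56 | 1.227 |
| 1986 | 138 | 146.4 | -8.4 | 70.56 | 0.482 |
| 1987 | 132 | 146.4 | -14.4 | 207.36 | 1.416 |
| 1988 | 145 | 146.4 | -1.4 | 1.96 | 0.013 |
| 1989 | 124 | 146.4 | -22.4 | 501.76 | 3.427 |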

The remaining columns are computed as follows:
Ei = ΣOi / sample size = 1757 / 12 = 146.4; each of the other columns follows directly from its header.
Sum of the last column: Χ2 = 18.135

Now find the value of Χ2(k,1-α) from the chi-square table, where k = 12 - 1 - 2 = 9 and 1 - α = 0.95. The value is 16.92 (the table will be provided in the exam).
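
The arithmetic of this example can be re-checked with a short Python sketch. It simply reproduces the table columns as described in the text: Ei is the total divided by the sample size, rounded to one decimal, and the critical value 16.92 is typed in from the chi-square table rather than computed. The variable names are chosen here for illustration.

```python
# Observed accidents per year, 1978-1989, copied from the table.
observed = [164, 142, 153, 171, 171, 148, 136, 133, 138, 132, 145, 124]

# Expected frequency: total / sample size, rounded to one decimal as in the table.
expected = round(sum(observed) / len(observed), 1)

# Chi-square value: sum of (Oi - Ei)^2 / Ei.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

k = len(observed) - 1 - 2  # degrees of freedom, m = 2 for the normal distribution
critical = 16.92           # table value for k = 9 and 1 - alpha = 0.95

print(f"Ei = {expected}, chi-square = {chi2:.3f}, k = {k}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```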



Step 4: Interpret the results
Since Χ2 = 18.135 > Χ2(k,1-α) = 16.92, we reject the null hypothesis and conclude that the given sample data does not follow a normal distribution.


Example 2: Goodness of fit test for Poisson Distribution

The number of arrivals per minute was recorded at a bank located in the central business district of a city. The actual arrivals per minute were observed in 200 one-minute periods over the course of a week. The results are summarized in the table below. Test whether the data follows a Poisson distribution (α = 5%). Every expected frequency should be at least 1.

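| ARRIVALS | FREQUENCY |
|----------|-----------|
| 0 | 14 |
| 1 | 31 |
| 2 | 47 |
| 3 | 41 |
| 4 | 29 |
| 5 | 21 |
| 6 | 10 |
| 7 | 5 |
| 8 | 2 |
| 9 or more | 0 |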


Answer:

Step 1: Stating Hypothesis

H0: The number of arrivals per minute follows a Poisson distribution
H1: The number of arrivals per minute does not follow a Poisson distribution

Step 2: Criteria to reject null hypothesis:
if Χ2 > Χ2(k,1-α) then reject null hypothesis.

Step 3: Analyze sample data: 
The Poisson distribution has one parameter, its mean λ, which can be estimated from the given data using the formula below:

                           X-bar = ∑(fi * mi) / ∑fi

Here mi is the number of arrivals in a category and fi is the frequency of that category.

Computing this value from the data gives λ = X-bar = 580/200 = 2.90


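| ARRIVALS | FREQUENCY | fi*mi |
|----------|-----------|-------|
| 0 | 14 | 0 |
| 1 | 31 | 31 |
| 2 | 47 | 94 |
| 3 | 41 | 123 |
| 4 | 29 | 116 |
| 5 | 21 | 105 |
| 6 | 10 | 60 |
| 7 | 5 | 35 |
| 8 | 2 | 16 |
| 9 or more | 0 | 0 |
| Total | 200 | 580 |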
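
The estimate of λ is just a frequency-weighted mean, which can be sketched in Python as below. The list names are chosen here for illustration. The "9 or more" category is entered as 9, which is an assumption made only so the lists line up; it has no effect because its frequency is 0.

```python
# Arrivals per minute (mi) and how often each count was observed (fi).
arrivals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # last entry stands for "9 or more"
frequency = [14, 31, 47, 41, 29, 21, 10, 5, 2, 0]

total_periods = sum(frequency)                                     # sum of fi
total_arrivals = sum(f * m for f, m in zip(frequency, arrivals))   # sum of fi * mi

# Sample mean, used as the estimate of the Poisson parameter lambda.
lam = total_arrivals / total_periods
print(total_arrivals, total_periods, lam)
```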

Look up the probabilities in the Poisson distribution table for λ = 2.9. From these, the theoretical frequency of X arrivals (X = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or more) can be determined.
The theoretical frequency for each X is obtained by multiplying the corresponding Poisson probability by the sample size n. The results are summarized in the table below (n = 200):

For the Poisson distribution use the formula: P(X = x) = e^(-λ) * λ^x / x!

n=200 (Given in the example)

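| ARRIVALS | FREQUENCY | PROBABILITY P(X), Poisson distribution with λ = 2.9 | THEORETICAL FREQUENCY = n*P(X) |
|----------|-----------|------------|-------|
| 0 | 14 | 0.055 | 11 |
| 1 | 31 | 0.1596 | 31.92 |
| 2 | 47 | 0.2314 | 46.28 |
| 3 | 41 | 0.2237 | 44.74 |
| 4 | 29 | 0.1622 | 32.44 |
| 5 | 21 | 0.094 | 18.8 |
| 6 | 10 | 0.0455 | 9.1 |
| 7 | 5 | 0.0188 | 3.76 |
| 8 | 2 | 0.0068 | 1.36 |
| 9 or more | 0 | 0.003 | 0.6 |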
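
If a Poisson table is not at hand, the probability and theoretical frequency columns can be approximated with a few lines of Python. This is a sketch, not the method used to build the table: `poisson_pmf` is a hypothetical helper written from the standard Poisson formula, and the "9 or more" probability is taken as whatever is left after X = 0 to 8. Because the table rounds probabilities to four decimals, the last digits may differ slightly.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 2.9   # estimated mean from the data
n = 200     # number of one-minute periods

# Probabilities for X = 0..8, then the remaining tail "9 or more".
probs = [poisson_pmf(x, lam) for x in range(9)]
probs.append(1.0 - sum(probs))

# Theoretical frequency = n * P(X).
expected = [n * p for p in probs]

labels = [str(x) for x in range(9)] + ["9 or more"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>9}  {p:.4f}  {e:.2f}")
```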


Observe from the table above that the theoretical frequency of 9 or more arrivals is less than 1.0.
So that every category has a theoretical frequency of 1.0 or greater, the category 9 or more is combined with the category of 8 arrivals, as below:

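| ARRIVALS | FREQUENCY | PROBABILITY P(X), Poisson distribution with λ = 2.9 | THEORETICAL FREQUENCY = n*P(X) |
|----------|-----------|------------|-------|
| 0 | 14 | 0.055 | 11 |
| 1 | 31 | 0.1596 | 31.92 |
| 2 | 47 | 0.2314 | 46.28 |
| 3 | 41 | 0.2237 | 44.74 |
| 4 | 29 | 0.1622 | 32.44 |
| 5 | 21 | 0.094 | 18.8 |
| 6 | 10 | 0.0455 | 9.1 |
| 7 | 5 | 0.0188 | 3.76 |
| 8 or more | 2 | 0.0098 | 1.96 |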
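
Combining the last categories can also be done mechanically. The sketch below assumes, as in this example, that only the upper tail has small theoretical frequencies; `merge_small_tail` is a hypothetical helper, and it folds the last category into the one before it until the last theoretical frequency is at least 1.0. The input values are the rounded ones from the table.

```python
def merge_small_tail(observed, expected, minimum=1.0):
    """Fold the last category into its neighbour while its expected count is too small."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return obs, exp


# Observed and theoretical frequencies for 0, 1, ..., 8, "9 or more".
observed = [14, 31, 47, 41, 29, 21, 10, 5, 2, 0]
expected = [11, 31.92, 46.28, 44.74, 32.44, 18.8, 9.1, 3.76, 1.36, 0.6]

obs, exp = merge_small_tail(observed, expected)
print(len(obs), obs[-1], round(exp[-1], 2))
```

The probabilities combine the same way: 0.0068 + 0.003 = 0.0098 for the merged category.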


Now we apply the chi-square test to determine whether the data follows a Poisson distribution. The test statistic is computed using the formula below:

                             Χ2 = ∑(Oi - Ei)^2/Ei, where Oi is the observed frequency and Ei is the expected frequency

k (degrees of freedom) = n - 1 - m = 9 - 1 - 1 = 7.
Here n is 9 because there are 9 categories (0 to 8 arrivals) once 9 or more has been combined with 8 so that all theoretical frequencies are > 1.
And m is 1 because the Poisson distribution has only one parameter, λ.


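| ARRIVALS | FREQUENCY (Observed) | PROBABILITY P(X), Poisson distribution with λ = 2.9 | THEORETICAL FREQUENCY = n*P(X) | Oi - Ei | (Oi - Ei)^2 | (Oi - Ei)^2/Ei |
|----------|------|--------|-------|-------|---------|-------------|
| 0 | 14 | 0.055 | 11 | 3.00 | 9 | 0.818181818 |
| 1 | 31 | 0.1596 | 31.92 | -0.92 | 0.8464 | 0.026516291 |
| 2 | 47 | 0.2314 | 46.28 | 0.72 | 0.5184 | 0.011201383 |
| 3 | 41 | 0.2237 | 44.74 | -3.74 | 13.9876 | 0.312641931 |
| 4 | 29 | 0.1622 | 32.44 | -3.44 | 11.8336 | 0.364784217 |
| 5 | 21 | 0.094 | 18.8 | 2.20 | 4.84 | 0.257446809 |
| 6 | 10 | 0.0455 | 9.1 | 0.90 | 0.81 | 0.089010989 |
| 7 | 5 | 0.0188 | 3.76 | 1.24 | 1.5376 | 0.40893617 |
| 8 or more | 2 | 0.0098 | 1.96 | 0.04 | 0.0016 | 0.000816327 |
| Total | | | | | | 2.28954 |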

Now find the value of Χ2(k,1-α) from the chi-square table, where k = 9 - 1 - 1 = 7 and 1 - α = 0.95. The value is 14.07 (the table will be provided in the exam).
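
As a quick check of the total, the chi-square value can be recomputed from the observed and theoretical frequencies. The sketch below uses the rounded theoretical frequencies exactly as they appear in the table, with 8 and 9 or more already combined, and takes the critical value 14.07 from the chi-square table instead of computing it.

```python
# Observed and theoretical frequencies for 0, 1, ..., 7, "8 or more".
observed = [14, 31, 47, 41, 29, 21, 10, 5, 2]
expected = [11, 31.92, 46.28, 44.74, 32.44, 18.8, 9.1, 3.76, 1.96]

# Chi-square value: sum of (Oi - Ei)^2 / Ei.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed) - 1 - 1  # degrees of freedom, m = 1 for the Poisson distribution
critical = 14.07           # table value for k = 7 and 1 - alpha = 0.95

print(f"chi-square = {chi2:.5f}, k = {k}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```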




Step 4: Interpret the results
Since Χ2 = 2.28954 < 14.07, we fail to reject H0.

There is insufficient evidence to conclude that the arrivals per minute do not follow a Poisson distribution, so the data is consistent with a Poisson distribution.

Example 3:
The manager of a computer network has collected data on the number of times that service has been interrupted on each day over the past 500 days. The results are as follows:

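| INTERRUPTIONS PER DAY | NUMBER OF DAYS |
|-----------------------|----------------|
| 0 | 160 |
| 1 | 175 |
| 2 | 86 |
| 3 | 41 |
| 4 | 18 |
| 5 | 12 |
| 6 | 8 |
| Total | 500 |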
Does the distribution of service interruptions follow a Poisson distribution? (Use the 0.01 level of significance.)


Example 4:
A random sample of 500 long distance telephone calls revealed the following distribution of call length (in minutes)

| Length in Minutes | Frequency |
|-------------------|-----------|
| 0–under 5 | 48 |
| 5–under 10 | 84 |
| 10–under 15 | 164 |
| 15–under 20 | 126 |
| 20–under 25 | 50 |
| 25–under 30 | 28 |
| Total | 500 |
At the 0.05 level of significance, does call length follow a normal distribution?
