Sunday, 29 December 2019

Hypothesis Testing

> setwd('D:/Training_Material/Book/Files')
> getwd()
[1] "D:/Training_Material/Book/Files"
> Agedf<-read.csv("PlayersAge.csv",header = TRUE)
> AgePlr <- as.numeric(Agedf$Age)
> library(MASS)
> fitdistr(AgePlr,"Normal")
      mean          sd   
  28.9014025    4.6278804
 ( 0.1733155) ( 0.1225526)
> a.teo<-rnorm(n=713,mean=29,sd=4.6)
> qqplot(AgePlr,a.teo,main="QQ-plot for Normal distribution")

> abline(0,1)
The Q-Q plot suggests that the age of cricketers approximately follows a Normal distribution.
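A Q-Q plot against a single random draw from rnorm() changes from run to run. As a minimal sketch, the sample can instead be compared with the theoretical normal quantiles using base R's qqnorm() and qqline(). The data below are simulated stand-ins for PlayersAge.csv (an assumption, since the file is not reproduced here); with the real data, pass AgePlr in place of age_sim.

```r
# Simulated stand-in for the Age column (assumption: the real data come
# from PlayersAge.csv, which is not available here).
set.seed(1)
age_sim <- rnorm(n = 713, mean = 29, sd = 4.6)

# Sample quantiles plotted against theoretical normal quantiles.
qq <- qqnorm(age_sim, main = "Normal Q-Q plot of age")

# Reference line through the first and third quartiles.
qqline(age_sim)
```

Points that stay close to the reference line are consistent with normality; systematic curvature in the tails points to skewness or heavy tails.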



Lower Tail Test of Population Mean with Known Variance
Problem:  Suppose it is claimed that the mean age of a player is more than 32. In a sample of 713 players, the sample mean age is 29. Assume the population standard deviation is 5. At .05 significance level, can we reject the claim that the mean age is more than 32?
Solution
# The null hypothesis is that mu >= 32.
> xbar = 29 # sample mean
> mu0  = 32              # hypothesized value
> sigma= 5               # population standard deviation
> n = 713                 # sample size
> z = (xbar-mu0)/(sigma/sqrt(n))
> z                      # test statistic
[1] -16.02124
Compute the critical value at .05 significance level.
> alpha = .05
> z.alpha = qnorm(1-alpha)
> -z.alpha               # critical value
[1] -1.644854
The test statistic -16.02124 is less than the critical value of -1.6449. Hence, at .05 significance level, we reject the claim that the "mean age of a player is more than 32".
> pval = pnorm(z)
> pval                   # lower tail p-value
[1] 4.541344e-58
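The steps above can be collected into one function. This is a minimal sketch: z_test_lower is a hypothetical helper written for illustration, not a base R or package function, and it assumes the population standard deviation is known.

```r
# Hypothetical helper (not part of base R): lower-tail z-test for a mean
# when the population standard deviation is known.
z_test_lower <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z    <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
  crit <- qnorm(alpha)                      # lower-tail critical value
  list(z = z,
       critical = crit,
       p.value = pnorm(z),                  # P(Z <= z)
       reject = z < crit)                   # TRUE means reject the null
}

res <- z_test_lower(xbar = 29, mu0 = 32, sigma = 5, n = 713)
res$z         # test statistic
res$critical  # critical value
res$reject    # decision
```

Here qnorm(alpha) gives the same critical value as -qnorm(1 - alpha), because the standard normal distribution is symmetric about zero.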

Two-Tailed Test of Population Mean with Known Variance
Problem:  Suppose it is claimed that the mean age of a player is equal to 30. In a sample of 713 players, the sample mean age is 28.9. Assume the population standard deviation is 5. At .05 significance level, can we reject the claim that the mean age is 30?

Solution: # The null hypothesis is that mu = 30.
> xbar = 28.9 # sample mean
> mu0  = 30              # hypothesized value
> sigma= 5               # population standard deviation
> n = 713                 # sample size
> z = (xbar-mu0)/(sigma/sqrt(n))
> z                      # test statistic
[1] -5.874453
Compute the critical value at .05 significance level.
> alpha = .05
> z.half.alpha = qnorm(1-alpha/2)
> c(-z.half.alpha, z.half.alpha)
[1] -1.959964  1.959964
The test statistic -5.874453 is not between the critical values -1.9600 and 1.9600. Hence, at .05 significance level, we reject the null hypothesis that the "mean age of a player is equal to 30".
> pval = 2 * pnorm(z)    # lower tail
> pval                   # two-tailed p-value
[1] 4.242414e-09
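A two-tailed test at level alpha rejects the hypothesized value exactly when that value falls outside the (1 - alpha) confidence interval for the mean. The sketch below checks this with the same numbers as the two-tailed example; the variable names half_width, ci and outside are illustrative only.

```r
xbar  <- 28.9   # sample mean
mu0   <- 30     # hypothesized value
sigma <- 5      # population standard deviation (assumed known)
n     <- 713    # sample size
alpha <- 0.05

# 95% confidence interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)
half_width <- qnorm(1 - alpha / 2) * sigma / sqrt(n)
ci <- c(lower = xbar - half_width, upper = xbar + half_width)
ci

# mu0 lying outside the interval is equivalent to |z| exceeding
# the two-tailed critical value.
outside <- mu0 < ci["lower"] || mu0 > ci["upper"]
outside
```

The interval runs from roughly 28.53 to 29.27, so 30 lies outside it, which agrees with the rejection above.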

Two-Tailed Test of Population Mean with Unknown Variance
Problem:  Suppose it is claimed that the mean age of a player is equal to 30. A sample of 713 players is available and the population standard deviation is unknown. Can we reject the claim that the mean age is 30 at .05 significance level?
Solution: # The null hypothesis is that mu = 30.
> getwd()
[1] "D:/Training_Material/Book/Files"
> Agedf<-read.csv("PlayersAge.csv",header = TRUE)
> AgePlr <- as.numeric(Agedf$Age)
> library(fBasics)
> basicStats(AgePlr)
                  AgePlr
nobs          713.000000
NAs             0.000000
Minimum        17.200000
Maximum        45.400000
1. Quartile    25.500000
3. Quartile    31.800000
Mean           28.901403
Median         28.600000
Sum         20606.700000
SE Mean         0.173437
LCL Mean       28.560893
UCL Mean       29.241912
Variance       21.447358
Stdev           4.631129
Skewness        0.389330
Kurtosis        0.173905
> xbar= 28.901403       # sample mean
> mu0 = 30              # hypothesized value
> s   = 4.631129        # sample standard deviation
> n   = 713             # sample size
> t   = (xbar-mu0)/(s/sqrt(n))
> t                     # test statistic
[1] -6.334266
> alpha = .05
> t.half.alpha = qt(1-alpha/2, df=n-1)
> c(-t.half.alpha, t.half.alpha)
[1] -1.963301  1.963301
The test statistic -6.334266 is not between the critical values -1.9633 and 1.9633. Hence, at .05 significance level, we reject the null hypothesis that the "mean age of a player is equal to 30".
> pval = 2 * pt(t, df=n-1)  # lower tail
> pval                      # two-tailed p-value
[1] 4.225286e-10
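The manual calculation can be cross-checked against R's built-in t.test(), which works from the raw observations rather than from summary statistics. The sketch below uses simulated data as a stand-in for AgePlr (an assumption, since PlayersAge.csv is not reproduced here), so its numbers will differ from those above; with the real data, pass AgePlr in place of age_sim.

```r
# Simulated stand-in for AgePlr (assumption: the real values come from
# PlayersAge.csv, which is not available here).
set.seed(1)
age_sim <- rnorm(n = 713, mean = 29, sd = 4.6)

# Built-in one-sample, two-sided t-test of H0: mu = 30.
res <- t.test(age_sim, mu = 30, alternative = "two.sided")

# The same statistic and p-value from the formula used above.
n_sim    <- length(age_sim)
t_manual <- (mean(age_sim) - 30) / (sd(age_sim) / sqrt(n_sim))
p_manual <- 2 * pt(-abs(t_manual), df = n_sim - 1)

all.equal(unname(res$statistic), t_manual)  # TRUE
all.equal(res$p.value, p_manual)            # TRUE
```

Using -abs(t_manual) keeps the two-tailed p-value correct whichever side of the hypothesized value the sample mean falls on, whereas 2 * pt(t, df) is only valid when t is negative.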