R Script to Compare Parametric and Nonparametric Boostrap of Median
===================================================================
This R Script continues the investigation of median estimators for
Exponential Data that was begun in RscriptLec5.RLog.
Recall: n=60 Expon(1) data, R=10,000, B=300, true med = log(2) = 0.6931472
## Nonparametric bootstrap version as done before, to which we
# add the computed array of bootstrap interval lengths.
# (Recall CI lengths are identical for Basic-pivotal and Percentile)
covrg = array(T, c(1e4,2), dimnames=list(NULL,
c("Basic", "Pctile")))
length = rep(T,1e4)
set.seed(1755)
datarr = array( rexp(60*1e4), c(1e4,60))
med.vec = apply(datarr,1,median)
bootarr = array(0, c(300, 60))
for(i in 1:1e4) {
bootarr[,] = sample(datarr[i,], 300*60, replace=T)
qboot = quantile(apply(bootarr,1,median), prob=c(.025,.975))
qmid = (qboot[1]+qboot[2])/2
qhlf = (qboot[2]-qboot[1])/2
length[i] = 2*qhlf
covrg[i,] = abs( c(2*med.vec[i]-log(2), log(2)) - qmid ) < qhlf }
> round(c(apply(covrg,2,mean), length=mean(length)),4)
Basic Pctile length
0.8447 0.9406 0.4962
## Now we move on to Parametric Bootstrap: recall that the quantile
# for Expon(lambda) sample could be calculated either as sample
# median or as log(2)*(sample.mean) -- both of "Basic-pivotal" type
PBcovrg = array(T, c(1e4,2), dimnames=list(NULL,c("samp-med","ML-med")))
PBlgth = array(0, c(1e4,2), dimnames=list(NULL,c("samp-med","ML-med")))
oldseed = .Random.seed
.Random.seed = oldseed
for(i in 1:1e4) {
mean.ML = mean(datarr[i,])
bootarr[,] = rexp(300*60,1/mean.ML)
qmed = quantile(apply(bootarr,1,median), prob=c(.025,.975))
qmid = (qmed[1]+qmed[2])/2
qhlf = (qmed[2]-qmed[1])/2
qML = quantile(apply(bootarr,1,mean), prob=c(.025,.975))
CImean = pmax(0, 2*mean.ML - qML[2:1])
### the max-likelihood CI for mean
PBlgth[i,] = c(2*qhlf, log(2)*(CImean[2]-CImean[1]))
PBcovrg[i,] = c(abs(2*med.vec[i]-log(2)-qmid) < qhlf,
CImean[1] < 1 & 1 < CImean[2]) }
> round(c(apply(PBcovrg,2,mean), apply(PBlgth,2,mean)),4)
samp-med ML-med samp-med ML-med
0.7702 0.9311 0.4948 0.3454
## OK, we can see that there is still under-coverage even in the PB version
# of the CI using the sample median (for this sample size), but the PB 95%
# CI has only very slight under-coverage.
## NOTE also that the average length of the PB sample-med CI is very much # the same, on average, as the length of the sample-med CI based on # nonparametric bootstrap. However, the PB CI based on the ML for Expon # medians is much shorter !!
WHAT DOES THEORY SAY ABOUT THE TWO CI LENGTHS, THE ONE BASED
ON SAMPLE MEDIANS AND THE OTHER ON SAMPLE MEANS USING EXPONENTIALITY.
First, we saw in class that for large n the sample median for Expon(lambda)
data is approximately |N( log(2)/lambda, (1/60)*1/(2*lambda/2)^2 ) because the
Expon(lambda) density at the median is lambda/2.
Thus, theory says the CI length (with lambda=1, n=60) based on sample
median is 2*1.96/sqrt(60) = 0.5061, which aligns closely with what we
saw both for the parametric and nonparametric bootstrap intervals.
Second, the asymptotic large-sample distribution for the median-estimator
log(2)*Xbar in the Expon(lambda) data setting is
|N(log(2)/lambda, (1/60)*(log(2)/lambda)^2 ) so that the 2-sided 95% CI width
based on this estimator is 2*1.96*(log(2))^2/sqrt(60) = 0.3508 , which again
aligns closely with the average ML-med CI from our Parametric-Bootstrap study above.