R Session for Demonstration in Lecture 1, STAT 818D.
====================================================
Eric Slud week of 1/25-29/2021
set.seed(30301)
sim.array = array(rgamma(2.5e5,2,1), c(1e4, 25))
## array 10,000 x 25 viewed as 10,000 samples of size 25
mean.vec = c(sim.array %*% rep(0.04, 25))
hist(mean.vec, nclass=40, prob=T, xlab="x values", main =
paste0("10,000 sample means of size 25, Gamma(2,1) data","\n",
"Overlaid with blue Normal density curve from CLT"))
## Mean is 2, variance is 2/25.
curve(dnorm(x,2,sqrt(.08)), c(0,3), add=T, col="blue")
## saved as : SimCLTPicLec1.pdf
## In this picture, we might see left normal tail as being slight
## and right normal tail as being slightly too small,
## but after all n=25 is not very large ...
--------------------------------------------------------
# Now consider case where we take a single sample (the 1000 row
# of the sim.array simulated data), and do 10,000 bootstrap
# samples.
boot.array = array(sample( sim.array[1000,], 2.5e5, replace=T),
c(1e4,25))
mean.boot = c(boot.array %*% rep(0.04, 25))
hist(mean.boot, nclass=40, prob=T, xlab="x values", main =
paste0("10,000 Bootstrapped size-25 sample means","\n",
"Overlaid with Normal density curves from CLT"))
curve(dnorm(x,2,sqrt(.08)), c(0,3), add=T, col="red")
## so we can see that the bootstrapped samples have means that
## are off-center and scaled slightly differently from the
## normal distribution with "true" mean 2 and variance 8/25.
## But that is not a surprise because this particular sample of 25
## variates have respective mean and variance of
> c(mean= mean(sim.array[1000,]), var=var(sim.array[1000,]))
mean var
1.9669795 2.680305
## so the sample mean has variance = 2.6803/25 = .107212
## if we overlay the same histogram with the normal density for
## these mean and variance parameters specific to the data-sample
## we find
curve(dnorm(x,1.96698,sqrt(0.107212)), c(0,3), add=T, col="blue")
legend(2.5, 1.2, legend=c("simulation mu,sig","sample mu,sig"),
lty=c(1,1), col=c("red","blue"))
## Picture saved as : BootCLTpicLec1.pdf
### We get a much closer correspondence between density and
# bootstrap histogram, revealing the same slight faults
# of the normal approximation [slightly heavy left tail and
# light right tail and perhaps middle peak of histogram
# slightly exceeding the normal approximation] that we saw
# in the Monte Carlo simulation.
### It is easy to verify the same phenomenon by repeating the
# bootstrap illustration with other rows of sim.array, eg
dataobs = sim.array[777,]
mean.boot2 = c(array(sample( dataobs, 2.5e5, replace=T),
c(1e4,25)) %*% rep(0.04, 25))
hist(mean.boot2, nclass=40, prob=T, xlab="x values", main =
paste0("10,000 Bootstrapped size-25 sample means","\n",
"Overlaid with blue Normal density curve from CLT"))
curve(dnorm(x,2,sqrt(.08)), c(0,3), add=T, col="red")
curve(dnorm(x, mean(dataobs), sd(dataobs)/5), c(0,3.2),
add=T, col="blue")
### We will see the message of these pictures reflected in
# Theorems in weeks 2 and 3 of the course.