Homework Solutions to Problems 5--6. =================================== [NB solutions are in Splus 6.0] (5) Recall from class that in order to achieve precision delta with probability at least 1-alpha, we apply the formula B > = (qnorm(1-alpha/2)^2 * sigst^2 / delta^2 where sigst is any upper bound or estimate for the standard deviation of the simulated iid variables. (a) Since alpha=delta=1e-2, and since the variables X are uniformly bounded between 0 and sqrt(3), so that sigst <= 3/4, our formula gives B >= 2.576^2*.75/1.e-4 = 49768.32. So we use B=5e4. > tmpvec <- runif(5e4) tmpvec <- sqrt(1+tmpvec+tmpvec^4) sigest <- sqrt(var(tmpvec)) c(mean(tmpvec) + c(-1,1)*qnorm(.995)*sigest/sqrt(5e4), sigest/0.75) [1] 1.2878769 1.2924819 0.2665071 ### So the estimated integral is 1.2902, with precision a bit better ### than required. The exact answer is given by: > tmpl <- integrate(function(x) sqrt(1+x+x^4),0,1) c(tmpl$integral, tmpl$abs.error) [1] 1.288417e+00 1.066592e-11 ### OK, our error was +.00176 (b) Note that the r.v.'s of interest are indicators of whether or not ## there are k distinct values in a given simulation. These are ## all binary (ie Bernoulli) so have sigsq <= 0.25, so for ## alpha = .02/6, get the formula B >= qnorm(1-.02/12)^2*.25/1.e-4 ## = 21538.5 . (We divide by 6 --- using `Bonferroni's ## inequality' --- to ensure that the union of the six events each ## of which may have probability up to .01/6, will altogether have ## probability <= .01. So we can use B>=21540; I used B=25500. > tmpmat <- array(0, dim=c(25500,10)) for(i in 1:6) { tmpvec <- sample(1:10,25500, replace=T) tmpmat[cbind(1:25500,tmpvec)] <- 1 } dvec <- c(tmpmat %*% rep(1,10)) > round(table(dvec)/25500, 4) 2 3 4 5 6 0.0029 0.0629 0.3275 0.4593 0.1474 ### Compare with true analytically calculated values: 0.0028 0.0648 0.3276 0.4536 0.1512 ### NB. The probability that only 1 distinct value occurs = 1e-5. ### The largest absolute error turns out to have been: .0057 ### which indicates that our Bonferroni-inequality calculation ### was not terribly conservative. *** Reasoning for exact calculation: for example, P(5 distinct) = 1e-6*( 10*{6 choose 2}*9*8*7*6) =.4536. *** The others can be done similarly (with more distinct cases). (6) > cathed <- read.table(file="Cathed.dat") > motif() > plot(cathed[,4], cathed[,3], type="n", xlab="Length (feet)", ylab="Height (feet)", main= "English Cathedral Heights vs Lengths, by Style") points(cathed[cathed[,2]==0,4],cathed[cathed[,2]==0,3], pch=5) points(cathed[cathed[,2]==1,4],cathed[cathed[,2]==1,3], pch=18) tm0 <- lm(V3 ~ V4, data=cathed[cathed[,2]==0,])$coef abline(tm0[1],tm0[2], lty=3) tm1 <- lm(V3 ~ V4, data=cathed[cathed[,2]==1,])$coef abline(tm1[1],tm1[2], lty=1) legend(locator(), legend=c("Romanesque","Gothic"), marks=c(5,18), lty=c(3,1)) printgraph(file="Graphic2.ps")