Homework Solutions to Problems 5--6.
===================================
[NB solutions are in Splus 6.0]

(5) Recall from class that in order to achieve precision delta with
probability at least 1-alpha, we apply the formula 
     B > = (qnorm(1-alpha/2)^2 * sigst^2 / delta^2
where sigst is any upper bound or estimate for the standard deviation
of the simulated iid variables.

(a) Since alpha=delta=1e-2, and since the variables X are
uniformly bounded between 0 and sqrt(3), so that sigst <= 3/4, our 
formula gives B >= 2.576^2*.75/1.e-4 = 49768.32.  So we use B=5e4.

> tmpvec <- runif(5e4)
  tmpvec <- sqrt(1+tmpvec+tmpvec^4)
  sigest <- sqrt(var(tmpvec))
  c(mean(tmpvec) + c(-1,1)*qnorm(.995)*sigest/sqrt(5e4), sigest/0.75)
[1] 1.2878769 1.2924819 0.2665071
### So the estimated integral is 1.2902, with precision a bit better
###   than required. The exact answer is given by:
> tmpl <- integrate(function(x) sqrt(1+x+x^4),0,1)
  c(tmpl$integral, tmpl$abs.error)
[1] 1.288417e+00 1.066592e-11		### OK, our error was +.00176

(b) Note that the r.v.'s of interest are indicators of whether or not
##   there are k distinct values in a given simulation. These are 
##   all binary (ie Bernoulli) so have sigsq <= 0.25, so for 
##   alpha = .02/6, get the formula B >= qnorm(1-.02/12)^2*.25/1.e-4
##   = 21538.5 .   (We divide by 6 --- using `Bonferroni's 
##   inequality' --- to ensure that the union of the six events each
##   of which may have probability up to .01/6, will altogether have 
##   probability <= .01. So we can use  B>=21540; I used B=25500.
> tmpmat <- array(0, dim=c(25500,10))
    for(i in 1:6) {
     tmpvec <- sample(1:10,25500, replace=T)
     tmpmat[cbind(1:25500,tmpvec)] <- 1 }
  dvec <- c(tmpmat %*% rep(1,10))
> round(table(dvec)/25500, 4)
      2      3      4      5      6 
 0.0029 0.0629 0.3275 0.4593 0.1474
### Compare with true analytically calculated values:
 0.0028 0.0648 0.3276 0.4536 0.1512
### NB. The probability that only 1 distinct value occurs = 1e-5.

### The largest absolute error turns out to have been: .0057
### which indicates that our Bonferroni-inequality calculation
### was not terribly conservative.

*** Reasoning for exact calculation: for example, 
P(5 distinct) = 1e-6*( 10*{6 choose 2}*9*8*7*6)  =.4536.
*** The others can be done similarly (with more distinct cases).

(6)
> cathed <- read.table(file="Cathed.dat")
> motif()
> plot(cathed[,4], cathed[,3], type="n", xlab="Length (feet)",
    ylab="Height (feet)", main=
    "English Cathedral Heights vs Lengths, by Style")
  points(cathed[cathed[,2]==0,4],cathed[cathed[,2]==0,3], pch=5)
  points(cathed[cathed[,2]==1,4],cathed[cathed[,2]==1,3], pch=18)
  tm0 <- lm(V3 ~ V4, data=cathed[cathed[,2]==0,])$coef
  abline(tm0[1],tm0[2], lty=3)
  tm1 <- lm(V3 ~ V4, data=cathed[cathed[,2]==1,])$coef
  abline(tm1[1],tm1[2], lty=1)
  legend(locator(), legend=c("Romanesque","Gothic"),
       marks=c(5,18), lty=c(3,1))
  printgraph(file="Graphic2.ps")