Stat 705 HW 16 solution
=======================

cigcanc = read.table(
    "http://www-users.math.umd.edu/~evs/s430.old/Data/cigcanc.dat",
    header=T)
fitmod = step(lm(LUNG ~ CIG, data=cigcanc),
    LUNG ~ CIG + BLAD + KID + LEUK, k=3.84)
  ...
> fitmod$call
lm(formula = LUNG ~ CIG + BLAD, data = cigcanc)

> c(fitmod$coef, summary(fitmod)$sig)
(Intercept)         CIG        BLAD
  4.8901756   0.3516871   1.4561620   2.9299503

(a) rho0 = 1/2.93^2      ### = 0.1165
### Start with rho ~ Gamma(0.5, 0.5), i.e. a=1, lambda=1,
### for no particular reason.  Let's take

mu0 = c(1,0,0) ;  Sigma0 = diag(10,3,3)
### [Sigma0 eigenvalues are large compared to
###  the squares of coef/sig^2.]

XtX = solve(summary(fitmod)$cov.unsc)
sumY2 = sum(cigcanc$LUNG^2)
sumXY = c(t(model.matrix(fitmod)) %*% cigcanc$LUNG)
Tv = sumXY + solve(Sigma0, mu0)
RSS = sum(fitmod$resid^2)

### Recall from the BayesConjug.pdf handout that with p=3,
### the posterior density for rho is approximately
### Gamma(0.5*(n+a+p), 0.5*(lambda+RSS)) = Gamma(24, 176.48),
### since 0.5*(1+RSS) = 176.4845.
### This Gamma posterior is an extremely peaked density very
### close to Normal(24/176.5, 24/176.5^2) = Normal(.136, .0278^2),
### and we can use either this Gamma or this Normal in
### figuring out further posterior-related integrals.  A more
### accurate calculation of the posterior rho density using
### the exact formula given in the BayesConjug.pdf handout is
### a little difficult because of the need for rescaling in the
### integrals (to avoid astronomically small normalizing constants)
### and would not really improve the final answers.
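### As a quick numerical cross-check of the Gamma(24, 176.48) posterior
### (a sketch in Python with scipy rather than R; only the shape and rate
### values come from the solution above, the rest is the standard Gamma
### mean and sd):

```python
# Cross-check, not part of the R solution: the Gamma(shape=24, rate=176.48)
# posterior for rho has mean shape/rate and sd sqrt(shape)/rate, which is
# what the peaked-normal approximation uses.
from scipy.stats import gamma

shape, rate = 24.0, 176.48
post = gamma(a=shape, scale=1.0 / rate)   # scipy parameterizes by scale = 1/rate
print(round(post.mean(), 4))              # -> 0.136
print(round(post.std(), 4))               # -> 0.0278
```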
Then the posterior means for the coefficients can be obtained by
integrating out rho: multiply the posterior normal density for beta
given rho by the rho posterior density, and integrate over rho.

### We integrate each of the three mean-components of
###   N( (XtX + Sigma0^(-1)*rho)^(-1) Tv ,  rho^(-1)*(XtX + Sigma0^(-1)*rho)^(-1) )
### against the Gamma(24, 176.48) density in rho.

posmean = numeric(3)
for (i in 1:3) posmean[i] = integrate( function(rho) {
     outval = numeric(length(rho))
     for (j in 1:length(rho)) outval[j] =
         solve(XtX + rho[j]*solve(Sigma0), Tv)[i] * dgamma(rho[j], 24, 176.48)
     outval }, 0, Inf)$val

> posmean
[1] 4.9120339 0.3516577 1.4512191

### This Bayesian posterior mean should be compared with the
### least-squares estimates:
> fitmod$coef
(Intercept)         CIG        BLAD
  4.8901756   0.3516871   1.4561620

(b) We could do this as a posterior bivariate integral, but it is much
easier to program as a Monte Carlo simulated probability:

tmpInd = numeric(1000)
beta = array(0, c(1000,3))
rho = rgamma(1000, 24, 176.48)
library(MASS)
for(i in 1:1000) {
   Mtmp = solve(XtX + rho[i]*solve(Sigma0))
   beta[i,] = mvrnorm(1, Mtmp %*% Tv, (1/rho[i])*Mtmp)
   tmpInd[i] = (beta[i,2] > 0.13056 & beta[i,2] < 0.57282 &
                beta[i,3] > 0.17891 & beta[i,3] < 2.73342) }

> round( mean(tmpInd) + c(mean=0, LowerCL=-1, UpperCL=1) *
         1.96*sqrt(mean(tmpInd)*(1-mean(tmpInd))/1000), 4)
   mean LowerCL UpperCL
 0.9310  0.9153  0.9467

### So our point estimate is 0.9310 and the 95% CI is (0.9153, 0.9467).
### It makes sense that the joint probability is less than 0.95; if the
### two coefficients were independent we would expect the probability
### to be close to .95^2 = .9025, but in fact the correlation between
### the two coefficients is high:
> cor(beta[,2], beta[,3])
[1] -0.6862057

(c) Finally, we now generate 1000 posterior datasets using the
posterior simulated rho[i] and beta[i,] values, and get the
corresponding absolute maximum standardized residuals.
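### The binomial normal-approximation confidence interval used in (b) is
### easy to reproduce outside R (a minimal Python sketch; p_hat and n_sim
### are the simulated probability and sample size from the run above):

```python
# Normal-approximation 95% CI for a Monte Carlo simulated probability:
#   p_hat +/- 1.96 * sqrt(p_hat*(1-p_hat)/n_sim)
import math

p_hat, n_sim = 0.9310, 1000   # values taken from the simulation output above
se = math.sqrt(p_hat * (1 - p_hat) / n_sim)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(round(lo, 4), round(hi, 4))   # -> 0.9153 0.9467
```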
maxsres = numeric(1000)
Xmod = model.matrix(fitmod)
resvar.unsc = diag(44) - Xmod %*% summary(fitmod)$cov.unsc %*% t(Xmod)
### this is the 44x44 matrix always used in obtaining variances of residuals
SEresid.unsc = sqrt(diag(resvar.unsc))
for(i in 1:1000) {
   ### simulate a new dataset: mean Xmod %*% beta[i,] plus
   ### normal errors with sd 1/sqrt(rho[i])
   Ynew = c(Xmod %*% beta[i,]) + rnorm(44, 0, 1/sqrt(rho[i]))
   tmp = lm(Ynew ~ CIG + BLAD, data=cigcanc)
   res = tmp$resid
   maxsres[i] = max(abs(res)/SEresid.unsc)/summary(tmp)$sigma }

> hist(maxsres, nclass=24, main="Posterior Predicted \n Maximum Std Resid")
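### The standardized-residual scaling used in (c) rests on the fact that
### residual i has variance sigma^2*(1 - h_ii), where h_ii is the i-th
### diagonal entry of the hat matrix X (X'X)^(-1) X'.  A self-contained
### sketch in Python/numpy (the design matrix and coefficients here are
### synthetic illustrations, not the cigcanc data):

```python
# Sketch of one maximum-standardized-residual computation, assuming a
# synthetic n=44, p=3 regression (illustrative data, not cigcanc).
import numpy as np

rng = np.random.default_rng(0)
n, p = 44, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([5.0, 0.35, 1.5]) + rng.normal(scale=3.0, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix X (X'X)^(-1) X'
res = y - H @ y                             # least-squares residuals
sigma = np.sqrt(res @ res / (n - p))        # residual standard error
se_unsc = np.sqrt(np.diag(np.eye(n) - H))   # sqrt(1 - h_ii), like SEresid.unsc
max_sres = np.max(np.abs(res) / se_unsc) / sigma
print(max_sres)
```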