Homework Problem 10, STAT 705, Fall 2015.  Assigned 10/14/2015, Due Monday 10/26/2015.

(1) Generate a single dataset of values (X_{ij}, Y_{ij}), i=1,...,1000, j=1,...,40,
according to the following distributional rules:

 (*)  epsilon_{ij} are iid ~ N(0,1);  independently of the epsilon's, the X_{ij} are
      independent with X_{ij} ~ Unif[-2,2]*sqrt(1+0.2*j);  and
      Y_{ij} = 0.4 + 0.1*j + X_{ij} + epsilon_{ij}.

Put your dataset Data1 (the X's and Y's) into a 1000 x 40 x 2 array. Generate a second
1000 x 40 x 2 dataset Data2 by exactly the same method, except that the values
epsilon_{ij} are iid distributed according to t_5, Student's t distribution with 5
degrees of freedom.

#-------------- Solution to (1) ----------------------
Xmat1 = array( outer(rep(1,1000), sqrt(1+0.2*(1:40)), "*") * runif(40000,-2,2),
               c(1000,40))
Xmat2 = array( outer(rep(1,1000), sqrt(1+0.2*(1:40)), "*") * runif(40000,-2,2),
               c(1000,40))
Data1 = array( c(Xmat1, c(outer(rep(0.4,1000), 0.1*(1:40), "+") + Xmat1) + rnorm(40000)),
               c(1000,40,2),
               dimnames = list(NULL, paste("Strat",1:40,sep=""), c("X","Y")))
Data2 = array( c(Xmat2, c(outer(rep(0.4,1000), 0.1*(1:40), "+") + Xmat2) + rt(40000,5)),
               c(1000,40,2),
               dimnames = list(NULL, paste("Strat",1:40,sep=""), c("X","Y")))
#-----------------------------------------------------

(2) For both of your datasets, view the j indices as "cluster" or "stratum" labels, and
exhibit your empirically estimated cluster means and standard deviations -- compared
across the two datasets -- in two informative graphs with x-axis corresponding to the
j-index.

#-------------- Solution to (2) ----------------------
## The true means follow the line Mean(Y_{ij}) = 0.4 + j/10, and the SD's the curve
## SD(Y_{ij}) = sqrt((4/3)*(1+j/5)+1), since Var(Unif[-2,2]) = 4/3. So one way to
## display the results is to plot points for the empirical stratum means (hollow
## diamonds) and the empirical stratum means plus or minus 2*SD (solid diamonds),
## overplotted with a red line for the "theoretical" means and blue curves for the
## theoretical means +/- 2*SD.

Y1mn = apply(Data1[,,2], 2, mean)
Y1up = Y1mn + 2*apply(Data1[,,2], 2, sd)
Y1dn = Y1mn - 2*apply(Data1[,,2], 2, sd)
Y2mn = apply(Data2[,,2], 2, mean)
Y2up = Y2mn + 2*apply(Data2[,,2], 2, sd)
Y2dn = Y2mn - 2*apply(Data2[,,2], 2, sd)
plot(rep(1:40,3), c(Y1mn,Y1up,Y1dn), type="n", xlab="Stratum", ylab="Y",
     main="Stratumwise means and \n means +/- 2*SD for DataSet 1")
points(1:40, Y1mn, pch=5)
points(rep(1:40,2), c(Y1up,Y1dn), pch=18)
abline(0.4, 0.1, col="red")
curve(0.4 + x/10 + 2*sqrt((4/3)*(1+x/5)+1), add=TRUE, col="blue")
curve(0.4 + x/10 - 2*sqrt((4/3)*(1+x/5)+1), add=TRUE, col="blue")
# A similar picture, built from Y2mn, Y2up, Y2dn, displays Data2. Note that these
# plots serve as a useful check that the simulations have worked, stratumwise, as
# planned, although of course there is a lot of distributional detail that they do
# not check.
#---------------------------------------------------------

(3) For both of your datasets simulated in (1), maximize the likelihood for the model

      Y_{ij} = a + mu*j + b*X_{ij} + sigma*Z_{ij}

where the X_{ij} generated above ARE observed and part of your dataset, and where the
Z_{ij} ~ N(0,1) are NOT observed in your dataset. Here the unknown 4-dimensional
statistical parameter is (a, mu, b, sigma).

#-------------- Solution to (3) ----------------------
The situation is straightforward if we consider the conditional likelihood of Y given
X, which is all we need because the density of X contains no parameters. Conditionally
on the X's, the Y data are independent normals with common variance sigma^2 and means
a + mu*j + b*X_{ij} depending on the index j.
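## To connect the prose with the code below (a restatement consistent with the NlogL
## function that follows): with N = 1000*40 = 40000, the negative conditional
## log-likelihood, up to the additive constant N*log(sqrt(2*pi)), is
##
##    -logLik(a,mu,b,sigma) = N*log(sigma)
##          + (0.5/sigma^2) * sum_{i,j} ( Y_{ij} - a - mu*j - b*X_{ij} )^2 .
##
## NlogL below computes this divided by N (mean() in place of the sum), which has the
## same minimizer; and because NlogL is a per-observation average, its hessian at the
## minimum estimates the per-observation Fisher information, which is why MLest
## divides solve(MLfit$hess) by prod(dim(XYdat)[1:2]) = 40000.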
Take thet = c(a, mu, b, sigma).

NlogL = function(thet, XYdat)        ### ignoring sqrt(2*pi) terms
    log(thet[4]) + (0.5/thet[4]^2) *
       mean(c(XYdat[,,2] - thet[3]*XYdat[,,1] -
              outer(rep(thet[1], dim(XYdat)[1]), thet[2]*(1:dim(XYdat)[2]), "+"))^2)

MLest = function(XYdat) {
    tmpfr = cbind.data.frame(
        J = rep(1:dim(XYdat)[2], rep(dim(XYdat)[1], dim(XYdat)[2])),
        X = c(XYdat[,,1]), Y = c(XYdat[,,2]))
    lmfit = lm(Y ~ J + X, data=tmpfr)
    thet.ini = c(lmfit$coef, summary(lmfit)$sigma^2)  ### residual variance serves as
                                                      ### the starting value for sigma
    MLfit = nlm(NlogL, thet.ini, hessian=TRUE, XYdat=XYdat)
    list(thet.ini = thet.ini, MLE = MLfit$est,
         VCOV = solve(MLfit$hess)/prod(dim(XYdat)[1:2]),
         grad = MLfit$grad, conv = MLfit$code) }

> fit1 = MLest(Data1)
> fit1
$thet.ini
(Intercept)           J           X
  0.3971008   0.1003687   0.9983921   0.9825327

$MLE
[1] 0.3971008 0.1003687 0.9983921 0.9911907

### NOTE that the initial least-squares values coincide with the MLE in the first
### three coordinates, as they must: whatever the value of sigma, the normal
### log-likelihood is maximized over (a, mu, b) by least squares. (The fourth
### starting value is the residual variance, while the fourth MLE coordinate is
### sigma itself.)

$VCOV
              [,1]          [,2]          [,3]          [,4]
[1,]  1.020255e-04 -3.778747e-06 -2.079821e-08 -1.004022e-07
[2,] -3.778747e-06  1.843309e-07  3.061453e-09  4.958903e-09
[3,] -2.079821e-08  3.061453e-09  3.593621e-06  1.323958e-09
[4,] -1.004022e-07  4.958903e-09  1.323958e-09  1.228707e-05

$grad
[1] 1.766337e-08 4.790901e-07 4.640732e-09 3.504974e-09

$conv
[1] 1

> fit2 = MLest(Data2)
> fit2
$thet.ini
(Intercept)           J           X
  0.3919594   0.1001102   0.9999029   1.6568056

$MLE
[1] 0.3919593 0.1001102 0.9999029 1.2871213

$VCOV
              [,1]          [,2]          [,3]          [,4]
[1,]  1.720412e-04 -6.371892e-06  4.709769e-08 -1.303580e-07
[2,] -6.371892e-06  3.108240e-07  3.125109e-12  6.438182e-09
[3,]  4.709769e-08  3.125109e-12  6.039607e-06  1.611612e-09
[4,] -1.303580e-07  6.438182e-09  1.611612e-09  2.071901e-05

$grad
[1] 5.238476e-08 1.718091e-06 3.405831e-08 1.105547e-08

$conv
[1] 2

### Convergence (code 2) is not quite as clean as for fit1, but still OK. The real
### problem is that the variance-covariance matrix in fit2 is not right, because the
### model -- normal errors -- is misspecified !!

A robustified version is as follows. First we need the gradient of each of the 40000
individual logLik contributions, which we put into a big (40000 x 4) matrix.

ScorMat = function(thet, XYdat) {
    Jvec = rep((1:dim(XYdat)[2]), rep(dim(XYdat)[1], dim(XYdat)[2]))
    resid = c(XYdat[,,2] - thet[3]*XYdat[,,1]) - thet[1] - thet[2]*Jvec
    cbind(resid/thet[4]^2, resid*Jvec/thet[4]^2, resid*c(XYdat[,,1])/thet[4]^2,
          -1/thet[4] + resid^2/thet[4]^3) }

Compare the two estimates of per-observation Fisher information, first for fit1:
solve(fit1$VCOV*40000) versus t(ScorMat) %*% ScorMat / 40000.

> round(solve(fit1$VCOV*40000), 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  1.0179  20.8660 -0.0119 -0.0001
[2,] 20.8660 563.3823 -0.3592 -0.0568
[3,] -0.0119  -0.3592  6.9570 -0.0007
[4,] -0.0001  -0.0568 -0.0007  2.0347

> tmp = ScorMat(fit1$MLE, Data1)
> round(t(tmp) %*% tmp/40000, 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  1.0179  20.8057 -0.0073 -0.0097
[2,] 20.8057 560.5703 -0.3053 -0.0984
[3,] -0.0073  -0.3053  6.9842  0.0722
[4,] -0.0097  -0.0984  0.0722  2.0640

### So in the properly specified model, fit1 gives essentially the same
### per-observation Fisher information matrix in the two different ways. But in the
### misspecified case of fit2, we get

> round(solve(fit2$VCOV*40000), 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  0.6036  12.3741 -0.0047  0.0000
[2,] 12.3741 334.1017 -0.0967 -0.0260
[3,] -0.0047  -0.0967  4.1394 -0.0003
[4,]  0.0000  -0.0260 -0.0003  1.2066

> tmp = ScorMat(fit2$MLE, Data2)
> round(t(tmp) %*% tmp/40000, 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  0.6036  12.3708 -0.0011 -0.0016
[2,] 12.3708 334.0973  0.0592 -0.2144
[3,] -0.0011   0.0592  4.0552 -0.0453
[4,] -0.0016  -0.2144 -0.0453  3.3713

## These are close in the upper-left 3 x 3 block, but not throughout: the (4,4)
## entry, whose value involves the fourth moment of the errors, is 3.3713 from the
## scores but only 1.2066 from the hessian, reflecting the heavy tails of t_5.
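## Before computing it, note how the sandwich formula translates into the objects at
## hand (a restatement, not new output): writing A = solve(fit2$VCOV*40000) for the
## hessian-based per-observation information and B = t(tmp) %*% tmp / 40000 for the
## score-based version just displayed, the robust ("sandwich") variance of the
## estimator is
##
##      solve(A) %*% B %*% solve(A) / 40000 ,
##
## which, since fit2$VCOV = solve(A)/40000, equals
##
##      fit2$VCOV %*% (t(tmp) %*% tmp) %*% fit2$VCOV .
##
## The next display multiplies this by 40000 to keep it on the same per-observation
## scale as above.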
## The "sandwich" estimator for 40000*(Variance-Covariance) of estimators is > round(fit2$VCOV %*% (t(tmp) %*% tmp) %*% fit2$VCOV * 40000,5) [,1] [,2] [,3] [,4] [1,] 6.89285 -0.25535 -0.00170 0.01612 [2,] -0.25535 0.01245 0.00024 -0.00089 [3,] -0.00170 0.00024 0.23667 -0.00883 [4,] 0.01612 -0.00089 -0.00883 2.31550 which is only roughly similar to the VCOV*40000 matrix in fit1$VCOV > round(fit1$VCOV*40000,5) [,1] [,2] [,3] [,4] [1,] 4.08102 -0.15115 -0.00083 -0.00402 [2,] -0.15115 0.00737 0.00012 0.00020 [3,] -0.00083 0.00012 0.14374 0.00005 [4,] -0.00402 0.00020 0.00005 0.49148