Homework Problem 10, STAT 705, Fall 2015.  Assigned 10/14/2015, Due Monday 10/26/2015.

(1) Generate a single dataset of values (X_{ij}, Y_{ij}), i=1,...,1000, j=1,...,40,
according to the following distributional rules:

 (*)  epsilon_{ij} are iid ~ N(0,1);  independently of the epsilon's, the X_{ij} are
      independent with X_{ij} ~ Unif[-2,2]*sqrt(1+0.2*j);  and
      Y_{ij} = 0.4 + 0.1*j + X_{ij} + epsilon_{ij}.

Put your dataset Data1 (the X's and Y's) into a 1000 x 40 x 2 array. Generate a second
1000 x 40 x 2 dataset Data2 by exactly the same method, except that the values
epsilon_{ij} are iid distributed according to t_5, Student's t distribution with 5
degrees of freedom.

#-------------- Solution to (1) ----------------------
Xmat1 = array( outer(rep(1,1000), sqrt(1+0.2*(1:40)), "*") * runif(40000,-2,2),
               c(1000,40))
Xmat2 = array( outer(rep(1,1000), sqrt(1+0.2*(1:40)), "*") * runif(40000,-2,2),
               c(1000,40))
Data1 = array( c(Xmat1, c(outer(rep(0.4,1000), 0.1*(1:40), "+") + Xmat1) + rnorm(40000)),
               c(1000,40,2),
               dimnames = list(NULL, paste("Strat",1:40,sep=""), c("X","Y")))
Data2 = array( c(Xmat2, c(outer(rep(0.4,1000), 0.1*(1:40), "+") + Xmat2) + rt(40000,5)),
               c(1000,40,2),
               dimnames = list(NULL, paste("Strat",1:40,sep=""), c("X","Y")))
#-----------------------------------------------------

(2) For both of your datasets, view the j indices as "cluster" or "stratum" labels, and
exhibit your empirically estimated cluster means and standard deviations -- compared
across the two datasets -- in two informative graphs with x-axis corresponding to the
j-index.

#-------------- Solution to (2) ----------------------
## The true means follow the line Mean(Y_{ij}) = 0.4 + j/10, and the SD's the curve
## SD(Y_{ij}) = sqrt((4/3)*(1+j/5)+1), since Var(Unif[-2,2]) = 4/3. So one way to
## display the results is to plot points for the empirical stratum means (hollow
## diamonds) and the empirical stratum means plus or minus 2*SD (solid diamonds),
## overplotted with a red line for the "theoretical" means and blue curves for the
## theoretical means +/- 2*SD.

Y1mn = apply(Data1[,,2], 2, mean)
Y1up = Y1mn + 2*apply(Data1[,,2], 2, sd)
Y1dn = Y1mn - 2*apply(Data1[,,2], 2, sd)
Y2mn = apply(Data2[,,2], 2, mean)
Y2up = Y2mn + 2*apply(Data2[,,2], 2, sd)
Y2dn = Y2mn - 2*apply(Data2[,,2], 2, sd)
plot(rep(1:40,3), c(Y1mn,Y1up,Y1dn), type="n", xlab="Stratum", ylab="Y",
     main="Stratumwise means and \n means +/- 2*SD for DataSet 1")
points(1:40, Y1mn, pch=5)
points(rep(1:40,2), c(Y1up,Y1dn), pch=18)
abline(0.4, 0.1, col="red")
curve(0.4 + x/10 + 2*sqrt((4/3)*(1+x/5)+1), add=TRUE, col="blue")
curve(0.4 + x/10 - 2*sqrt((4/3)*(1+x/5)+1), add=TRUE, col="blue")
# A similar picture, built from Y2mn, Y2up, Y2dn, displays Data2. Note that these
# plots serve as a useful check that the simulations have worked, stratumwise, as
# planned, although of course there is a lot of distributional detail that they do
# not check.
#---------------------------------------------------------

(3) For both of your datasets simulated in (1), maximize the likelihood for the model

      Y_{ij} = a + mu*j + b*X_{ij} + sigma*Z_{ij}

where the X_{ij} generated above ARE observed and part of your dataset, and where the
Z_{ij} ~ N(0,1) are NOT observed in your dataset. Here the unknown 4-dimensional
statistical parameter is (a, mu, b, sigma).

#-------------- Solution to (3) ----------------------
The situation is straightforward if we consider the conditional likelihood of Y given
X, which is all we need because the density of X contains no parameters. Conditionally
on the X's, the Y data are independent normals with common variance sigma^2 and means
a + mu*j + b*X_{ij} depending on the index j.
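## To connect the prose with the code below (a restatement consistent with the NlogL
## function that follows): with N = 1000*40 = 40000, the negative conditional
## log-likelihood, up to the additive constant N*log(sqrt(2*pi)), is
##
##    -logLik(a,mu,b,sigma) = N*log(sigma)
##          + (0.5/sigma^2) * sum_{i,j} ( Y_{ij} - a - mu*j - b*X_{ij} )^2 .
##
## NlogL below computes this divided by N (mean() in place of the sum), which has the
## same minimizer; and because NlogL is a per-observation average, its hessian at the
## minimum estimates the per-observation Fisher information, which is why MLest
## divides solve(MLfit$hess) by prod(dim(XYdat)[1:2]) = 40000.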
Take thet = c(a, mu, b, sigma).

NlogL = function(thet, XYdat)        ### ignoring sqrt(2*pi) terms
    log(thet[4]) + (0.5/thet[4]^2) *
       mean(c(XYdat[,,2] - thet[3]*XYdat[,,1] -
              outer(rep(thet[1], dim(XYdat)[1]), thet[2]*(1:dim(XYdat)[2]), "+"))^2)

MLest = function(XYdat) {
    tmpfr = cbind.data.frame(
        J = rep(1:dim(XYdat)[2], rep(dim(XYdat)[1], dim(XYdat)[2])),
        X = c(XYdat[,,1]), Y = c(XYdat[,,2]))
    lmfit = lm(Y ~ J + X, data=tmpfr)
    thet.ini = c(lmfit$coef, summary(lmfit)$sigma^2)  ### residual variance serves as
                                                      ### the starting value for sigma
    MLfit = nlm(NlogL, thet.ini, hessian=TRUE, XYdat=XYdat)
    list(thet.ini = thet.ini, MLE = MLfit$est,
         VCOV = solve(MLfit$hess)/prod(dim(XYdat)[1:2]),
         grad = MLfit$grad, conv = MLfit$code) }

> fit1 = MLest(Data1)
> fit1
$thet.ini
(Intercept)           J           X
  0.3971008   0.1003687   0.9983921   0.9825327

$MLE
[1] 0.3971008 0.1003687 0.9983921 0.9911907

### NOTE that the initial least-squares values coincide with the MLE in the first
### three coordinates, as they must: whatever the value of sigma, the normal
### log-likelihood is maximized over (a, mu, b) by least squares. (The fourth
### starting value is the residual variance, while the fourth MLE coordinate is
### sigma itself.)

$VCOV
              [,1]          [,2]          [,3]          [,4]
[1,]  1.020255e-04 -3.778747e-06 -2.079821e-08 -1.004022e-07
[2,] -3.778747e-06  1.843309e-07  3.061453e-09  4.958903e-09
[3,] -2.079821e-08  3.061453e-09  3.593621e-06  1.323958e-09
[4,] -1.004022e-07  4.958903e-09  1.323958e-09  1.228707e-05

$grad
[1] 1.766337e-08 4.790901e-07 4.640732e-09 3.504974e-09

$conv
[1] 1

> fit2 = MLest(Data2)
> fit2
$thet.ini
(Intercept)           J           X
  0.3919594   0.1001102   0.9999029   1.6568056

$MLE
[1] 0.3919593 0.1001102 0.9999029 1.2871213

$VCOV
              [,1]          [,2]          [,3]          [,4]
[1,]  1.720412e-04 -6.371892e-06  4.709769e-08 -1.303580e-07
[2,] -6.371892e-06  3.108240e-07  3.125109e-12  6.438182e-09
[3,]  4.709769e-08  3.125109e-12  6.039607e-06  1.611612e-09
[4,] -1.303580e-07  6.438182e-09  1.611612e-09  2.071901e-05

$grad
[1] 5.238476e-08 1.718091e-06 3.405831e-08 1.105547e-08

$conv
[1] 2

### Convergence (code 2) is not quite as clean as for fit1, but still OK. The real
### problem is that the variance-covariance matrix in fit2 is not right, because the
### model -- normal errors -- is misspecified !!

A robustified version is as follows. First we need the gradient of each of the 40000
individual logLik contributions, which we put into a big (40000 x 4) matrix.

ScorMat = function(thet, XYdat) {
    Jvec = rep((1:dim(XYdat)[2]), rep(dim(XYdat)[1], dim(XYdat)[2]))
    resid = c(XYdat[,,2] - thet[3]*XYdat[,,1]) - thet[1] - thet[2]*Jvec
    cbind(resid/thet[4]^2, resid*Jvec/thet[4]^2, resid*c(XYdat[,,1])/thet[4]^2,
          -1/thet[4] + resid^2/thet[4]^3) }

Compare the two estimates of per-observation Fisher information, first for fit1:
solve(fit1$VCOV*40000) versus t(ScorMat) %*% ScorMat / 40000.

> round(solve(fit1$VCOV*40000), 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  1.0179  20.8660 -0.0119 -0.0001
[2,] 20.8660 563.3823 -0.3592 -0.0568
[3,] -0.0119  -0.3592  6.9570 -0.0007
[4,] -0.0001  -0.0568 -0.0007  2.0347

> tmp = ScorMat(fit1$MLE, Data1)
> round(t(tmp) %*% tmp/40000, 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  1.0179  20.8057 -0.0073 -0.0097
[2,] 20.8057 560.5703 -0.3053 -0.0984
[3,] -0.0073  -0.3053  6.9842  0.0722
[4,] -0.0097  -0.0984  0.0722  2.0640

### So in the properly specified model, fit1 gives essentially the same
### per-observation Fisher information matrix in the two different ways. But in the
### misspecified case of fit2, we get

> round(solve(fit2$VCOV*40000), 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  0.6036  12.3741 -0.0047  0.0000
[2,] 12.3741 334.1017 -0.0967 -0.0260
[3,] -0.0047  -0.0967  4.1394 -0.0003
[4,]  0.0000  -0.0260 -0.0003  1.2066

> tmp = ScorMat(fit2$MLE, Data2)
> round(t(tmp) %*% tmp/40000, 4)
        [,1]     [,2]    [,3]    [,4]
[1,]  0.6036  12.3708 -0.0011 -0.0016
[2,] 12.3708 334.0973  0.0592 -0.2144
[3,] -0.0011   0.0592  4.0552 -0.0453
[4,] -0.0016  -0.2144 -0.0453  3.3713

## These are close in the upper-left 3 x 3 block, but not throughout: the (4,4)
## entry, whose value involves the fourth moment of the errors, is 3.3713 from the
## scores but only 1.2066 from the hessian, reflecting the heavy tails of t_5.
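## Before computing it, note how the sandwich formula translates into the objects at
## hand (a restatement, not new output): writing A = solve(fit2$VCOV*40000) for the
## hessian-based per-observation information and B = t(tmp) %*% tmp / 40000 for the
## score-based version just displayed, the robust ("sandwich") variance of the
## estimator is
##
##      solve(A) %*% B %*% solve(A) / 40000 ,
##
## which, since fit2$VCOV = solve(A)/40000, equals
##
##      fit2$VCOV %*% (t(tmp) %*% tmp) %*% fit2$VCOV .
##
## The next display multiplies this by 40000 to keep it on the same per-observation
## scale as above.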
## The "sandwich" estimator for 40000*(Variance-Covariance) of estimators is > round(fit2$VCOV %*% (t(tmp) %*% tmp) %*% fit2$VCOV * 40000,5) [,1] [,2] [,3] [,4] [1,] 6.89285 -0.25535 -0.00170 0.01612 [2,] -0.25535 0.01245 0.00024 -0.00089 [3,] -0.00170 0.00024 0.23667 -0.00883 [4,] 0.01612 -0.00089 -0.00883 2.31550 which is only roughly similar to the VCOV*40000 matrix in fit1$VCOV > round(fit1$VCOV*40000,5) [,1] [,2] [,3] [,4] [1,] 4.08102 -0.15115 -0.00083 -0.00402 [2,] -0.15115 0.00737 0.00012 0.00020 [3,] -0.00083 0.00012 0.14374 0.00005 [4,] -0.00402 0.00020 0.00005 0.49148