HW13 Solutions                                                11/4/2016
========================

In the first part of the problem, the "full-data" or "augmented-data"
loglikelihood is (apart from the additive terms -log(X_i !), respectively
-log(X_{ij} !), which do not involve the parameters)

   \sum_{i=1}^n [ G_i*(X_i*log(lam1) - lam1) + (1-G_i)*(X_i*log(lam0) - lam0)
                  + G_i*log(a) + (1-G_i)*log(1-a) ]

and in the second part of the problem, with X_{i+} = X_{i1} + ... + X_{i5},
the augmented-data loglikelihood is

   \sum_{i=1}^n [ G_i*(X_{i+}*log(lam1) - 5*lam1)
                  + (1-G_i)*(X_{i+}*log(lam0) - 5*lam0)
                  + G_i*log(a) + (1-G_i)*log(1-a) ]

In both parts, the G_i terms -- which we do not observe -- appear linearly,
and so when we take the conditional expectation given the observed X-data,
the only calculation needed under the model with (current) parameters theta1
is E(G_i | X).  In part (1), this formula is a lot like what you saw in
Sec6NotF16.pdf :

   E(G_i | X_i) = a*lam1^(X_i)*exp(-lam1) /
                  [ a*lam1^(X_i)*exp(-lam1) + (1-a)*lam0^(X_i)*exp(-lam0) ]

while in part (2) it is a little more complicated, since the conditional
expectation involves all of the X_{ij}'s in the ith cluster through the
cluster total X_{i+}:

   E(G_i | {X_{ij}: j=1,...,5}) = a*lam1^(X_{i+})*exp(-5*lam1) /
        [ a*lam1^(X_{i+})*exp(-5*lam1) + (1-a)*lam0^(X_{i+})*exp(-5*lam0) ]

##=============================================================

(1) Generate a dataset of size n=300 under this model, following
    set.seed(7757), with a = .3, lambda_1 = 2, lambda_0 = 3.5.

set.seed(7757)
Gvec = rbinom(300,1,0.3)
Xvec = numeric(300)
Xvec[Gvec==1] = rpois( sum(Gvec), 2)
Xvec[Gvec==0] = rpois( 300-sum(Gvec), 3.5)

> table(Xvec)
Xvec
 0  1  2  3  4  5  6  7  8  9 
18 50 61 58 54 23 20 12  2  2 
> table(Xvec[Gvec==1])
 0  1  2  3  4  5 
11 27 26 14  9  3 
> table(Xvec[Gvec==0])
 0  1  2  3  4  5  6  7  8  9 
 7 23 35 44 45 20 20 12  2  2 
> mean(Xvec[Gvec==1])
[1] 1.911111
> mean(Xvec[Gvec==0])
[1] 3.538095

### This shows that the two Gvec groups are really quite
### different, but we are not given any direct data on
### which observations go with which Gvec !!
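### As a quick numerical check of the E-step formula (just a sketch; the names
### a.true, lam1.true, lam0.true, post are introduced here only for this check,
### and the true parameters are plugged in only because we know them for
### simulated data), compare the average posterior probability E(G_i | X_i)
### within each simulated Gvec group:

a.true = 0.3;  lam1.true = 2;  lam0.true = 3.5
post = a.true*dpois(Xvec,lam1.true) /
       ( a.true*dpois(Xvec,lam1.true) + (1-a.true)*dpois(Xvec,lam0.true) )
round(tapply(post, Gvec, mean), 3)    ### average posterior P(G_i=1 | X_i),
                                      ### separately for Gvec==0 and Gvec==1

### The Gvec==1 group should show a clearly larger average posterior, even
### though the two Poisson distributions overlap a lot -- that overlap is part
### of why the EM iterations below converge slowly.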
NlogL = function(th) {
     a = plogis(th[1])
     -sum(log( a*dpois(Xvec,exp(th[2])) + (1-a)*dpois(Xvec,exp(th[3])) )) }

mlefit1 = nlm(NlogL, c(0,log(.1),log(.5)), gradtol=1.e-9, steptol=1.e-9,
              hessian=T)

$minimum
[1] 601.1122

$estimate
[1] -0.9220058  0.6392831  1.2554214

$gradient
[1]  0.000000e+00 -1.136868e-07 -9.055671e-08

$hessian
          [,1]      [,2]      [,3]
[1,]  10.11999 -19.53766 -62.72526
[2,] -19.53766  52.66943  77.57667
[3,] -62.72526  77.57667 574.43759

$code
[1] 1

$iterations
[1] 36

### estimates on the original (a, lam1, lam0) scale are:

> th.nlm = c(a=plogis(mlefit1$est[1]), lam1=exp(mlefit1$est[2]),
             lam0=exp(mlefit1$est[3]))
> th.nlm
        a      lam1      lam0 
0.2845494 1.8951219 3.5093169 

Gst = function(X, th1)
### recall that the parameters in this function are the old th1
### starting values, not the new th-values that we maximize over
     1/(1 + (1/th1[1]-1)*exp(th1[2]-th1[3])*(th1[3]/th1[2])^X)

## The closed-form EM updates are given as functions of Xvec and th1,
## as follows:

th.max = function(Xvec, th1) {
     Gstsum  = sum(Gst(Xvec,th1))
     GstXsum = sum(Xvec*Gst(Xvec,th1))
     n = length(Xvec)
     c(a = Gstsum/n, lam1 = GstXsum/Gstsum,
       lam0 = (sum(Xvec)-GstXsum)/(n-Gstsum)) }

> thval = c(.5,.1,.5)
  for(i in 1:25) {
     for(j in 1:100) thval = th.max(Xvec,thval)
     cat("on iteration ",i*100,", (a,lam1,lam0) = ",round(thval[1],5),", ",
         round(thval[2],5),", ",round(thval[3],5),"\n", sep="") }

on iteration 100, (a,lam1,lam0) = 0.17314, 1.55239, 3.36359
on iteration 200, (a,lam1,lam0) = 0.221, 1.71875, 3.42768
on iteration 300, (a,lam1,lam0) = 0.24699, 1.79584, 3.46137
on iteration 400, (a,lam1,lam0) = 0.26194, 1.8369, 3.48053
on iteration 500, (a,lam1,lam0) = 0.2708, 1.86025, 3.49184
on iteration 600, (a,lam1,lam0) = 0.27614, 1.87398, 3.49864
on iteration 700, (a,lam1,lam0) = 0.27939, 1.88223, 3.50277
on iteration 800, (a,lam1,lam0) = 0.28138, 1.88723, 3.5053
on iteration 900, (a,lam1,lam0) = 0.28261, 1.89029, 3.50685
on iteration 1000, (a,lam1,lam0) = 0.28336, 1.89216, 3.50781
on iteration 1100, (a,lam1,lam0) = 0.28382, 1.89332, 3.5084
on iteration 1200, (a,lam1,lam0) = 0.28411, 1.89403, 3.50876
on iteration 1300, (a,lam1,lam0) = 0.28429, 1.89447, 3.50899
on iteration 1400, (a,lam1,lam0) = 0.28439, 1.89474, 3.50912
on iteration 1500, (a,lam1,lam0) = 0.28446, 1.8949, 3.50921
on iteration 1600, (a,lam1,lam0) = 0.2845, 1.89501, 3.50926
on iteration 1700, (a,lam1,lam0) = 0.28453, 1.89507, 3.50929
on iteration 1800, (a,lam1,lam0) = 0.28455, 1.89511, 3.50931
on iteration 1900, (a,lam1,lam0) = 0.28455, 1.89513, 3.50933
on iteration 2000, (a,lam1,lam0) = 0.28456, 1.89515, 3.50933
on iteration 2100, (a,lam1,lam0) = 0.28456, 1.89516, 3.50934
on iteration 2200, (a,lam1,lam0) = 0.28457, 1.89516, 3.50934
on iteration 2300, (a,lam1,lam0) = 0.28457, 1.89517, 3.50934
on iteration 2400, (a,lam1,lam0) = 0.28457, 1.89517, 3.50934
on iteration 2500, (a,lam1,lam0) = 0.28457, 1.89517, 3.50935

### If the estimates found had not been close to the true parameter values
### from which you simulated [which does not seem to be a problem in the
### estimates I found], we could form 95% confidence intervals around the
### estimated values and see whether the true values fall within those
### intervals.  Here the approximate standard errors for the 3 parameters
### (giving CI half-widths 1.96*SE) can be found as follows:

NlogL0 = function(x,xv)
     -sum(log(x[1]*dpois(xv,x[2]) + (1-x[1])*dpois(xv,x[3]) ))
mlefit2 = nlm(NlogL0, th.nlm, gradtol=1.e-9, steptol=1.e-9, hessian=T,
              xv=Xvec)

> round(sqrt(diag(solve(mlefit2$hess))),4)
[1] 0.3373 0.8741 0.4589

### so this leaves a LOT of room for error!
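### One reassuring property we can check directly (a small sketch, using the
### functions th.max and NlogL0 defined above; thv and negll are new names
### introduced just for this check): along the EM path, the observed-data
### negative loglikelihood should never increase.

thv = c(.5,.1,.5)
negll = numeric(50)
for(i in 1:50) {
     thv = th.max(Xvec, thv)          ### one EM update (E-step + M-step)
     negll[i] = NlogL0(thv, Xvec) }   ### observed-data negative loglikelihood
all( diff(negll) <= 1e-12 )           ### should be TRUE: EM is monotone

### The decrease is monotone but very slow here, which matches the roughly
### 2500 iterations needed above for 5-decimal-place convergence.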
#### Let's compare this with what we would get from the same kind of data
#### with n = 30000

> nG1 = rbinom(1,30000,0.3)       ### = 9034
  mlefit2B = nlm(NlogL0, c(.3,2,3.5), gradtol=1.e-9, steptol=1.e-9,
                 hessian=T, xv=c(rpois(nG1,2), rpois(3e4-nG1,3.5)))

$estimate
[1] 0.3041545 1.9822299 3.5131768

> round(sqrt(diag(solve(mlefit2B$hess))),5)
[1] 0.04685 0.10988 0.05898

### roughly 1/10 as large as before, because
### sample size n is larger by a factor of 100

NOTE: using the DELTA METHOD, the standard errors in mlefit2 can also be
derived (approximately) from the hessian in mlefit1, as follows:

> round(sqrt(diag(solve(mlefit2$hess))),4)
[1] 0.3373 0.8741 0.4589
> tmp = sqrt(diag(solve(mlefit1$hess)))
  round(c(dlogis(mlefit1$est[1])*tmp[1], exp(mlefit1$est[2:3])*tmp[2:3]),4)
[1] 0.3356 0.8697 0.4570

##===========================================================

(2) Repeat the same problem where the data now consist of n=60 clusters
    (X_{i,1}, ..., X_{i,5}) of 5 conditionally iid Poisson(lambda_{G_i})
    random variables, where again the parameters are a = P(G_1=1), and
    lambda_1 < lambda_0.

set.seed(7757)
Gv2 = rbinom(60,1,.3)
th0 = c(.3,2,3.5)
Xmat = array( rpois(300, th0[3-rep(Gv2,each=5)]), c(5,60) )

> rbind(Xibar = apply(Xmat,2,mean), true = th0[3-Gv2])
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
Xibar  4.4  4.2  2.4  5.2  3.2  2.8  3.6    2  3.8   5.4   2.4   2.4     3
true   3.5  3.5  3.5  3.5  3.5  3.5  3.5    2  3.5   3.5   3.5   3.5     2
      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
Xibar   1.2   3.6   1.8   1.4   4.0   3.6   3.0   3.2   4.0   5.0   3.4   2.2
true    2.0   3.5   2.0   2.0   3.5   3.5   3.5   3.5   3.5   3.5   3.5   2.0
      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
Xibar   3.2   2.8   5.0   3.4   1.8   2.0   3.6   1.8   1.4   4.2   2.0   3.8
true    3.5   3.5   3.5   3.5   2.0   3.5   3.5   3.5   2.0   3.5   3.5   3.5
      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
Xibar   4.2   2.2   4.6   2.4   2.2   3.8   3.4   4.6   2.2   3.0   2.2   3.0
true    3.5   2.0   3.5   3.5   2.0   3.5   3.5   3.5   2.0   3.5   2.0   3.5
      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
Xibar   3.6   1.8   4.6   2.6   3.4   3.2   4.8   2.2     3     2   3.8
true    3.5   2.0   3.5   3.5   3.5   3.5   3.5   2.0     2     2   3.5

NlogL3 = function(x,xm) {
     djoint1 = apply(dpois(xm,x[2]),2,prod)
     djoint0 = apply(dpois(xm,x[3]),2,prod)
     -sum(log(x[1]*djoint1 + (1-x[1])*djoint0 )) }

mlefit3 = nlm(NlogL3, th0, gradtol=1.e-9, steptol=1.e-9, hessian=T, xm=Xmat)

$minimum
[1] 597.6572

$estimate
[1] 0.4173144 2.3018464 3.7574403

$gradient
[1]  9.379164e-08  4.938941e-10 -5.900009e-08

$hessian
         [,1]       [,2]       [,3]
[1,] 129.2756 -35.974304 -33.282395
[2,] -35.9743  30.137021  -2.088265
[3,] -33.2824  -2.088265  32.113320

$code
[1] 1

$iterations
[1] 14

### Now compare this to what we get with EM.
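### (Before doing so, one aside for comparison with part (1): since NlogL3 is
### parameterized directly in (a, lam1, lam0), approximate standard errors for
### part (2) can be read off the hessian returned in mlefit3.  This is just a
### sketch; the resulting numbers are not reproduced here.)

round(sqrt(diag(solve(mlefit3$hessian))), 4)    ### approximate SEs for
                                                ### (a, lam1, lam0) at the MLE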
> Xsum = apply(Xmat,2,sum)
  Xtot = sum(Xsum)        ### also = sum(c(Xmat))

## note that the xsum argument in the next function is apply(Xmat,2,sum)

Gst2 = function(xsum, th1)
### recall that the parameters in this function are the old th1
### starting values, not the new th-values that we maximize over
     1/(1 + (1/th1[1]-1)*exp(5*(th1[2]-th1[3]))*(th1[3]/th1[2])^xsum)

## The closed-form EM updates are given as functions of Xsum and th1,
## as follows:

th.max2 = function(xsum, th1) {
     Gst2sum  = sum(Gst2(xsum,th1))
     Gst2Xsum = sum(xsum*Gst2(xsum,th1))
     n = length(xsum)
     c(a = Gst2sum/n, lam1 = 0.2*Gst2Xsum/Gst2sum,
       lam0 = 0.2*(sum(xsum)-Gst2Xsum)/(n-Gst2sum)) }

thval = c(.5,.1,.5)
for(i in 1:10) {
     for(j in 1:20) thval = th.max2(Xsum,thval)
     cat("on iteration ",i*20,", (a,lam1,lam0) = ",round(thval[1],5),", ",
         round(thval[2],5),", ",round(thval[3],5),"\n", sep="") }

on iteration 20, (a,lam1,lam0) = 0.20158, 1.97709, 3.44614
on iteration 40, (a,lam1,lam0) = 0.399, 2.27255, 3.73254
on iteration 60, (a,lam1,lam0) = 0.41551, 2.29896, 3.755
on iteration 80, (a,lam1,lam0) = 0.41713, 2.30156, 3.75719
on iteration 100, (a,lam1,lam0) = 0.4173, 2.30182, 3.75742
on iteration 120, (a,lam1,lam0) = 0.41731, 2.30184, 3.75744
on iteration 140, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 160, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 180, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 200, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744

### Much quicker convergence in this second EM version, and the limit agrees
### with the direct nlm fit in mlefit3 !!
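### As a final check (a sketch, using the converged thval, the cluster totals
### Xsum, and the simulated -- but in practice unobserved -- labels Gv2), we
### can classify each cluster by thresholding its E-step posterior at 1/2 and
### cross-tabulate against the true group labels:

table(true = Gv2, classified = as.numeric(Gst2(Xsum, thval) > 0.5))

### With 5 observations per cluster the group memberships are much better
### identified than in part (1), which is one reason this EM converges so
### much faster.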