HW13 Solutions                                                11/4/2016
========================

In the first part of the problem, the "full-data" or "augmented-data"
loglikelihood is (apart from the additive terms -log(X_i !), respectively
-log(X_{ij} !), which do not involve the parameters)

   \sum_{i=1}^n [ G_i*(X_i*log(lam1) - lam1) + (1-G_i)*(X_i*log(lam0) - lam0)
                  + G_i*log(a) + (1-G_i)*log(1-a) ]

and in the second part of the problem, with X_{i+} = X_{i1} + ... + X_{i5},
the augmented-data loglikelihood is

   \sum_{i=1}^n [ G_i*(X_{i+}*log(lam1) - 5*lam1)
                  + (1-G_i)*(X_{i+}*log(lam0) - 5*lam0)
                  + G_i*log(a) + (1-G_i)*log(1-a) ]

In both parts, the G_i terms -- which we do not observe -- appear linearly,
and so when we take the conditional expectation given the observed X-data,
the only calculation needed under the model with (current) parameters theta1
is E(G_i | X).  In part (1), this formula is a lot like what you saw in
Sec6NotF16.pdf :

   E(G_i | X_i) = a*lam1^(X_i)*exp(-lam1) /
                  [ a*lam1^(X_i)*exp(-lam1) + (1-a)*lam0^(X_i)*exp(-lam0) ]

while in part (2) it is a little more complicated, since the conditional
expectation involves all of the X_{ij}'s in the ith cluster through the
cluster total X_{i+}:

   E(G_i | {X_{ij}: j=1,...,5}) = a*lam1^(X_{i+})*exp(-5*lam1) /
        [ a*lam1^(X_{i+})*exp(-5*lam1) + (1-a)*lam0^(X_{i+})*exp(-5*lam0) ]

##=============================================================

(1) Generate a dataset of size n=300 under this model, following
    set.seed(7757), with a = .3, lambda_1 = 2, lambda_0 = 3.5.

set.seed(7757)
Gvec = rbinom(300,1,0.3)
Xvec = numeric(300)
Xvec[Gvec==1] = rpois( sum(Gvec), 2)
Xvec[Gvec==0] = rpois( 300-sum(Gvec), 3.5)

> table(Xvec)
Xvec
 0  1  2  3  4  5  6  7  8  9 
18 50 61 58 54 23 20 12  2  2 
> table(Xvec[Gvec==1])
 0  1  2  3  4  5 
11 27 26 14  9  3 
> table(Xvec[Gvec==0])
 0  1  2  3  4  5  6  7  8  9 
 7 23 35 44 45 20 20 12  2  2 
> mean(Xvec[Gvec==1])
[1] 1.911111
> mean(Xvec[Gvec==0])
[1] 3.538095

### This shows that the two Gvec groups are really quite
### different, but we are not given any direct data on
### which observations go with which Gvec !!
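### As a quick numerical check of the E-step formula (just a sketch; the names
### a.true, lam1.true, lam0.true, post are introduced here only for this check,
### and the true parameters are plugged in only because we know them for
### simulated data), compare the average posterior probability E(G_i | X_i)
### within each simulated Gvec group:

a.true = 0.3;  lam1.true = 2;  lam0.true = 3.5
post = a.true*dpois(Xvec,lam1.true) /
       ( a.true*dpois(Xvec,lam1.true) + (1-a.true)*dpois(Xvec,lam0.true) )
round(tapply(post, Gvec, mean), 3)    ### average posterior P(G_i=1 | X_i),
                                      ### separately for Gvec==0 and Gvec==1

### The Gvec==1 group should show a clearly larger average posterior, even
### though the two Poisson distributions overlap a lot -- that overlap is part
### of why the EM iterations below converge slowly.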
NlogL = function(th) {
     a = plogis(th[1])
     -sum(log( a*dpois(Xvec,exp(th[2])) + (1-a)*dpois(Xvec,exp(th[3])) )) }

mlefit1 = nlm(NlogL, c(0,log(.1),log(.5)), gradtol=1.e-9, steptol=1.e-9,
              hessian=T)

$minimum
[1] 601.1122

$estimate
[1] -0.9220058  0.6392831  1.2554214

$gradient
[1]  0.000000e+00 -1.136868e-07 -9.055671e-08

$hessian
          [,1]      [,2]      [,3]
[1,]  10.11999 -19.53766 -62.72526
[2,] -19.53766  52.66943  77.57667
[3,] -62.72526  77.57667 574.43759

$code
[1] 1

$iterations
[1] 36

### estimates on the original (a, lam1, lam0) scale are:

> th.nlm = c(a=plogis(mlefit1$est[1]), lam1=exp(mlefit1$est[2]),
             lam0=exp(mlefit1$est[3]))
> th.nlm
        a      lam1      lam0 
0.2845494 1.8951219 3.5093169 

Gst = function(X, th1)
### recall that the parameters in this function are the old th1
### starting values, not the new th-values that we maximize over
     1/(1 + (1/th1[1]-1)*exp(th1[2]-th1[3])*(th1[3]/th1[2])^X)

## The closed-form EM updates are given as functions of Xvec and th1,
## as follows:

th.max = function(Xvec, th1) {
     Gstsum  = sum(Gst(Xvec,th1))
     GstXsum = sum(Xvec*Gst(Xvec,th1))
     n = length(Xvec)
     c(a = Gstsum/n, lam1 = GstXsum/Gstsum,
       lam0 = (sum(Xvec)-GstXsum)/(n-Gstsum)) }

> thval = c(.5,.1,.5)
  for(i in 1:25) {
     for(j in 1:100) thval = th.max(Xvec,thval)
     cat("on iteration ",i*100,", (a,lam1,lam0) = ",round(thval[1],5),", ",
         round(thval[2],5),", ",round(thval[3],5),"\n", sep="") }

on iteration 100, (a,lam1,lam0) = 0.17314, 1.55239, 3.36359
on iteration 200, (a,lam1,lam0) = 0.221, 1.71875, 3.42768
on iteration 300, (a,lam1,lam0) = 0.24699, 1.79584, 3.46137
on iteration 400, (a,lam1,lam0) = 0.26194, 1.8369, 3.48053
on iteration 500, (a,lam1,lam0) = 0.2708, 1.86025, 3.49184
on iteration 600, (a,lam1,lam0) = 0.27614, 1.87398, 3.49864
on iteration 700, (a,lam1,lam0) = 0.27939, 1.88223, 3.50277
on iteration 800, (a,lam1,lam0) = 0.28138, 1.88723, 3.5053
on iteration 900, (a,lam1,lam0) = 0.28261, 1.89029, 3.50685
on iteration 1000, (a,lam1,lam0) = 0.28336, 1.89216, 3.50781
on iteration 1100, (a,lam1,lam0) = 0.28382, 1.89332, 3.5084
on iteration 1200, (a,lam1,lam0) = 0.28411, 1.89403, 3.50876
on iteration 1300, (a,lam1,lam0) = 0.28429, 1.89447, 3.50899
on iteration 1400, (a,lam1,lam0) = 0.28439, 1.89474, 3.50912
on iteration 1500, (a,lam1,lam0) = 0.28446, 1.8949, 3.50921
on iteration 1600, (a,lam1,lam0) = 0.2845, 1.89501, 3.50926
on iteration 1700, (a,lam1,lam0) = 0.28453, 1.89507, 3.50929
on iteration 1800, (a,lam1,lam0) = 0.28455, 1.89511, 3.50931
on iteration 1900, (a,lam1,lam0) = 0.28455, 1.89513, 3.50933
on iteration 2000, (a,lam1,lam0) = 0.28456, 1.89515, 3.50933
on iteration 2100, (a,lam1,lam0) = 0.28456, 1.89516, 3.50934
on iteration 2200, (a,lam1,lam0) = 0.28457, 1.89516, 3.50934
on iteration 2300, (a,lam1,lam0) = 0.28457, 1.89517, 3.50934
on iteration 2400, (a,lam1,lam0) = 0.28457, 1.89517, 3.50934
on iteration 2500, (a,lam1,lam0) = 0.28457, 1.89517, 3.50935

### If the estimates found had not been close to the true parameter values
### from which you simulated [which does not seem to be a problem in the
### estimates I found], we could form 95% confidence intervals around the
### estimated values and see whether the true values fall within those
### intervals.  Here the approximate standard errors for the 3 parameters
### (giving CI half-widths 1.96*SE) can be found as follows:

NlogL0 = function(x,xv)
     -sum(log(x[1]*dpois(xv,x[2]) + (1-x[1])*dpois(xv,x[3]) ))
mlefit2 = nlm(NlogL0, th.nlm, gradtol=1.e-9, steptol=1.e-9, hessian=T,
              xv=Xvec)

> round(sqrt(diag(solve(mlefit2$hess))),4)
[1] 0.3373 0.8741 0.4589

### so this leaves a LOT of room for error!
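### One reassuring property we can check directly (a small sketch, using the
### functions th.max and NlogL0 defined above; thv and negll are new names
### introduced just for this check): along the EM path, the observed-data
### negative loglikelihood should never increase.

thv = c(.5,.1,.5)
negll = numeric(50)
for(i in 1:50) {
     thv = th.max(Xvec, thv)          ### one EM update (E-step + M-step)
     negll[i] = NlogL0(thv, Xvec) }   ### observed-data negative loglikelihood
all( diff(negll) <= 1e-12 )           ### should be TRUE: EM is monotone

### The decrease is monotone but very slow here, which matches the roughly
### 2500 iterations needed above for 5-decimal-place convergence.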
#### Let's compare this with what we would get from the same kind of data
#### with n = 30000

> nG1 = rbinom(1,30000,0.3)       ### = 9034
  mlefit2B = nlm(NlogL0, c(.3,2,3.5), gradtol=1.e-9, steptol=1.e-9,
                 hessian=T, xv=c(rpois(nG1,2), rpois(3e4-nG1,3.5)))

$estimate
[1] 0.3041545 1.9822299 3.5131768

> round(sqrt(diag(solve(mlefit2B$hess))),5)
[1] 0.04685 0.10988 0.05898

### roughly 1/10 as large as before, because
### sample size n is larger by a factor of 100

NOTE: using the DELTA METHOD, the standard errors in mlefit2 can also be
derived (approximately) from the hessian in mlefit1, as follows:

> round(sqrt(diag(solve(mlefit2$hess))),4)
[1] 0.3373 0.8741 0.4589
> tmp = sqrt(diag(solve(mlefit1$hess)))
  round(c(dlogis(mlefit1$est[1])*tmp[1], exp(mlefit1$est[2:3])*tmp[2:3]),4)
[1] 0.3356 0.8697 0.4570

##===========================================================

(2) Repeat the same problem where the data now consist of n=60 clusters
    (X_{i,1}, ..., X_{i,5}) of 5 conditionally iid Poisson(lambda_{G_i})
    random variables, where again the parameters are a = P(G_1=1), and
    lambda_1 < lambda_0.

set.seed(7757)
Gv2 = rbinom(60,1,.3)
th0 = c(.3,2,3.5)
Xmat = array( rpois(300, th0[3-rep(Gv2,each=5)]), c(5,60) )

> rbind(Xibar = apply(Xmat,2,mean), true = th0[3-Gv2])
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
Xibar  4.4  4.2  2.4  5.2  3.2  2.8  3.6    2  3.8   5.4   2.4   2.4     3
true   3.5  3.5  3.5  3.5  3.5  3.5  3.5    2  3.5   3.5   3.5   3.5     2
      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
Xibar   1.2   3.6   1.8   1.4   4.0   3.6   3.0   3.2   4.0   5.0   3.4   2.2
true    2.0   3.5   2.0   2.0   3.5   3.5   3.5   3.5   3.5   3.5   3.5   2.0
      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
Xibar   3.2   2.8   5.0   3.4   1.8   2.0   3.6   1.8   1.4   4.2   2.0   3.8
true    3.5   3.5   3.5   3.5   2.0   3.5   3.5   3.5   2.0   3.5   3.5   3.5
      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
Xibar   4.2   2.2   4.6   2.4   2.2   3.8   3.4   4.6   2.2   3.0   2.2   3.0
true    3.5   2.0   3.5   3.5   2.0   3.5   3.5   3.5   2.0   3.5   2.0   3.5
      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
Xibar   3.6   1.8   4.6   2.6   3.4   3.2   4.8   2.2     3     2   3.8
true    3.5   2.0   3.5   3.5   3.5   3.5   3.5   2.0     2     2   3.5

NlogL3 = function(x,xm) {
     djoint1 = apply(dpois(xm,x[2]),2,prod)
     djoint0 = apply(dpois(xm,x[3]),2,prod)
     -sum(log(x[1]*djoint1 + (1-x[1])*djoint0 )) }

mlefit3 = nlm(NlogL3, th0, gradtol=1.e-9, steptol=1.e-9, hessian=T, xm=Xmat)

$minimum
[1] 597.6572

$estimate
[1] 0.4173144 2.3018464 3.7574403

$gradient
[1]  9.379164e-08  4.938941e-10 -5.900009e-08

$hessian
         [,1]       [,2]       [,3]
[1,] 129.2756 -35.974304 -33.282395
[2,] -35.9743  30.137021  -2.088265
[3,] -33.2824  -2.088265  32.113320

$code
[1] 1

$iterations
[1] 14

### Now compare this to what we get with EM.
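### (Before doing so, one aside for comparison with part (1): since NlogL3 is
### parameterized directly in (a, lam1, lam0), approximate standard errors for
### part (2) can be read off the hessian returned in mlefit3.  This is just a
### sketch; the resulting numbers are not reproduced here.)

round(sqrt(diag(solve(mlefit3$hessian))), 4)    ### approximate SEs for
                                                ### (a, lam1, lam0) at the MLE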
> Xsum = apply(Xmat,2,sum)
  Xtot = sum(Xsum)        ### also = sum(c(Xmat))

## note that the xsum argument in the next function is apply(Xmat,2,sum)

Gst2 = function(xsum, th1)
### recall that the parameters in this function are the old th1
### starting values, not the new th-values that we maximize over
     1/(1 + (1/th1[1]-1)*exp(5*(th1[2]-th1[3]))*(th1[3]/th1[2])^xsum)

## The closed-form EM updates are given as functions of Xsum and th1,
## as follows:

th.max2 = function(xsum, th1) {
     Gst2sum  = sum(Gst2(xsum,th1))
     Gst2Xsum = sum(xsum*Gst2(xsum,th1))
     n = length(xsum)
     c(a = Gst2sum/n, lam1 = 0.2*Gst2Xsum/Gst2sum,
       lam0 = 0.2*(sum(xsum)-Gst2Xsum)/(n-Gst2sum)) }

thval = c(.5,.1,.5)
for(i in 1:10) {
     for(j in 1:20) thval = th.max2(Xsum,thval)
     cat("on iteration ",i*20,", (a,lam1,lam0) = ",round(thval[1],5),", ",
         round(thval[2],5),", ",round(thval[3],5),"\n", sep="") }

on iteration 20, (a,lam1,lam0) = 0.20158, 1.97709, 3.44614
on iteration 40, (a,lam1,lam0) = 0.399, 2.27255, 3.73254
on iteration 60, (a,lam1,lam0) = 0.41551, 2.29896, 3.755
on iteration 80, (a,lam1,lam0) = 0.41713, 2.30156, 3.75719
on iteration 100, (a,lam1,lam0) = 0.4173, 2.30182, 3.75742
on iteration 120, (a,lam1,lam0) = 0.41731, 2.30184, 3.75744
on iteration 140, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 160, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 180, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744
on iteration 200, (a,lam1,lam0) = 0.41731, 2.30185, 3.75744

### Much quicker convergence in this second EM version, and the limit agrees
### with the direct nlm fit in mlefit3 !!
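### As a final check (a sketch, using the converged thval, the cluster totals
### Xsum, and the simulated -- but in practice unobserved -- labels Gv2), we
### can classify each cluster by thresholding its E-step posterior at 1/2 and
### cross-tabulate against the true group labels:

table(true = Gv2, classified = as.numeric(Gst2(Xsum, thval) > 0.5))

### With 5 observations per cluster the group memberships are much better
### identified than in part (1), which is one reason this EM converges so
### much faster.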