HW 11   Stat 705   Fall 2015        Assigned Saturday 10/24/15,  DUE Wednesday 11/4/15

(1) Generate a dataset of Yvec and (scalar) Xvec of size 150 from the model

        X_i ~ Unif[-1,1]  iid
        eps_i ~ N(0, sige^2)  iid, independent of X_i
        given {X_i, eps_i},  Y_i ~ Poisson( rate = exp(beta*X_i + eps_i) ),  independent across i.

For your simulated dataset, use the values beta=1.5, sige=0.8. Create a function to evaluate the
loglikelihood for the unknown parameter (beta, sige^2) based on a general dataset
XYmat = cbind(Yvec,Xvec), and apply it to your simulated dataset.
NOTE: since eps_i is not observed, this involves numerical integration. Try to parallelize this
if you can, using Gaussian quadratures or some other numerical integration method, or else
evaluation of this likelihood will be slow.

##------------- Solution pt (1)

Xvec = runif(150,-1,1)
epsvec = rnorm(150,0,0.8)                 ## eps_i ~ N(0, sige^2) with sige = 0.8
Yvec = rpois(150, exp(1.5*Xvec+epsvec))
XYmat = cbind(Xvec,Yvec)

logL = function(parv, XYmat, npts=50) {
    ## negative mean loglikelihood at parv = (beta, sige^2) for data XYmat = cbind(Xvec,Yvec)
    ghq = GHquad.wt(npts)                 ## Gauss-Hermite nodes and weights
    sige = sqrt(parv[2])
    num = nrow(XYmat)
    ## loglam[i,k] = beta*X_i + sqrt(2)*sige*t_k, the log Poisson rate at the k-th node
    loglam = outer(parv[1]*XYmat[,1], (sqrt(2)*sige)*ghq$pts, "+")
    Evalmat = array( dpois(rep(XYmat[,2], npts), exp(loglam)), c(num, npts))
    -mean(log( Evalmat %*% ghq$wts ))     ## integrate out eps by quadrature, then average over i
}

> logL(c(1.5,0.8^2), XYmat)
[1] 1.092026
> logL(c(1.4,0.7^2), XYmat)
[1] 1.086519

(2) Directly maximize the loglikelihood function you created with respect to (beta, sige^2),
and give the standard errors for these parameter estimates.

##------------ Solution pt (2)

> tmp = nlm(logL, c(2.3,0.005), hessian=T, XYmat=cbind(Xvec,Yvec))
Warning messages:
1: In sqrt(parv[2]) : NaNs produced
2: In nlm(logL, c(2.3, 0.005), hessian = T, XYmat = cbind(Xvec, Yvec)) :
  NA/Inf replaced by maximum positive value
3: In sqrt(parv[2]) : NaNs produced
4: In nlm(logL, c(2.3, 0.005), hessian = T, XYmat = cbind(Xvec, Yvec)) :
  NA/Inf replaced by maximum positive value
> tmp
$minimum
[1] 1.076822

$estimate
[1] 1.6479044 0.3732908

$gradient
[1]  3.072155e-08 -1.931788e-08

$hessian
           [,1]       [,2]
[1,] 0.30094145 0.09441433
[2,] 0.09441433 0.61925856

$code
[1] 1

$iterations
[1] 12

Since logL is the negative *mean* loglikelihood, the observed information matrix is n = 150
times tmp$hessian, so approximate standard errors for the two estimates are
sqrt(diag(solve(150*tmp$hessian))), roughly 0.15 for the beta estimate and 0.11 for the sige^2
estimate.

> tmp2 = nlminb( c(2.3,0.005), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
> tmp2
$par
[1] 1.6479047 0.3732916

$objective
[1] 1.076822

$convergence
[1] 0

$iterations
[1] 10

$evaluations
function gradient
      14       26

$message
[1] "relative convergence (4)"

(3) In both parts (2) and (4) you need starting values. The performance of your likelihood
maximization in both parts will be much better if you use good preliminary method-of-moments
estimates of beta and sige^2. However, solving simultaneously for beta and sige^2 within such
estimates, while possible, is challenging. An easier, and adequate, method is to fix a small
initial value for sige^2 (say 0.01) and solve the univariate equation

        sum_{i=1}^n ( Y_i - exp(X_i*beta + .005) ) = 0                (*)

for beta. (Why is this a sensible thing to do ?) Then you have initial values for
(beta, sige^2) that you can use with either the method of (2) or (4).

##------------------- Solution pt (3)

The estimating equation is sensible if we view 0.1 as a reasonable (or lowest plausible) value
of the sige parameter. The point is that since, as shown in class,
E(Y|X) - exp(beta*X + sige^2/2) = 0, we can choose beta so that an empirical estimate of the
left-hand side is 0, i.e. so that

        (1/n) sum_{i=1}^n ( Y_i - exp(beta*X_i + sige0^2/2) ) = 0

where we use a "known" value of sige0^2, one choice for which might be (0.1)^2 (giving the
.005 term in (*)). However, I think some of you found that this equation need not have a
unique solution, which makes solving it via uniroot a problem.
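One quick way to see the non-uniqueness on the simulated data is to tabulate the left-hand side
of (*) on a grid of beta values and count sign changes (an illustrative check added here; the
grid range and the object names bgrid, lhs are arbitrary choices):

bgrid = seq(-5, 5, by=0.05)
lhs = sapply(bgrid, function(b) mean(XYmat[,2] - exp(b*XYmat[,1] + 0.005)))
sum(diff(sign(lhs)) != 0)      ## number of sign changes of the moment equation;
                               ## here it should come out to two, matching the two roots found below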
A slight variant of this approach which works better can be given as follows. Think of

        Y_i = exp(X_i*beta + sige0^2/2) + a_i

with independent errors a_i as a nonlinear-regression problem. With that in mind, you would
want to minimize

        sum_{i=1}^n ( Y_i - exp(X_i*beta + sige0^2/2) )^2             (SSQ)

which upon differentiation in beta leads to the equation

        sum_{i=1}^n X_i ( Y_i - exp(X_i*beta + sige0^2/2) ) = 0       (**)

Now the solution to (**) is unique, as you can see by checking that the derivative with respect
to beta of its left-hand side, -sum_i X_i^2 exp(X_i*beta + sige0^2/2), is strictly negative.
If you did find the starting value by solving (*), you could first check that there are at most
two solutions, because the left-hand side of (*) is concave in beta; and then choose the
solution that makes the sum of squares in (SSQ) smaller.

Here are the R steps implementing these ideas:

> uniroot(function(beta,xymat) sum(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(-10,10), xymat=XYmat)
    ### gives an error message: the function has the same sign at both endpoints
> uniroot(function(beta,xymat) mean(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(0,10), xymat=XYmat)
$root
[1] 2.38461
$f.root
[1] -2.659122e-05
    ### says it did not solve exactly =0, but close enough
> uniroot(function(beta,xymat) mean(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(-10,0), xymat=XYmat)
$root
[1] -1.96631
$f.root
[1] -3.326937e-07

## OK, so now we have two roots: what to do with them ??

> SSQ = function(b, xymat, sig) sum((xymat[,2] - exp(b*xymat[,1]+0.5*sig^2))^2)
> c(SSQ(2.3846,XYmat,.1), SSQ(-1.9663,XYmat,.1))
[1] 1.182774e+31 3.747250e+03
### so only the negative root makes sense here, which we confirm by
> c(mean(XYmat[,2]), mean(exp(-XYmat[,1])), mean(exp(-2*XYmat[,1])))
[1] 2.026667 1.267424 2.055506
### But the value 0.1, which is not close to the true sige, leads to a value -1.97 for beta
### which is quite far from the true value 1.5. This means that in what follows, we may
### actually do worse to use starting value (-1.966, 0.1^2) than (0, (0.1)^2).

> nlminb( c(-1.97,0.01), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
    ### converged to the same place on the same data in 13 iterations
> nlminb( c(0,0.01), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
    ### converged to the same place on the same data in 12 iterations

### The "better" starting value solving (**) gives
> uniroot(function(beta,xymat) mean(xymat[,1]*(xymat[,2]-exp(beta*xymat[,1]+0.005))),
          c(-10,10), xymat=XYmat)
    ### root 1.91, after 11 iterations

(4) Write a function to do the maximization you did in (2) a second way, using the EM
algorithm, where the "augmented" data consists of Xvec, Yvec AND the vector epsvec of epsilons
used to generate the data. Make sure that you get the same convergent answer as in (2), and
that your observed-data loglikelihood increases in every iteration. (This is just a check for
purposes of debugging your code, and does not have to be used in every iteration.)
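Note: logL above and the EMstep function in the solution below both call a helper
GHquad.wt(npts), presumably supplied in class. It is assumed here to return a list with
components pts (Gauss-Hermite nodes) and wts (the corresponding weights divided by sqrt(pi)),
so that Evalmat %*% ghq$wts approximates the marginal-likelihood integral over eps after the
substitution eps = sqrt(2)*sige*t. If you do not have that helper, a minimal sketch along those
lines, using the statmod package, is:

library(statmod)                               ## for gauss.quad()
GHquad.wt = function(npts) {
   ## Gauss-Hermite nodes/weights for integrals of the form  int f(x) exp(-x^2) dx,
   ## with the weights rescaled by 1/sqrt(pi) so that
   ##    sum( wts * f(sqrt(2)*sig*pts) )  approximates  E[ f(Z) ]  for  Z ~ N(0, sig^2).
   gq = gauss.quad(npts, kind="hermite")
   list(pts = gq$nodes, wts = gq$weights/sqrt(pi))
}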
##------------------- Solution pt (4)

The log augmented likelihood (apart from an additive constant not depending on the parameters;
the terms Y_i*eps_i and -log(Y_i!) involve neither beta nor sig^2) is

     (-n/2)*log(sig^2) - sum_{i=1}^n eps_i^2/(2*sig^2)
        + sum_{i=1}^n Y_i*X_i*beta - sum_{i=1}^n exp(X_i*beta)*exp(eps_i)

That explains why we need only to calculate the conditional expectations of eps_i^2 and of
exp(eps_i), which we do (using the current parameter values b and sige, with x=X_i, y=Y_i, and
t = eps/sige a standard-normal dummy variable) through the three integrals

     int (1, (t*sige)^2, exp(t*sige))^{tr} * exp( y*t*sige - exp(b*x + t*sige) ) dnorm(t) dt ,

where factors not depending on t, such as exp(y*b*x)/y!, may be dropped because they cancel
when the conditional expectations are formed as the ratios of the 2nd and 3rd vector components
to the 1st. Then, denoting by E#(.) the conditional expectation given (X_i, Y_i), the M-step
maximizers over beta and sig are obtained from the data as:

     beta  solving   sum_{i=1}^n X_i*( Y_i - E#(exp(eps_i))*exp(X_i*beta) ) = 0
     sige^2  =  (1/n) * sum_{i=1}^n E#(eps_i^2)

Now we put all this together in a function, as follows:

GHq = GHquad.wt(50)      ### or other npts

NB. The range of Y values is:
> table(XYmat[,2])
 0  1  2  3  4  5  6  7  8  9 10 11 15
43 46 18 14 10  5  5  2  2  1  2  1  1

> EMstep = function(parv, xymat, ghq) {
   ## parv is the (beta,sige^2) starting value, xymat the data matrix
   sige = sqrt(parv[2])
   exb = exp(parv[1]*xymat[,1])
   tsigr2 = ghq$pts*(sqrt(2)*sige)               ## quadrature nodes on the eps scale
   fcnval = cbind(1, tsigr2^2, exp(tsigr2))      ## the three integrand factors
   lgamy = lgamma(1+xymat[,2])
   evalmat = exp(-lgamy - outer(exb, fcnval[,3],"*") + outer(xymat[,2], tsigr2,"*"))
   Intgs = evalmat %*% fcnval      ## NB: the quadrature weights ghq$wts are not applied here;
                                   ## the usual Gauss-Hermite sum would be evalmat %*% (ghq$wts*fcnval)
   CondExps = cbind(Intgs[,2]/Intgs[,1], Intgs[,3]/Intgs[,1])
   sumxy = sum(xymat[,1]*xymat[,2])
   bsolv = uniroot(function(b,s,xv,cexp2) s - sum(cexp2*xv*exp(b*xv)), c(-10,10),
                   s=sumxy, xv=xymat[,1], cexp2=CondExps[,2])$root
   c(bsolv, mean(CondExps[,1])) }

> tmp = EMstep(c(0,.01), XYmat, GHq)
> tmp
[1] 0.9209543 0.5652564     ## decreases logL (the negative loglikelihood) on the first
                            ## iteration, but not later
> tmp = EMstep(tmp, XYmat, GHq)     ### diverges rapidly after a few iterations!

#### But after a lot of checking, the problem appeared to be the terrible behavior of these
#### quadrature sums for such rapidly varying exponential integrands. The only way I could
#### implement EM directly was by filling the array of integrals using many repeated
#### applications of "integrate", as follows.
EMstep3 = function(parv, xymat) {
  ## parv is the (beta,sige^2) starting value, xymat the data matrix
  sige = sqrt(parv[2])
  nx = nrow(xymat)
  exb = exp(parv[1]*xymat[,1])
  Intgs = array(0, c(nx,3))
  ## fill the three conditional-expectation integrals row by row with "integrate"
  for(i in 1:nx) Intgs[i,] = c(
     integrate(function(x,Xi,Yi,b,sig) dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -Inf, Inf,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val,
     integrate(function(x,Xi,Yi,b,sig) x^2*dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -10, 10,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val,
     integrate(function(x,Xi,Yi,b,sig) exp(x)*dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -10, 10,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val )
  CondExps = cbind(Intgs[,2]/Intgs[,1], Intgs[,3]/Intgs[,1])
  sumxy = sum(xymat[,1]*xymat[,2])
  bsolv = uniroot(function(b,s,xv,cexp2) s - sum(cexp2*xv*exp(b*xv)), c(-10,10),
                  s=sumxy, xv=xymat[,1], cexp2=CondExps[,2])$root
  c(bsolv, mean(CondExps[,1])) }

> tmp = EMstep3(c(0,.01), XYmat)
> tmp
[1] 1.86198846 0.01061962
> tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.90386557 0.01066433
> for (i in 1:100) tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.6483657 0.3723072
> for (i in 1:100) tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.6478436 0.3733007
    ### essentially the same as the MLE

### But this took a while:
> lgLkArr = numeric(100)
> logL(c(1.91,.01), XYmat)                 ### 1.17607
> tmp = EMstep3(c(1.91,.01), XYmat)
> logL(tmp, XYmat)                         ### 1.17591
> unix.time({ tmp = EMstep3(c(1.91,.01), XYmat)
      for (i in 1:100) { tmp = EMstep3(tmp,XYmat)
          lgLkArr[i] = logL(tmp, XYmat) }  ## record the observed-data (negative) loglik at each
                                           ## iteration, to check the monotone improvement
      tmp })
   user  system elapsed
   9.80    0.00    9.82
> tmp
[1] 1.6487130 0.3716456
    ### convergent respectively to the 4th and 3rd decimal place

### Retrying the quadrature-based EMstep with more nodes:
GHq = GHquad.wt(200)
> logL(c(0,.01), XYmat)                    ### 2.0819
> tmp = EMstep(c(0,.01), XYmat, GHq)
> tmp
[1] 0.1687843 1.6422559                    ## logL = 1.3404
> tmp = EMstep(tmp, XYmat, GHq)
> tmp

(5) Do the same steps for a larger (n=1200) simulated dataset, and provide a timing run for
your direct MLE method and for your EM method. Which seems to be faster, based on the same
choice of starting values as in (3) ?

##----------- Solution part (5)

There is really no contest in this problem between direct ML and EM: direct ML is far faster.
Consider the timing for the direct ML on a much larger dataset, as follows.

Xvec2 = runif(1200,-1,1)
epsvec2 = rnorm(1200,0,0.8)
Yvec2 = rpois(1200, exp(1.5*Xvec2+epsvec2))
XYmat2 = cbind(Xvec2,Yvec2)

unix.time({
   ini.beta = uniroot(function(beta,xymat) mean(xymat[,1]*(xymat[,2]-exp(beta*xymat[,1]+0.005))),
                      c(-10,10), xymat=XYmat2)$root
   tmp = nlminb( c(ini.beta,0.01), logL, lower=c(-Inf,0), XYmat=XYmat2) })
   user  system elapsed
   0.17    0.00    0.17
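For the EM side of the comparison, no timing run is shown above; a sketch of one, using the
same starting value ini.beta (the object name tmpEM below is just an illustrative choice), is
given next. Since each EMstep3 call makes 3*n calls to integrate(), its cost grows roughly
linearly in n, so 100 iterations on n=1200 (about 10 seconds at n=150 above) should take on the
order of a minute or more, compared with a fraction of a second for the direct ML fit.

unix.time({
   tmpEM = c(ini.beta, 0.01)
   for (i in 1:100) tmpEM = EMstep3(tmpEM, XYmat2)    ## 3*1200 numerical integrals per EM step
   tmpEM })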