HW 11   Stat 705   Fall 2015        Assigned Saturday 10/24/15,  DUE Wednesday 11/4/15

(1) Generate a dataset of Yvec and (scalar) Xvec of size 150 from the model

        X_i ~ Unif[-1,1]  iid
        eps_i ~ N(0, sige^2)  iid, independent of X_i
        given {X_i, eps_i},  Y_i ~ Poisson( rate = exp(beta*X_i + eps_i) ),  independent across i.

For your simulated dataset, use the values beta=1.5, sige=0.8. Create a function to evaluate the
loglikelihood for the unknown parameter (beta, sige^2) based on a general dataset
XYmat = cbind(Yvec,Xvec), and apply it to your simulated dataset.
NOTE: since eps_i is not observed, this involves numerical integration. Try to parallelize this
if you can, using Gaussian quadratures or some other numerical integration method, or else
evaluation of this likelihood will be slow.

##------------- Solution pt (1)

Xvec = runif(150,-1,1)
epsvec = rnorm(150,0,0.8)                 ## eps_i ~ N(0, sige^2) with sige = 0.8
Yvec = rpois(150, exp(1.5*Xvec+epsvec))
XYmat = cbind(Xvec,Yvec)

logL = function(parv, XYmat, npts=50) {
    ## negative mean loglikelihood at parv = (beta, sige^2) for data XYmat = cbind(Xvec,Yvec)
    ghq = GHquad.wt(npts)                 ## Gauss-Hermite nodes and weights
    sige = sqrt(parv[2])
    num = nrow(XYmat)
    ## loglam[i,k] = beta*X_i + sqrt(2)*sige*t_k, the log Poisson rate at the k-th node
    loglam = outer(parv[1]*XYmat[,1], (sqrt(2)*sige)*ghq$pts, "+")
    Evalmat = array( dpois(rep(XYmat[,2], npts), exp(loglam)), c(num, npts))
    -mean(log( Evalmat %*% ghq$wts ))     ## integrate out eps by quadrature, then average over i
}

> logL(c(1.5,0.8^2), XYmat)
[1] 1.092026
> logL(c(1.4,0.7^2), XYmat)
[1] 1.086519

(2) Directly maximize the loglikelihood function you created with respect to (beta, sige^2),
and give the standard errors for these parameter estimates.

##------------ Solution pt (2)

> tmp = nlm(logL, c(2.3,0.005), hessian=T, XYmat=cbind(Xvec,Yvec))
Warning messages:
1: In sqrt(parv[2]) : NaNs produced
2: In nlm(logL, c(2.3, 0.005), hessian = T, XYmat = cbind(Xvec, Yvec)) :
  NA/Inf replaced by maximum positive value
3: In sqrt(parv[2]) : NaNs produced
4: In nlm(logL, c(2.3, 0.005), hessian = T, XYmat = cbind(Xvec, Yvec)) :
  NA/Inf replaced by maximum positive value
> tmp
$minimum
[1] 1.076822

$estimate
[1] 1.6479044 0.3732908

$gradient
[1]  3.072155e-08 -1.931788e-08

$hessian
           [,1]       [,2]
[1,] 0.30094145 0.09441433
[2,] 0.09441433 0.61925856

$code
[1] 1

$iterations
[1] 12

Since logL is the negative *mean* loglikelihood, the observed information matrix is n = 150
times tmp$hessian, so approximate standard errors for the two estimates are
sqrt(diag(solve(150*tmp$hessian))), roughly 0.15 for the beta estimate and 0.11 for the sige^2
estimate.

> tmp2 = nlminb( c(2.3,0.005), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
> tmp2
$par
[1] 1.6479047 0.3732916

$objective
[1] 1.076822

$convergence
[1] 0

$iterations
[1] 10

$evaluations
function gradient
      14       26

$message
[1] "relative convergence (4)"

(3) In both parts (2) and (4) you need starting values. The performance of your likelihood
maximization in both parts will be much better if you use good preliminary method-of-moments
estimates of beta and sige^2. However, solving simultaneously for beta and sige^2 within such
estimates, while possible, is challenging. An easier, and adequate, method is to fix a small
initial value for sige^2 (say 0.01) and solve the univariate equation

        sum_{i=1}^n ( Y_i - exp(X_i*beta + .005) ) = 0                (*)

for beta. (Why is this a sensible thing to do ?) Then you have initial values for
(beta, sige^2) that you can use with either the method of (2) or (4).

##------------------- Solution pt (3)

The estimating equation is sensible if we view 0.1 as a reasonable (or lowest plausible) value
of the sige parameter. The point is that since, as shown in class,
E(Y|X) - exp(beta*X + sige^2/2) = 0, we can choose beta so that an empirical estimate of the
left-hand side is 0, i.e. so that

        (1/n) sum_{i=1}^n ( Y_i - exp(beta*X_i + sige0^2/2) ) = 0

where we use a "known" value of sige0^2, one choice for which might be (0.1)^2 (giving the
.005 term in (*)). However, I think some of you found that this equation need not have a
unique solution, which makes solving it via uniroot a problem.
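One quick way to see the non-uniqueness on the simulated data is to tabulate the left-hand side
of (*) on a grid of beta values and count sign changes (an illustrative check added here; the
grid range and the object names bgrid, lhs are arbitrary choices):

bgrid = seq(-5, 5, by=0.05)
lhs = sapply(bgrid, function(b) mean(XYmat[,2] - exp(b*XYmat[,1] + 0.005)))
sum(diff(sign(lhs)) != 0)      ## number of sign changes of the moment equation;
                               ## here it should come out to two, matching the two roots found below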
A slight variant of this approach which works better can be given as follows. Think of

        Y_i = exp(X_i*beta + sige0^2/2) + a_i

with independent errors a_i as a nonlinear-regression problem. With that in mind, you would
want to minimize

        sum_{i=1}^n ( Y_i - exp(X_i*beta + sige0^2/2) )^2             (SSQ)

which upon differentiation in beta leads to the equation

        sum_{i=1}^n X_i ( Y_i - exp(X_i*beta + sige0^2/2) ) = 0       (**)

Now the solution to (**) is unique, as you can see by checking that the derivative with respect
to beta of its left-hand side, -sum_i X_i^2 exp(X_i*beta + sige0^2/2), is strictly negative.
If you did find the starting value by solving (*), you could first check that there are at most
two solutions, because the left-hand side of (*) is concave in beta; and then choose the
solution that makes the sum of squares in (SSQ) smaller.

Here are the R steps implementing these ideas:

> uniroot(function(beta,xymat) sum(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(-10,10), xymat=XYmat)
    ### gives an error message: the function has the same sign at both endpoints
> uniroot(function(beta,xymat) mean(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(0,10), xymat=XYmat)
$root
[1] 2.38461
$f.root
[1] -2.659122e-05
    ### says it did not solve exactly =0, but close enough
> uniroot(function(beta,xymat) mean(xymat[,2]-exp(beta*xymat[,1]+0.005)), c(-10,0), xymat=XYmat)
$root
[1] -1.96631
$f.root
[1] -3.326937e-07

## OK, so now we have two roots: what to do with them ??

> SSQ = function(b, xymat, sig) sum((xymat[,2] - exp(b*xymat[,1]+0.5*sig^2))^2)
> c(SSQ(2.3846,XYmat,.1), SSQ(-1.9663,XYmat,.1))
[1] 1.182774e+31 3.747250e+03
### so only the negative root makes sense here, which we confirm by
> c(mean(XYmat[,2]), mean(exp(-XYmat[,1])), mean(exp(-2*XYmat[,1])))
[1] 2.026667 1.267424 2.055506
### But the value 0.1, which is not close to the true sige, leads to a value -1.97 for beta
### which is quite far from the true value 1.5. This means that in what follows, we may
### actually do worse to use starting value (-1.966, 0.1^2) than (0, (0.1)^2).

> nlminb( c(-1.97,0.01), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
    ### converged to the same place on the same data in 13 iterations
> nlminb( c(0,0.01), logL, lower=c(-Inf,0), XYmat=cbind(Xvec,Yvec))
    ### converged to the same place on the same data in 12 iterations

### The "better" starting value solving (**) gives
> uniroot(function(beta,xymat) mean(xymat[,1]*(xymat[,2]-exp(beta*xymat[,1]+0.005))),
          c(-10,10), xymat=XYmat)
    ### root 1.91, after 11 iterations

(4) Write a function to do the maximization you did in (2) a second way, using the EM
algorithm, where the "augmented" data consists of Xvec, Yvec AND the vector epsvec of epsilons
used to generate the data. Make sure that you get the same convergent answer as in (2), and
that your observed-data loglikelihood increases in every iteration. (This is just a check for
purposes of debugging your code, and does not have to be used in every iteration.)
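Note: logL above and the EMstep function in the solution below both call a helper
GHquad.wt(npts), presumably supplied in class. It is assumed here to return a list with
components pts (Gauss-Hermite nodes) and wts (the corresponding weights divided by sqrt(pi)),
so that Evalmat %*% ghq$wts approximates the marginal-likelihood integral over eps after the
substitution eps = sqrt(2)*sige*t. If you do not have that helper, a minimal sketch along those
lines, using the statmod package, is:

library(statmod)                               ## for gauss.quad()
GHquad.wt = function(npts) {
   ## Gauss-Hermite nodes/weights for integrals of the form  int f(x) exp(-x^2) dx,
   ## with the weights rescaled by 1/sqrt(pi) so that
   ##    sum( wts * f(sqrt(2)*sig*pts) )  approximates  E[ f(Z) ]  for  Z ~ N(0, sig^2).
   gq = gauss.quad(npts, kind="hermite")
   list(pts = gq$nodes, wts = gq$weights/sqrt(pi))
}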
##------------------- Solution pt (4)

The log augmented likelihood (apart from an additive constant not depending on the parameters;
the terms Y_i*eps_i and -log(Y_i!) involve neither beta nor sig^2) is

     (-n/2)*log(sig^2) - sum_{i=1}^n eps_i^2/(2*sig^2)
        + sum_{i=1}^n Y_i*X_i*beta - sum_{i=1}^n exp(X_i*beta)*exp(eps_i)

That explains why we need only to calculate the conditional expectations of eps_i^2 and of
exp(eps_i), which we do (using the current parameter values b and sige, with x=X_i, y=Y_i, and
t = eps/sige a standard-normal dummy variable) through the three integrals

     int (1, (t*sige)^2, exp(t*sige))^{tr} * exp( y*t*sige - exp(b*x + t*sige) ) dnorm(t) dt ,

where factors not depending on t, such as exp(y*b*x)/y!, may be dropped because they cancel
when the conditional expectations are formed as the ratios of the 2nd and 3rd vector components
to the 1st. Then, denoting by E#(.) the conditional expectation given (X_i, Y_i), the M-step
maximizers over beta and sig are obtained from the data as:

     beta  solving   sum_{i=1}^n X_i*( Y_i - E#(exp(eps_i))*exp(X_i*beta) ) = 0
     sige^2  =  (1/n) * sum_{i=1}^n E#(eps_i^2)

Now we put all this together in a function, as follows:

GHq = GHquad.wt(50)      ### or other npts

NB. The range of Y values is:
> table(XYmat[,2])
 0  1  2  3  4  5  6  7  8  9 10 11 15
43 46 18 14 10  5  5  2  2  1  2  1  1

> EMstep = function(parv, xymat, ghq) {
   ## parv is the (beta,sige^2) starting value, xymat the data matrix
   sige = sqrt(parv[2])
   exb = exp(parv[1]*xymat[,1])
   tsigr2 = ghq$pts*(sqrt(2)*sige)               ## quadrature nodes on the eps scale
   fcnval = cbind(1, tsigr2^2, exp(tsigr2))      ## the three integrand factors
   lgamy = lgamma(1+xymat[,2])
   evalmat = exp(-lgamy - outer(exb, fcnval[,3],"*") + outer(xymat[,2], tsigr2,"*"))
   Intgs = evalmat %*% fcnval      ## NB: the quadrature weights ghq$wts are not applied here;
                                   ## the usual Gauss-Hermite sum would be evalmat %*% (ghq$wts*fcnval)
   CondExps = cbind(Intgs[,2]/Intgs[,1], Intgs[,3]/Intgs[,1])
   sumxy = sum(xymat[,1]*xymat[,2])
   bsolv = uniroot(function(b,s,xv,cexp2) s - sum(cexp2*xv*exp(b*xv)), c(-10,10),
                   s=sumxy, xv=xymat[,1], cexp2=CondExps[,2])$root
   c(bsolv, mean(CondExps[,1])) }

> tmp = EMstep(c(0,.01), XYmat, GHq)
> tmp
[1] 0.9209543 0.5652564     ## decreases logL (the negative loglikelihood) on the first
                            ## iteration, but not later
> tmp = EMstep(tmp, XYmat, GHq)     ### diverges rapidly after a few iterations!

#### But after a lot of checking, the problem appeared to be the terrible behavior of these
#### quadrature sums for such rapidly varying exponential integrands. The only way I could
#### implement EM directly was by filling the array of integrals using many repeated
#### applications of "integrate", as follows.
EMstep3 = function(parv, xymat) {
  ## parv is the (beta,sige^2) starting value, xymat the data matrix
  sige = sqrt(parv[2])
  nx = nrow(xymat)
  exb = exp(parv[1]*xymat[,1])
  Intgs = array(0, c(nx,3))
  ## fill the three conditional-expectation integrals row by row with "integrate"
  for(i in 1:nx) Intgs[i,] = c(
     integrate(function(x,Xi,Yi,b,sig) dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -Inf, Inf,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val,
     integrate(function(x,Xi,Yi,b,sig) x^2*dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -10, 10,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val,
     integrate(function(x,Xi,Yi,b,sig) exp(x)*dpois(Yi,exp(b*Xi+x))*dnorm(x,0,sig), -10, 10,
               Xi=xymat[i,1], Yi=xymat[i,2], b=parv[1], sig=sige)$val )
  CondExps = cbind(Intgs[,2]/Intgs[,1], Intgs[,3]/Intgs[,1])
  sumxy = sum(xymat[,1]*xymat[,2])
  bsolv = uniroot(function(b,s,xv,cexp2) s - sum(cexp2*xv*exp(b*xv)), c(-10,10),
                  s=sumxy, xv=xymat[,1], cexp2=CondExps[,2])$root
  c(bsolv, mean(CondExps[,1])) }

> tmp = EMstep3(c(0,.01), XYmat)
> tmp
[1] 1.86198846 0.01061962
> tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.90386557 0.01066433
> for (i in 1:100) tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.6483657 0.3723072
> for (i in 1:100) tmp = EMstep3(tmp,XYmat)
> tmp
[1] 1.6478436 0.3733007
    ### essentially the same as the MLE

### But this took a while:
> lgLkArr = numeric(100)
> logL(c(1.91,.01), XYmat)                 ### 1.17607
> tmp = EMstep3(c(1.91,.01), XYmat)
> logL(tmp, XYmat)                         ### 1.17591
> unix.time({ tmp = EMstep3(c(1.91,.01), XYmat)
      for (i in 1:100) { tmp = EMstep3(tmp,XYmat)
          lgLkArr[i] = logL(tmp, XYmat) }  ## record the observed-data (negative) loglik at each
                                           ## iteration, to check the monotone improvement
      tmp })
   user  system elapsed
   9.80    0.00    9.82
> tmp
[1] 1.6487130 0.3716456
    ### convergent respectively to the 4th and 3rd decimal place

### Retrying the quadrature-based EMstep with more nodes:
GHq = GHquad.wt(200)
> logL(c(0,.01), XYmat)                    ### 2.0819
> tmp = EMstep(c(0,.01), XYmat, GHq)
> tmp
[1] 0.1687843 1.6422559                    ## logL = 1.3404
> tmp = EMstep(tmp, XYmat, GHq)
> tmp

(5) Do the same steps for a larger (n=1200) simulated dataset, and provide a timing run for
your direct MLE method and for your EM method. Which seems to be faster, based on the same
choice of starting values as in (3) ?

##----------- Solution part (5)

There is really no contest in this problem between direct ML and EM: direct ML is far faster.
Consider the timing for the direct ML on a much larger dataset, as follows.

Xvec2 = runif(1200,-1,1)
epsvec2 = rnorm(1200,0,0.8)
Yvec2 = rpois(1200, exp(1.5*Xvec2+epsvec2))
XYmat2 = cbind(Xvec2,Yvec2)

unix.time({
   ini.beta = uniroot(function(beta,xymat) mean(xymat[,1]*(xymat[,2]-exp(beta*xymat[,1]+0.005))),
                      c(-10,10), xymat=XYmat2)$root
   tmp = nlminb( c(ini.beta,0.01), logL, lower=c(-Inf,0), XYmat=XYmat2) })
   user  system elapsed
   0.17    0.00    0.17
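For the EM side of the comparison, no timing run is shown above; a sketch of one, using the
same starting value ini.beta (the object name tmpEM below is just an illustrative choice), is
given next. Since each EMstep3 call makes 3*n calls to integrate(), its cost grows roughly
linearly in n, so 100 iterations on n=1200 (about 10 seconds at n=150 above) should take on the
order of a minute or more, compared with a fraction of a second for the direct ML fit.

unix.time({
   tmpEM = c(ini.beta, 0.01)
   for (i in 1:100) tmpEM = EMstep3(tmpEM, XYmat2)    ## 3*1200 numerical integrals per EM step
   tmpEM })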