HW 11, Stat 705, Fall 2015          Assigned Saturday 10/24/15; DUE Wednesday 11/4/15

(1) Generate a dataset of Yvec and (scalar) Xvec of size 150 from the model

      X_i ~ Unif[-1,1] iid,
      eps_i ~ N(0, sige^2) iid, independent of {X_i},
      and given {X_i, eps_i}:  Y_i ~ Poisson( rate = exp(beta*X_i + eps_i) ), independent across i.

For your simulated dataset, use the values beta = 1.5, sige = 0.8. Create a function to evaluate the loglikelihood for the unknown parameter (beta, sige^2) based on a general dataset XYmat = cbind(Yvec, Xvec), and apply it to your simulated dataset. NOTE: since eps_i is not observed, this involves numerical integration. Try to parallelize this if you can, using Gaussian quadrature or some other numerical integration method; otherwise evaluation of this likelihood will be slow. (A sketch of one possible approach to this step appears after part (5).)

(2) Directly maximize the loglikelihood function you created with respect to (beta, sige^2), and give the standard errors for these parameter estimates.

(3) In both parts (2) and (4) you need starting values. The performance of your likelihood maximization in both parts will be much better if you use good preliminary method-of-moments estimates of beta and sige^2. However, solving simultaneously for beta and sige^2 from such estimating equations, while possible, is challenging. An easier, and adequate, method is to fix a small initial value for sige^2 (say 0.01) and solve the univariate equation

      sum_{i=1}^n ( Y_i - exp(X_i*beta + 0.005) ) = 0

for beta. (Why is this a sensible thing to do?) Then you have initial values for (beta, sige^2) that you can use with either the method of (2) or of (4).

(4) Write a function to do the maximization you did in (2) a second way, using the EM algorithm, where the "augmented" data consist of Xvec, Yvec AND the vector epsvec of epsilons used to generate the data. Make sure that you get the same convergent answer as in (2), and that your observed-data loglikelihood increases in every iteration. (This is just a check for purposes of debugging your code, and does not have to be performed in every iteration.)

(5) Do the same steps for a larger (n = 1200) simulated dataset, and provide a timing run for your direct MLE method and for your EM method. Which seems to be faster, based on the same choice of starting values as in (3)?
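
The following is a minimal R sketch of the simulation and quadrature-based loglikelihood of part (1), together with the starting-value calculation of part (3). The function names simdata, loglik, and beta.start, the use of the statmod package for Gauss-Hermite nodes and weights, the choice of K = 20 quadrature points, and the root-search interval are all illustrative assumptions, not requirements of the assignment; they only indicate one workable approach.

    ## Minimal sketch of parts (1) and (3).  Assumes the statmod package for
    ## Gauss-Hermite nodes/weights; names and tuning constants are illustrative.
    library(statmod)

    simdata <- function(n, beta = 1.5, sige = 0.8) {
      Xvec <- runif(n, -1, 1)                    # X_i ~ Unif[-1,1]
      eps  <- rnorm(n, 0, sige)                  # eps_i ~ N(0, sige^2), unobserved
      Yvec <- rpois(n, exp(beta * Xvec + eps))   # Y_i | X_i, eps_i ~ Poisson(exp(beta*X_i + eps_i))
      cbind(Yvec = Yvec, Xvec = Xvec)
    }

    ## Loglikelihood of theta = (beta, sige^2): eps_i is integrated out by a
    ## K-point Gauss-Hermite rule.  With the substitution eps = sqrt(2)*sige*t,
    ##   f(y_i | x_i) ~= (1/sqrt(pi)) * sum_k w_k * dpois(y_i, exp(beta*x_i + sqrt(2)*sige*t_k)).
    loglik <- function(theta, XYmat, K = 20) {
      beta <- theta[1]; sig2 <- theta[2]
      gh <- gauss.quad(K, kind = "hermite")      # nodes t_k and weights w_k
      Y <- XYmat[, 1]; X <- XYmat[, 2]
      terms <- sapply(seq_len(K), function(k)
        gh$weights[k] * dpois(Y, exp(beta * X + sqrt(2 * sig2) * gh$nodes[k])))
      sum(log(rowSums(terms) / sqrt(pi)))        # vectorized over all n observations
    }

    ## Starting value for beta as in part (3): fix sige^2 = 0.01 and solve
    ## sum_i ( Y_i - exp(X_i*beta + 0.005) ) = 0 by a univariate root search.
    ## The search interval (0, 10), i.e. taking the positive root, is an
    ## illustrative assumption.
    beta.start <- function(XYmat) {
      Y <- XYmat[, 1]; X <- XYmat[, 2]
      uniroot(function(b) sum(Y - exp(X * b + 0.005)), interval = c(0, 10))$root
    }

    set.seed(705)                                # seed chosen only for reproducibility
    XYmat <- simdata(150)
    loglik(c(1.5, 0.8^2), XYmat)                 # loglikelihood at the true parameter values
    c(beta = beta.start(XYmat), sig2 = 0.01)     # starting values for parts (2) and (4)

The substitution eps = sqrt(2)*sige*t turns the N(0, sige^2) density times d eps into exp(-t^2)/sqrt(pi) dt, which is exactly the weight function of the Gauss-Hermite rule; this is why the node sum is divided by sqrt(pi) and no further normal-density factor appears, and it keeps each loglikelihood evaluation to a single vectorized pass over the n observations.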