HW 12 solution.
==============

(a)

epsil = sample(1:3, 1000, rep=T, prob=c(.2,.5,.3))
mu = 0.7
parvec = c(4,5,7)*mu
Xvec = rpois(1000, parvec[epsil])

### To check that it came out right:

> parvec
[1] 2.8 3.5 4.9
> table(epsil)
epsil
  1   2   3
199 493 308

for(j in 1:3) {
   tmp = Xvec[epsil==j]
   cat("eps=",j," population: mean=",mean(tmp), " and var=",var(tmp),"\n") }
eps= 1  population: mean= 2.512563  and var= 2.432922
eps= 2  population: mean= 3.501014  and var= 3.583840
eps= 3  population: mean= 4.866883  and var= 5.562027

### NB: a 95% CI for the eps=1 population mean is

> 2.513 + c(-1,1)*qnorm(.975)*sqrt(2.433/199)
[1] 2.296283 2.729717

### So, surprisingly, our true parameter 2.8 lies (just) outside
### the CI !!  Such things do happen.

(b)  We first find the MLE for mu using R's general-purpose maximizers.
### (Recall the true value mu = .7.)  Factoring mu^X * exp(-4*mu) out of
### each mixture term gives the marginal log-likelihood coded below;
### the -4000*mu term is -4*mu summed over the n = 1000 observations.

llkMix = function(mu) sum(Xvec)*log(mu) - 4000*mu +
   sum(log( .2*(4^Xvec) + (.5*exp(-mu))*(5^Xvec) + (.3*exp(-3*mu))*(7^Xvec) ))

> unlist(nlm(function(x) -llkMix(x)/1000, 4))
      minimum      estimate      gradient          code    iterations
-1.185645e+00  6.897532e-01 -4.818368e-08  1.000000e+00  6.000000e+00

> unlist(optimize(function(x) -llkMix(x)/1000, c(.1,3)))
   minimum  objective
 0.6897458 -1.1856448

> unlist(optim(2, function(x) -llkMix(x)/1000, method="BFGS"))
            par           value counts.function counts.gradient     convergence
      0.6897273      -1.1856448      13.0000000       7.0000000       0.0000000

### Now comes the EM algorithm, with the main step coded as follows.
### First the conditional probability mass function of eps given X:

prbs = c(.2,.5,.3)
pv = function(x,mu1) {
   outp = (prbs*exp(-c(4,5,7)*mu1))*outer(c(4,5,7),x,"^")
   ### this is now a 3 by n matrix
   outp*outer(rep(1,3), 1/c(rep(1,3)%*%outp)) }
   ### output is a 3 x n matrix with columns summing to 1

### The M-step maximizes over mu the expected complete-data log-likelihood,
### whose mu-dependent part is  sum(Xvec)*log(mu) - mu * sum_{i,j} c_j p_{j,i}
### with c = (4,5,7) and p = pv(Xvec,mu); setting the mu-derivative to 0 gives

EMmap = function(mu) sum(Xvec)/c((c(4,5,7)%*%pv(Xvec,mu)%*%rep(1,length(Xvec))))

### Here is the trajectory of 20 EM iterates starting with mu1 = .5:

mu1 = .5
for (i in 1:20) {
   mu1 = EMmap(mu1)
   cat(round(mu1,7)," ")
   if(i %% 6 == 0 | i==20) cat("\n") }
0.6640979  0.6862729  0.6892816  0.6896897  0.689745  0.6897525
0.6897535  0.6897537  0.6897537  0.6897537  0.6897537  0.6897537
0.6897537  0.6897537  0.6897537  0.6897537  0.6897537  0.6897537
0.6897537  0.6897537

### Converged already on the 8th iterate to accuracy 1e-7.
### Let's do it again from a much worse starting point:

mu1 = 4
for (i in 1:20) {
   mu1 = EMmap(mu1)
   cat(round(mu1,7)," ")
   if(i %% 6 == 0 | i==20) cat("\n") }
0.9076402  0.7188699  0.6936985  0.6902886  0.6898262  0.6897635
0.689755   0.6897539  0.6897537  0.6897537  0.6897537  0.6897537
0.6897537  0.6897537  0.6897537  0.6897537  0.6897537  0.6897537
0.6897537  0.6897537

### This time it took 9 iterates.  So in this EM example, convergence came
### in almost as few iterates as in optim or nlm.
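### A further sanity check on the simulation in part (a), not required by the
### problem: under the mixture model the marginal moments of X are
###   E X   = mu * sum(p_j*c_j) = 0.7 * 4.9 = 3.43
###   Var X = E X + mu^2 * (sum(p_j*c_j^2) - (sum(p_j*c_j))^2) = 6.5611.
### A sketch comparing these with fresh simulated data (the seed and the
### sample size 1e5 are arbitrary choices, not part of the assignment):

set.seed(1)
prbs = c(.2,.5,.3); cs = c(4,5,7); mu = 0.7
eps = sample(1:3, 1e5, rep=T, prob=prbs)
X = rpois(1e5, cs[eps]*mu)
c(theoretical = mu*sum(prbs*cs), empirical = mean(X))
c(theoretical = mu*sum(prbs*cs) + mu^2*(sum(prbs*cs^2) - sum(prbs*cs)^2),
  empirical = var(X))

### Both empirical values should agree with the theoretical ones to within
### simulation error of order 1/sqrt(1e5).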
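### The comparison above counts iterates; a complementary standard fact about
### EM is that each iterate can only increase the observed-data log-likelihood.
### A minimal self-contained sketch of that ascent property (fresh simulated
### data rather than the Xvec above; the seed and starting point are arbitrary):

set.seed(2)
prbs = c(.2,.5,.3); cs = c(4,5,7); n = 1000
eps = sample(1:3, n, rep=T, prob=prbs)
X = rpois(n, cs[eps]*0.7)
llk = function(mu)                                # observed-data log-likelihood
   sum(log( prbs[1]*dpois(X,4*mu) + prbs[2]*dpois(X,5*mu) +
            prbs[3]*dpois(X,7*mu) ))
EMstep = function(mu) {
   w = (prbs*exp(-cs*mu)) * outer(cs, X, "^")     # unnormalized posteriors, 3 x n
   w = w / rep(colSums(w), each=3)                # E-step: columns now sum to 1
   sum(X) / sum(cs %*% w) }                       # M-step: closed form for mu
mus = numeric(21); mus[1] = 4                     # deliberately bad start
for (i in 1:20) mus[i+1] = EMstep(mus[i])
all(diff(sapply(mus, llk)) > -1e-8)               # monotone ascent (up to rounding)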