Homework Problem 9, STAT 705, Fall 2015.

Assigned 10/12/2015,  Due Monday 10/19/2015


(1) Generate 1000 iid random variable values  X_1, ..., X_1000 from the 
distribution  Beta(a,b)  with  a=2.2 and  b = 1.5.

(2) Write a function to maximize the likelihood from such a dataset of size n, 
in one of three ways. Make this a user-supplied option for your function. (You can 
make the first option the default, for simplicity in the 3rd part of the problem.)

The first option: with the value  a  unknown  and  b=b0 input and treated as known.
The second option: with the value  a=a0 input and treated as known, but b unknown.
The third option: with both (a,b) parameters treated as unknown.

##--------------- Solution to parts 1,2

> Xdata = rbeta(1000,2.2,1.5)
  neglogL = function(ab,xdat) -sum(log(dbeta(xdat,ab[1],ab[2])))

> MaxLik = function(xdat, opt=3, a=NULL, b=NULL) {
     ### get starting values using one or both of the first two sample moments
     ## opt=1 when b is known, 2 when a is known, 3 when both unknown
        mu = mean(xdat)
        aplusb = mu*(1-mu)/var(xdat)-1
        ab.init = c(mu,1-mu)*aplusb
        if (opt==3) {
          fit= optim(ab.init, neglogL, xdat=xdat, 
                    lower=c(0,0), hessian=T)
        } else {
             if(opt==2) { 
              b.init = a*(1/mu-1)
              fit = optim(b.init, function(b,.a,.dat) neglogL(c(.a,b),.dat),
                        .a=a, .dat=xdat, lower=0, hessian=T)
            } else {
              a.init = b*(1/(1-mu)-1)
              fit = optim(a.init, function(a,.b,.dat) neglogL(c(a,.b),.dat),
                        .b=b, .dat=xdat, lower=0, hessian=T)
              } 
          }
          list(est=fit$par, SE= if(opt<3) 1/sqrt(fit$hessian) else 
                     sqrt(diag(solve(fit$hess))), 
                     logL = -fit$val, code=fit$conv, est.momt = ab.init) }

> round(unlist(MaxLik(Xdata, opt=1, b=1.5)),5)
      est        SE      logL      code est.momt1 est.momt2 
  2.11703   0.05948 140.43206   0.00000   2.27808   1.65219 

> round(unlist(MaxLik(Xdata, opt=2, a=2.2)),5)
      est        SE      logL      code est.momt1 est.momt2 
  1.59274   0.04218 141.98602   0.00000   2.27808   1.65219 

>  round(unlist(MaxLik(Xdata, opt=3)),5)
     est1      est2       SE1       SE2      logL      code est.momt1 est.momt2
  2.25351   1.62141   0.09781   0.06755 142.13818   0.00000   2.27808   1.65219 

#--------------------

(3) Apply your function to the full dataset using option 1 only (inputting b=1.5), 
and make sure it gives reasonable results not only for the estimate ahat of  a  
but also for the estimate of the standard deviation of  ahat. (This means that 
you should calculate the "Fisher Information" for the unknown parameter  a, at a=2.2).

    Next apply your function successively to 500 subsets of size 80 chosen 
randomly from your full dataset, according to option 1 (only  a  unknown, with 
b  fixed at the true value 1.5). For each application of your function to a 
subset-sample of size 80, you should generate an estimated  a-value and and 
estimated standard error for your estimator. For these 500 runs of your function, 
contrast the theoretical estimates of standard deviation ahat.SE of your a_hat 
estimators (MLEs) coming from the likelihood maximization with the empirical 
variance (sample variance of your 500 a_hat's). They should be reasonably close.

#--------------------Solution to part (3)

First the Fisher Information for a  in the case of fixed b=1.5 is easily checked,
at a=2.2, to be 

> trigamma(2.2)-trigamma(2.2+1.5)
[1] 0.2628949

### Based on a sample of size 1000 as above, this leads to SE for a.hat of

> sqrt(1/(.2628949*1000))
[1] 0.06167497

which is quite close to the SE of .05948 found in the opt=1 run of MaxLik above.


Now to the simulation: 

AestMat = array(0, c(500,2), dimnames=list(NULL, c("a.est","aest.SE")))
condvec = numeric(500)

for(i in 1:500) {
      inds = sample(1:500, 80)
      tmp = MaxLik(Xdata[inds], opt=1, b=1.5)
      AestMat[i,] = c(tmp$est, tmp$SE)
      condvec[i] = tmp$code }
                                 ### lots of harmless warnings

> c(theor.SE = sqrt(mean(AestMat[,2]^2)), emp.SE = sd(AestMat[,1]))
 theor.SE    emp.SE 
0.2097478 0.1946882              #### Not bad agreement, even for n=80 !!