Homework Problem 9, STAT 705, Fall 2015. Assigned 10/12/2015, Due Monday 10/19/2015 (1) Generate 1000 iid random variable values X_1, ..., X_1000 from the distribution Beta(a,b) with a=2.2 and b = 1.5. (2) Write a function to maximize the likelihood from such a dataset of size n, in one of three ways. Make this a user-supplied option for your function. (You can make the first option the default, for simplicity in the 3rd part of the problem.) The first option: with the value a unknown and b=b0 input and treated as known. The second option: with the value a=a0 input and treated as known, but b unknown. The third option: with both (a,b) parameters treated as unknown. ##--------------- Solution to parts 1,2 > Xdata = rbeta(1000,2.2,1.5) neglogL = function(ab,xdat) -sum(log(dbeta(xdat,ab[1],ab[2]))) > MaxLik = function(xdat, opt=3, a=NULL, b=NULL) { ### get starting values using one or both of the first two sample moments ## opt=1 when b is known, 2 when a is known, 3 when both unknown mu = mean(xdat) aplusb = mu*(1-mu)/var(xdat)-1 ab.init = c(mu,1-mu)*aplusb if (opt==3) { fit= optim(ab.init, neglogL, xdat=xdat, lower=c(0,0), hessian=T) } else { if(opt==2) { b.init = a*(1/mu-1) fit = optim(b.init, function(b,.a,.dat) neglogL(c(.a,b),.dat), .a=a, .dat=xdat, lower=0, hessian=T) } else { a.init = b*(1/(1-mu)-1) fit = optim(a.init, function(a,.b,.dat) neglogL(c(a,.b),.dat), .b=b, .dat=xdat, lower=0, hessian=T) } } list(est=fit$par, SE= if(opt<3) 1/sqrt(fit$hessian) else sqrt(diag(solve(fit$hess))), logL = -fit$val, code=fit$conv, est.momt = ab.init) } > round(unlist(MaxLik(Xdata, opt=1, b=1.5)),5) est SE logL code est.momt1 est.momt2 2.11703 0.05948 140.43206 0.00000 2.27808 1.65219 > round(unlist(MaxLik(Xdata, opt=2, a=2.2)),5) est SE logL code est.momt1 est.momt2 1.59274 0.04218 141.98602 0.00000 2.27808 1.65219 > round(unlist(MaxLik(Xdata, opt=3)),5) est1 est2 SE1 SE2 logL code est.momt1 est.momt2 2.25351 1.62141 0.09781 0.06755 142.13818 0.00000 2.27808 1.65219 #-------------------- (3) Apply your function to the full dataset using option 1 only (inputting b=1.5), and make sure it gives reasonable results not only for the estimate ahat of a but also for the estimate of the standard deviation of ahat. (This means that you should calculate the "Fisher Information" for the unknown parameter a, at a=2.2). Next apply your function successively to 500 subsets of size 80 chosen randomly from your full dataset, according to option 1 (only a unknown, with b fixed at the true value 1.5). For each application of your function to a subset-sample of size 80, you should generate an estimated a-value and and estimated standard error for your estimator. For these 500 runs of your function, contrast the theoretical estimates of standard deviation ahat.SE of your a_hat estimators (MLEs) coming from the likelihood maximization with the empirical variance (sample variance of your 500 a_hat's). They should be reasonably close. #--------------------Solution to part (3) First the Fisher Information for a in the case of fixed b=1.5 is easily checked, at a=2.2, to be > trigamma(2.2)-trigamma(2.2+1.5) [1] 0.2628949 ### Based on a sample of size 1000 as above, this leads to SE for a.hat of > sqrt(1/(.2628949*1000)) [1] 0.06167497 which is quite close to the SE of .05948 found in the opt=1 run of MaxLik above. Now to the simulation: AestMat = array(0, c(500,2), dimnames=list(NULL, c("a.est","aest.SE"))) condvec = numeric(500) for(i in 1:500) { inds = sample(1:500, 80) tmp = MaxLik(Xdata[inds], opt=1, b=1.5) AestMat[i,] = c(tmp$est, tmp$SE) condvec[i] = tmp$code } ### lots of harmless warnings > c(theor.SE = sqrt(mean(AestMat[,2]^2)), emp.SE = sd(AestMat[,1])) theor.SE emp.SE 0.2097478 0.1946882 #### Not bad agreement, even for n=80 !!