HW 10 SOLUTION                                                 10/22/16
===========================

Consider the model

    Y_i = exp(alpha + beta*X_i) + gamma*X_i*eps_i ,   i.i.d., i = 1,...,n

where the (X_i, Y_i) are observed data-pairs, X_i ~ Unif[0,1], eps_i ~ N(0,1)
independent of X_i, and the parameters (alpha, beta, gamma) are unknown.

Start with a sample dataset:

    Xvec = runif(150)
    Yvec = rnorm(150, exp(1+2*Xvec), 0.5*Xvec)

##---------------------------------------------------
THE MAIN MESSAGE IN THIS PROBLEM SOLUTION IS AS FOLLOWS:

(A) AS MANY OF YOU FOUND, THERE ARE SOME DIFFICULTIES IN APPLYING nlm
    DIRECTLY TO MAXIMIZE THE LOG-LIKELIHOOD IN THIS PROBLEM (DEPENDING
    SOMEWHAT ON YOUR PARAMETER CHOICES AND ON THE INITIAL VALUES IN YOUR
    CALLS TO nlm).

(B) BUT YOU CAN IN FACT ITERATE TO FAIRLY PRECISE CONVERGENCE IN FINDING
    THE MLE'S WITH nlm, optim, OR nlminb. AT THE POINTS OF CONVERGENCE,
    THE HESSIANS (OF THE NEGATIVE LOG-LIKELIHOOD) ARE POSITIVE DEFINITE,
    AND IF YOU CHECKED THE GRADIENTS IN nlm, THEY CAN BE MADE SMALL BY
    SUITABLE CHOICE OF THE gradtol AND steptol PARAMETERS.

(C) THE APPROXIMATE NORMALITY I EXPECTED YOU TO BE ABLE TO SHOW AT n=150
    DOES NOT IN FACT HOLD, ESPECIALLY FOR THE alpha-hat ESTIMATE, AS SHOWN
    BY THE HISTOGRAMS AND AS EVIDENCED ALSO BY THE DISCREPANCY BETWEEN THE
    EMPIRICAL SD's AND THE HESSIAN-BASED THEORETICAL STANDARD DEVIATIONS
    FOR alpha-hat. EVEN WITH A LARGER SAMPLE SIZE OF n=500, n=1000, OR
    MORE, THIS PHENOMENON PERSISTS, FOR A GOOD THEORETICAL REASON THAT I
    WILL DISCUSS IN CLASS.

(D) BOTH WITH HISTOGRAMS AND WITH OTHER MEASURES OF CLOSENESS BETWEEN THE
    NORMAL AND EMPIRICAL DISTRIBUTION FUNCTIONS, SUCH AS THE MAXIMUM
    DISTANCE AT THE SORTED MLE VALUES, YOU CAN SEE THAT APPROXIMATE
    NORMALITY FOR beta-hat AND gamma-hat DOES HOLD TO A REASONABLE
    APPROXIMATION AT n=150.

(E) IF YOU REPEAT THE SAME STEPS WITH A MUCH BETTER-BEHAVED VARIANCE MODEL,
    SUCH AS ONE WITH exp(gamma*X_i) REPLACING gamma*X_i AS THE CONDITIONAL
    STANDARD DEVIATION OF Y_i GIVEN X_i, THEN YOU FIND THAT THE APPROXIMATE
    NORMALITY OF THE PARAMETER MLE'S HOLDS TO A GOOD APPROXIMATION AT n=150,
    AND TO A MUCH WEAKER LEVEL OF APPROXIMATION AT n=50. Some of this last
    demonstration is given at the end of the solution below.
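In particular, Y_i given X_i is distributed N( exp(alpha+beta*X_i), (gamma*X_i)^2 ),
so the negative log-likelihood minimized below is

    sum_{i=1}^n [ log(gamma*X_i) + (Y_i - exp(alpha+beta*X_i))^2 / (2*(gamma*X_i)^2) ]  +  (n/2)*log(2*pi).

The NlogL function in the next section evaluates exactly this through dnorm(..., log=T).
If you want to verify that by hand, a small sketch of the check follows (NlogL.explicit
is a name introduced here only for illustration, not part of the original transcript):

    NlogL.explicit = function(theta,Y,X)
        sum( log(theta[3]*X) + (Y - exp(theta[1]+theta[2]*X))^2 / (2*(theta[3]*X)^2) ) +
        0.5*length(Y)*log(2*pi)
    ## NlogL.explicit(c(1,2,.5), Yvec, Xvec) should agree with NlogL(c(1,2,.5), Yvec, Xvec) below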
##---------------------------------------------------   SEVERAL MLE Methods

The negative log-likelihood for theta = (alpha, beta, gamma) is

  NlogL = function(theta,Y,X)
      -sum(dnorm(Y, exp(theta[1]+theta[2]*X), X*theta[3], log=T))

  NlogL(c(1,2,.5), Yvec, Xvec)
  [1] -23.34669

  MLE1 = optim(c(1,2,0.5), function(theta) NlogL(theta,Yvec,Xvec),
               lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
               method="L-BFGS-B")
  MLE1[1:5]
$par
[1] 0.9982743 2.0009628 0.5078612

$value
[1] -25.57573

$counts
function gradient
      30       30

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

### So this method gives nice convergence, and SE estimates:

> SE1 = sqrt(diag(solve(MLE1$hess)))
> round(SE1,5)
[1] 0.00086 0.00442 0.02932

> MLE2 = nlminb(c(1,2,0.5), function(theta) NlogL(theta,Yvec,Xvec),
                lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3))
> MLE2
$par
[1] 0.9982748 2.0009625 0.5078611

$objective
[1] -25.57573

$convergence
[1] 0

$iterations
[1] 18

$evaluations
function gradient
      30       68

$message
[1] "relative convergence (4)"

> MLE3 = nlm(function(theta) NlogL(c(theta[1:2],theta[3]), Yvec,Xvec), c(1,2,0.5),
             gradtol=1.e-8, hessian=T, steptol=1.e-8)
> MLE3
$minimum
[1] -25.57573

$estimate
[1] 0.9982748 2.0009625 0.5078613

$gradient
[1] -6.642438e-06  1.304616e-05  1.960920e-07

$hessian
             [,1]        [,2]       [,3]
[1,] 1518498.8894 99132.50042 -298.82084
[2,]   99132.5004 57574.37390  -22.66714
[3,]    -298.8208   -22.66714 1161.99341

$code
[1] 3

$iterations
[1] 18

### Repeated simulation to check approximate normality of MLEs:

  tmparr = array(0, c(150,2,1000))
  tmparr[,1,] = runif(1.5e5)
  tmparr[,2,] = rnorm(1.5e5, exp(1+2*tmparr[,1,]), 0.5*tmparr[,1,])
  MLEarr = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = optim(c(1,2,0.5), function(theta) NlogL(theta, tmparr[,2,i], tmparr[,1,i]),
                    lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                    method="L-BFGS-B")
     MLEarr[i,,] = c(tmpfit$par, sqrt(diag(solve(tmpfit$hess)))) }
  ### took less than a minute on my laptop

> summary(MLEarr[,,2])
       V1                  V2                 V3
 Min.   :1.239e-06   Min.   :0.003096   Min.   :0.02146
 1st Qu.:2.957e-04   1st Qu.:0.004084   1st Qu.:0.02757
 Median :6.616e-04   Median :0.004298   Median :0.02866
 Mean   :7.679e-04   Mean   :0.004385   Mean   :0.02870
 3rd Qu.:1.122e-03   3rd Qu.:0.004630   3rd Qu.:0.02986
 Max.   :3.249e-03   Max.   :0.006192   Max.   :0.03508

> apply(MLEarr[,,2],2,sd)/apply(MLEarr[,,2],2,mean)
[1] 0.76814320 0.10134776 0.05802194
### This shows that the relative variability among the hessian-based sd's
### is small for beta and gamma, but not for alpha.

> round( rbind(empirical = apply(MLEarr[,,1],2,sd),
               hessian.based = apply(MLEarr[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.00095 0.00445 0.02771
hessian.based 0.00077 0.00438 0.02870
### This shows that the theoretical and empirical sd's of the MLE's are
### fairly close for beta and gamma, but not quite so close for alpha.

> hist(MLEarr[,1,1], nclass=40, prob=T, xlab="MLEs for alpha",
   main="Histogram of alpha.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,1,1]), mean(MLEarr[,1,2])), add=T, col="red")
  ### Picture saved as HW10Pic1.pdf
### You can see from this picture that the behavior is actually quite
### NON-normal for the alpha.hat MLE, even for n=150 !!
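Another quick check of the same point, not part of the original transcript but easy
to add, is a normal QQ-plot of the simulated alpha.hat values; the bending of the
points away from the reference line shows the same non-normality as the histogram:

    qqnorm(MLEarr[,1,1], main="Normal QQ-plot of alpha.hat MLEs, n=150")
    qqline(MLEarr[,1,1], col="red")    ## reference line through the quartiles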
> hist(MLEarr[,2,1], nclass=40, prob=T, xlab="MLEs for beta",
   main="Histogram of beta.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,2,1]), sd(MLEarr[,2,1])), add=T, col="red")
  ### Picture saved as HW10Pic2.pdf

> hist(MLEarr[,3,1], nclass=50, prob=T, xlab="MLEs for gamma",
   main="Histogram of gamma.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,3,1]), sd(MLEarr[,3,1])), add=T, col="red")
  ### Picture saved as HW10Pic3.pdf

### But the beta.hat and gamma.hat histograms do look quite normal for n=150.

### As a way of measuring the distance between the empirical distribution of
### the MLE's and the normal distribution with the same mean and variance:

> alph1 = sort(MLEarr[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph1, mean(alph1), sd(alph1)) ))
[1] 0.1205623
## This is quite a large discrepancy between the empirical and approximating
## normal cdf for the alpha MLE's. Compare this situation with that for the
## beta and gamma MLE's:

> beta1 = sort(MLEarr[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta1, mean(beta1), sd(beta1)) ))
[1] 0.02885048
> gamma1 = sort(MLEarr[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma1, mean(gamma1), sd(gamma1)) ))
[1] 0.03008211

###-----------------------------------------------   Repeat for n=500

  Xvec2 = runif(500)
  Yvec2 = rnorm(500, exp(1+2*Xvec2), 0.5*Xvec2)

  tmparr2 = array(0, c(500,2,1000))
  tmparr2[,1,] = runif(5e5)
  tmparr2[,2,] = rnorm(5e5, exp(1+2*tmparr2[,1,]), 0.5*tmparr2[,1,])
  MLEarr2 = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = optim(c(1,2,0.5), function(theta) NlogL(theta, tmparr2[,2,i], tmparr2[,1,i]),
                    lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                    method="L-BFGS-B")
     MLEarr2[i,,] = c(tmpfit$par, sqrt(diag(solve(tmpfit$hess)))) }

> round( rbind(empirical = apply(MLEarr2[,,1],2,sd),
               hessian.based = apply(MLEarr2[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.00031 0.00227 0.01538
hessian.based 0.00023 0.00230 0.01579

> hist(MLEarr2[,1,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,1,1]), mean(MLEarr2[,1,2])), add=T, col="red")
  ### Better, but still far from normal !

> alph2 = sort(MLEarr2[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph2, mean(alph2), sd(alph2)) ))
[1] 0.1302623

> hist(MLEarr2[,2,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,2,1]), sd(MLEarr2[,2,1])), add=T, col="red")
> hist(MLEarr2[,3,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,3,1]), sd(MLEarr2[,3,1])), add=T, col="red")

> beta2 = sort(MLEarr2[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta2, mean(beta2), sd(beta2)) ))
[1] 0.02148981
> gamma2 = sort(MLEarr2[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma2, mean(gamma2), sd(gamma2)) ))
[1] 0.06199218

Strikingly, the normal approximation for alpha.hat still fails even here with
n=500, and the gamma.hat discrepancy is actually larger than at n=150. There
may be convergence problems ?!

###=======================================================================
Try this again with a better-behaved nonlinear regression !

    Y_i = exp(alpha + beta*X_i) + exp(gamma*X_i)*eps_i ,   i.i.d., i = 1,...,n

where the (X_i, Y_i) are observed data-pairs, X_i ~ Unif[0,1], eps_i ~ N(0,1)
independent of X_i, and the parameters (alpha, beta, gamma) are unknown.
Start with a sample dataset:

    Xvec3 = runif(150)
    Yvec3 = rnorm(150, exp(1+2*Xvec3), exp(0.5*Xvec3))

##---------------------------------------------------   SEVERAL MLE Methods

The negative log-likelihood for theta = (alpha, beta, gamma) is

  NlogL3 = function(theta,Y,X)
      -sum(dnorm(Y, exp(theta[1]+theta[2]*X), exp(X*theta[3]), log=T))

  NlogL3(c(1,2,.5), Yvec, Xvec)
  [1] 179.355
  ## (NOTE: this sanity-check call mistakenly reuses the first dataset Yvec, Xvec;
  ##  for the new data it should be NlogL3(c(1,2,.5), Yvec3, Xvec3).)

  MLE1B = optim(c(1,2,0.5), function(theta) NlogL3(theta,Yvec3,Xvec3),
                lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                method="L-BFGS-B")
> MLE1B
$par
[1] 0.9496362 2.0917092 0.5018207

$value
[1] 250.5124

$counts
function gradient
      18       18

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

$hessian
              [,1]         [,2]        [,3]
[1,] 6648.64130520 4737.7090029 -0.01308928
[2,] 4737.70900290 3783.7233252  0.28928662
[3,]   -0.01308928    0.2892866 86.59347925

> MLE3B = nlm(function(theta) NlogL3(c(theta[1:2],theta[3]), Yvec3,Xvec3), c(1,2,0.5),
              gradtol=1.e-8, hessian=T, steptol=1.e-8)
> MLE3B
$minimum
[1] 250.5124

$estimate
[1] 0.9496296 2.0917162 0.5018361

$gradient
[1] 1.676312e-06 9.173088e-07 4.760636e-08

$hessian
             [,1]         [,2]       [,3]
[1,] 6650.4454168 4739.4692429 -0.4736592
[2,] 4739.4692429 3785.6106635 -0.3693679
[3,]   -0.4736592   -0.3693679 86.5789588

$code
[1] 1

$iterations
[1] 19

> sqrt(diag(solve(MLE3B$hess)))
[1] 0.03735156 0.04950695 0.10747165

### Repeated simulation to check approximate normality of MLEs:

  tmparr = array(0, c(150,2,1000))
  tmparr[,1,] = runif(1.5e5)
  tmparr[,2,] = rnorm(1.5e5, exp(1+2*tmparr[,1,]), exp(0.5*tmparr[,1,]))
  MLEarr3 = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = nlm(function(theta) NlogL3(theta, tmparr[,2,i], tmparr[,1,i]),
                  c(1,2,0.5), hessian=T)
     MLEarr3[i,,] = c(tmpfit$est, sqrt(diag(solve(tmpfit$hess)))) }
  ### took less than a minute on my laptop

> summary(MLEarr3[,,2])
       V1                V2                V3
 Min.   :0.03068   Min.   :0.04050   Min.   :0.08762
 1st Qu.:0.03678   1st Qu.:0.04814   1st Qu.:0.09742
 Median :0.03802   Median :0.05031   Median :0.10044
 Mean   :0.03805   Mean   :0.05033   Mean   :0.10042
 3rd Qu.:0.03925   3rd Qu.:0.05238   3rd Qu.:0.10330
 Max.   :0.04469   Max.   :0.06150   Max.   :0.11799

> apply(MLEarr3[,,2],2,sd)/apply(MLEarr3[,,2],2,mean)   ## coefficients of variation
[1] 0.05118089 0.06083265 0.04355590                    ## of the hessian-based SD estimates
### This time the relative variability among the hessian-based sd's is small
### for all three parameters, including alpha.

> round( rbind(empirical = apply(MLEarr3[,,1],2,sd),
               hessian.based = apply(MLEarr3[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.03844 0.05007 0.09801
hessian.based 0.03805 0.05033 0.10042
### And the theoretical and empirical sd's of the MLE's are now fairly close
### for all three parameters, alpha included.

> hist(MLEarr3[,1,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,1,1]), mean(MLEarr3[,1,2])), add=T, col="red")
  ## This picture shows quite a good normal approximation !!
> hist(MLEarr3[,2,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,2,1]), sd(MLEarr3[,2,1])), add=T, col="red")
> hist(MLEarr3[,3,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,3,1]), sd(MLEarr3[,3,1])), add=T, col="red")
### The beta.hat and gamma.hat histograms also still look quite normal for n=150.
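The empirical-vs-normal cdf distance computed below (and earlier, for the first
model) can be wrapped in a small helper for reuse. This is just a convenience
sketch; the name cdfDist is hypothetical and does not appear in the original
transcript, and the inline lines below are equivalent to calling it on each
column of MLEarr3[,,1]:

    cdfDist = function(est) {
        s = sort(est)
        n = length(s)
        max( abs( (1:n)/n - pnorm(s, mean(s), sd(s)) ) )
    }
    ## e.g., cdfDist(MLEarr3[,1,1]) reproduces the alpha.hat value reported just below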
### As a way of measuring the distance between the empirical distribution of
### the MLE's and the normal distribution with the same mean and variance:

> alph1 = sort(MLEarr3[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph1, mean(alph1), sd(alph1)) ))
[1] 0.02952506
> beta1 = sort(MLEarr3[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta1, mean(beta1), sd(beta1)) ))
[1] 0.02153984
> gamma1 = sort(MLEarr[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma1, mean(gamma1), sd(gamma1)) ))
[1] 0.03008211
### NOTE: the gamma1 line above should have used MLEarr3[,3,1]; as written it
### reuses MLEarr from the first model, so the printed value simply repeats the
### earlier gamma.hat discrepancy rather than giving the one for this model.
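Point (E) above also refers to the weaker normal approximation for this
better-behaved model at n=50. That run is not reproduced here, but a sketch of
how it could be carried out (no output shown; tmparr.small and MLEarr3small are
hypothetical names introduced only for this sketch) is:

    tmparr.small = array(0, c(50,2,1000))
    tmparr.small[,1,] = runif(5e4)
    tmparr.small[,2,] = rnorm(5e4, exp(1+2*tmparr.small[,1,]), exp(0.5*tmparr.small[,1,]))
    MLEarr3small = array(0, c(1000,3,2))
    for(i in 1:1000) {
       tmpfit = nlm(function(theta) NlogL3(theta, tmparr.small[,2,i], tmparr.small[,1,i]),
                    c(1,2,0.5), hessian=T)
       MLEarr3small[i,,] = c(tmpfit$est, sqrt(diag(solve(tmpfit$hess)))) }
    ## Histograms and the max |ECDF - normal cdf| distances would then be computed
    ## exactly as above, to compare the quality of the n=50 and n=150 approximations.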