HW 10 SOLUTION                                                 10/22/16
===========================

Consider the model

    Y_i = exp(alpha + beta*X_i) + gamma*X_i*eps_i ,   i.i.d., i = 1,...,n

where the (X_i, Y_i) are observed data-pairs, X_i ~ Unif[0,1], eps_i ~ N(0,1)
independent of X_i, and the parameters (alpha, beta, gamma) are unknown.

Start with a sample dataset:

    Xvec = runif(150)
    Yvec = rnorm(150, exp(1+2*Xvec), 0.5*Xvec)

##---------------------------------------------------
THE MAIN MESSAGE IN THIS PROBLEM SOLUTION IS AS FOLLOWS:

(A) AS MANY OF YOU FOUND, THERE ARE SOME DIFFICULTIES IN APPLYING nlm
    DIRECTLY TO MAXIMIZE THE LOG-LIKELIHOOD IN THIS PROBLEM (DEPENDING
    SOMEWHAT ON YOUR PARAMETER CHOICES AND ON THE INITIAL VALUES IN YOUR
    CALLS TO nlm).

(B) BUT YOU CAN IN FACT ITERATE TO FAIRLY PRECISE CONVERGENCE IN FINDING
    THE MLE'S WITH nlm, optim, OR nlminb. AT THE POINTS OF CONVERGENCE,
    THE HESSIANS (OF THE NEGATIVE LOG-LIKELIHOOD) ARE POSITIVE DEFINITE,
    AND IF YOU CHECKED THE GRADIENTS IN nlm, THEY CAN BE MADE SMALL BY
    SUITABLE CHOICE OF THE gradtol AND steptol PARAMETERS.

(C) THE APPROXIMATE NORMALITY I EXPECTED YOU TO BE ABLE TO SHOW AT n=150
    DOES NOT IN FACT HOLD, ESPECIALLY FOR THE alpha-hat ESTIMATE, AS SHOWN
    BY THE HISTOGRAMS AND AS EVIDENCED ALSO BY THE DISCREPANCY BETWEEN THE
    EMPIRICAL SD's AND THE HESSIAN-BASED THEORETICAL STANDARD DEVIATIONS
    FOR alpha-hat. EVEN WITH A LARGER SAMPLE SIZE OF n=500, n=1000, OR
    MORE, THIS PHENOMENON PERSISTS, FOR A GOOD THEORETICAL REASON THAT I
    WILL DISCUSS IN CLASS.

(D) BOTH WITH HISTOGRAMS AND WITH OTHER MEASURES OF CLOSENESS BETWEEN THE
    NORMAL AND EMPIRICAL DISTRIBUTION FUNCTIONS, SUCH AS THE MAXIMUM
    DISTANCE AT THE SORTED MLE VALUES, YOU CAN SEE THAT APPROXIMATE
    NORMALITY FOR beta-hat AND gamma-hat DOES HOLD TO A REASONABLE
    APPROXIMATION AT n=150.

(E) IF YOU REPEAT THE SAME STEPS WITH A MUCH BETTER-BEHAVED VARIANCE MODEL,
    SUCH AS ONE WITH exp(gamma*X_i) REPLACING gamma*X_i AS THE CONDITIONAL
    STANDARD DEVIATION OF Y_i GIVEN X_i, THEN YOU FIND THAT THE APPROXIMATE
    NORMALITY OF THE PARAMETER MLE'S HOLDS TO A GOOD APPROXIMATION AT n=150,
    AND TO A MUCH WEAKER LEVEL OF APPROXIMATION AT n=50. Some of this last
    demonstration is given at the end of the solution below.
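In particular, Y_i given X_i is distributed N( exp(alpha+beta*X_i), (gamma*X_i)^2 ),
so the negative log-likelihood minimized below is

    sum_{i=1}^n [ log(gamma*X_i) + (Y_i - exp(alpha+beta*X_i))^2 / (2*(gamma*X_i)^2) ]  +  (n/2)*log(2*pi).

The NlogL function in the next section evaluates exactly this through dnorm(..., log=T).
If you want to verify that by hand, a small sketch of the check follows (NlogL.explicit
is a name introduced here only for illustration, not part of the original transcript):

    NlogL.explicit = function(theta,Y,X)
        sum( log(theta[3]*X) + (Y - exp(theta[1]+theta[2]*X))^2 / (2*(theta[3]*X)^2) ) +
        0.5*length(Y)*log(2*pi)
    ## NlogL.explicit(c(1,2,.5), Yvec, Xvec) should agree with NlogL(c(1,2,.5), Yvec, Xvec) below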
##---------------------------------------------------   SEVERAL MLE Methods

The negative log-likelihood for theta = (alpha, beta, gamma) is

  NlogL = function(theta,Y,X)
      -sum(dnorm(Y, exp(theta[1]+theta[2]*X), X*theta[3], log=T))

  NlogL(c(1,2,.5), Yvec, Xvec)
  [1] -23.34669

  MLE1 = optim(c(1,2,0.5), function(theta) NlogL(theta,Yvec,Xvec),
               lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
               method="L-BFGS-B")
  MLE1[1:5]
$par
[1] 0.9982743 2.0009628 0.5078612

$value
[1] -25.57573

$counts
function gradient
      30       30

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

### So this method gives nice convergence, and SE estimates:

> SE1 = sqrt(diag(solve(MLE1$hess)))
> round(SE1,5)
[1] 0.00086 0.00442 0.02932

> MLE2 = nlminb(c(1,2,0.5), function(theta) NlogL(theta,Yvec,Xvec),
                lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3))
> MLE2
$par
[1] 0.9982748 2.0009625 0.5078611

$objective
[1] -25.57573

$convergence
[1] 0

$iterations
[1] 18

$evaluations
function gradient
      30       68

$message
[1] "relative convergence (4)"

> MLE3 = nlm(function(theta) NlogL(c(theta[1:2],theta[3]), Yvec,Xvec), c(1,2,0.5),
             gradtol=1.e-8, hessian=T, steptol=1.e-8)
> MLE3
$minimum
[1] -25.57573

$estimate
[1] 0.9982748 2.0009625 0.5078613

$gradient
[1] -6.642438e-06  1.304616e-05  1.960920e-07

$hessian
             [,1]        [,2]       [,3]
[1,] 1518498.8894 99132.50042 -298.82084
[2,]   99132.5004 57574.37390  -22.66714
[3,]    -298.8208   -22.66714 1161.99341

$code
[1] 3

$iterations
[1] 18

### Repeated simulation to check approximate normality of MLEs:

  tmparr = array(0, c(150,2,1000))
  tmparr[,1,] = runif(1.5e5)
  tmparr[,2,] = rnorm(1.5e5, exp(1+2*tmparr[,1,]), 0.5*tmparr[,1,])
  MLEarr = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = optim(c(1,2,0.5), function(theta) NlogL(theta, tmparr[,2,i], tmparr[,1,i]),
                    lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                    method="L-BFGS-B")
     MLEarr[i,,] = c(tmpfit$par, sqrt(diag(solve(tmpfit$hess)))) }
  ### took less than a minute on my laptop

> summary(MLEarr[,,2])
       V1                  V2                 V3
 Min.   :1.239e-06   Min.   :0.003096   Min.   :0.02146
 1st Qu.:2.957e-04   1st Qu.:0.004084   1st Qu.:0.02757
 Median :6.616e-04   Median :0.004298   Median :0.02866
 Mean   :7.679e-04   Mean   :0.004385   Mean   :0.02870
 3rd Qu.:1.122e-03   3rd Qu.:0.004630   3rd Qu.:0.02986
 Max.   :3.249e-03   Max.   :0.006192   Max.   :0.03508

> apply(MLEarr[,,2],2,sd)/apply(MLEarr[,,2],2,mean)
[1] 0.76814320 0.10134776 0.05802194
### This shows that the relative variability among the hessian-based sd's
### is small for beta and gamma, but not for alpha.

> round( rbind(empirical = apply(MLEarr[,,1],2,sd),
               hessian.based = apply(MLEarr[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.00095 0.00445 0.02771
hessian.based 0.00077 0.00438 0.02870
### This shows that the theoretical and empirical sd's of the MLE's are
### fairly close for beta and gamma, but not quite so close for alpha.

> hist(MLEarr[,1,1], nclass=40, prob=T, xlab="MLEs for alpha",
   main="Histogram of alpha.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,1,1]), mean(MLEarr[,1,2])), add=T, col="red")
  ### Picture saved as HW10Pic1.pdf
### You can see from this picture that the behavior is actually quite
### NON-normal for the alpha.hat MLE, even for n=150 !!
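Another quick check of the same point, not part of the original transcript but easy
to add, is a normal QQ-plot of the simulated alpha.hat values; the bending of the
points away from the reference line shows the same non-normality as the histogram:

    qqnorm(MLEarr[,1,1], main="Normal QQ-plot of alpha.hat MLEs, n=150")
    qqline(MLEarr[,1,1], col="red")    ## reference line through the quartiles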
> hist(MLEarr[,2,1], nclass=40, prob=T, xlab="MLEs for beta",
   main="Histogram of beta.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,2,1]), sd(MLEarr[,2,1])), add=T, col="red")
  ### Picture saved as HW10Pic2.pdf

> hist(MLEarr[,3,1], nclass=50, prob=T, xlab="MLEs for gamma",
   main="Histogram of gamma.hat MLEs in HW10   N=1000, n=150, alpha=1, beta=2, gamma=0.5")
  curve(dnorm(x, mean(MLEarr[,3,1]), sd(MLEarr[,3,1])), add=T, col="red")
  ### Picture saved as HW10Pic3.pdf

### But the beta.hat and gamma.hat histograms do look quite normal for n=150.

### As a way of measuring the distance between the empirical distribution of
### the MLE's and the normal distribution with the same mean and variance:

> alph1 = sort(MLEarr[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph1, mean(alph1), sd(alph1)) ))
[1] 0.1205623
## This is quite a large discrepancy between the empirical and approximating
## normal cdf for the alpha MLE's. Compare this situation with that for the
## beta and gamma MLE's:

> beta1 = sort(MLEarr[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta1, mean(beta1), sd(beta1)) ))
[1] 0.02885048
> gamma1 = sort(MLEarr[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma1, mean(gamma1), sd(gamma1)) ))
[1] 0.03008211

###-----------------------------------------------   Repeat for n=500

  Xvec2 = runif(500)
  Yvec2 = rnorm(500, exp(1+2*Xvec2), 0.5*Xvec2)

  tmparr2 = array(0, c(500,2,1000))
  tmparr2[,1,] = runif(5e5)
  tmparr2[,2,] = rnorm(5e5, exp(1+2*tmparr2[,1,]), 0.5*tmparr2[,1,])
  MLEarr2 = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = optim(c(1,2,0.5), function(theta) NlogL(theta, tmparr2[,2,i], tmparr2[,1,i]),
                    lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                    method="L-BFGS-B")
     MLEarr2[i,,] = c(tmpfit$par, sqrt(diag(solve(tmpfit$hess)))) }

> round( rbind(empirical = apply(MLEarr2[,,1],2,sd),
               hessian.based = apply(MLEarr2[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.00031 0.00227 0.01538
hessian.based 0.00023 0.00230 0.01579

> hist(MLEarr2[,1,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,1,1]), mean(MLEarr2[,1,2])), add=T, col="red")
  ### Better, but still far from normal !

> alph2 = sort(MLEarr2[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph2, mean(alph2), sd(alph2)) ))
[1] 0.1302623

> hist(MLEarr2[,2,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,2,1]), sd(MLEarr2[,2,1])), add=T, col="red")
> hist(MLEarr2[,3,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr2[,3,1]), sd(MLEarr2[,3,1])), add=T, col="red")

> beta2 = sort(MLEarr2[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta2, mean(beta2), sd(beta2)) ))
[1] 0.02148981
> gamma2 = sort(MLEarr2[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma2, mean(gamma2), sd(gamma2)) ))
[1] 0.06199218

Strikingly, the normal approximation for alpha.hat still fails even here with
n=500, and the gamma.hat discrepancy is actually larger than at n=150. There
may be convergence problems ?!

###=======================================================================
Try this again with a better-behaved nonlinear regression !

    Y_i = exp(alpha + beta*X_i) + exp(gamma*X_i)*eps_i ,   i.i.d., i = 1,...,n

where the (X_i, Y_i) are observed data-pairs, X_i ~ Unif[0,1], eps_i ~ N(0,1)
independent of X_i, and the parameters (alpha, beta, gamma) are unknown.
Start with a sample dataset:

    Xvec3 = runif(150)
    Yvec3 = rnorm(150, exp(1+2*Xvec3), exp(0.5*Xvec3))

##---------------------------------------------------   SEVERAL MLE Methods

The negative log-likelihood for theta = (alpha, beta, gamma) is

  NlogL3 = function(theta,Y,X)
      -sum(dnorm(Y, exp(theta[1]+theta[2]*X), exp(X*theta[3]), log=T))

  NlogL3(c(1,2,.5), Yvec, Xvec)
  [1] 179.355
  ## (NOTE: this sanity-check call mistakenly reuses the first dataset Yvec, Xvec;
  ##  for the new data it should be NlogL3(c(1,2,.5), Yvec3, Xvec3).)

  MLE1B = optim(c(1,2,0.5), function(theta) NlogL3(theta,Yvec3,Xvec3),
                lower=c(-Inf,-Inf,1.e-6), upper=rep(Inf,3), hessian=T,
                method="L-BFGS-B")
> MLE1B
$par
[1] 0.9496362 2.0917092 0.5018207

$value
[1] 250.5124

$counts
function gradient
      18       18

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

$hessian
              [,1]         [,2]        [,3]
[1,] 6648.64130520 4737.7090029 -0.01308928
[2,] 4737.70900290 3783.7233252  0.28928662
[3,]   -0.01308928    0.2892866 86.59347925

> MLE3B = nlm(function(theta) NlogL3(c(theta[1:2],theta[3]), Yvec3,Xvec3), c(1,2,0.5),
              gradtol=1.e-8, hessian=T, steptol=1.e-8)
> MLE3B
$minimum
[1] 250.5124

$estimate
[1] 0.9496296 2.0917162 0.5018361

$gradient
[1] 1.676312e-06 9.173088e-07 4.760636e-08

$hessian
             [,1]         [,2]       [,3]
[1,] 6650.4454168 4739.4692429 -0.4736592
[2,] 4739.4692429 3785.6106635 -0.3693679
[3,]   -0.4736592   -0.3693679 86.5789588

$code
[1] 1

$iterations
[1] 19

> sqrt(diag(solve(MLE3B$hess)))
[1] 0.03735156 0.04950695 0.10747165

### Repeated simulation to check approximate normality of MLEs:

  tmparr = array(0, c(150,2,1000))
  tmparr[,1,] = runif(1.5e5)
  tmparr[,2,] = rnorm(1.5e5, exp(1+2*tmparr[,1,]), exp(0.5*tmparr[,1,]))
  MLEarr3 = array(0, c(1000,3,2))
  for(i in 1:1000) {
     tmpfit = nlm(function(theta) NlogL3(theta, tmparr[,2,i], tmparr[,1,i]),
                  c(1,2,0.5), hessian=T)
     MLEarr3[i,,] = c(tmpfit$est, sqrt(diag(solve(tmpfit$hess)))) }
  ### took less than a minute on my laptop

> summary(MLEarr3[,,2])
       V1                V2                V3
 Min.   :0.03068   Min.   :0.04050   Min.   :0.08762
 1st Qu.:0.03678   1st Qu.:0.04814   1st Qu.:0.09742
 Median :0.03802   Median :0.05031   Median :0.10044
 Mean   :0.03805   Mean   :0.05033   Mean   :0.10042
 3rd Qu.:0.03925   3rd Qu.:0.05238   3rd Qu.:0.10330
 Max.   :0.04469   Max.   :0.06150   Max.   :0.11799

> apply(MLEarr3[,,2],2,sd)/apply(MLEarr3[,,2],2,mean)   ## coefficients of variation
[1] 0.05118089 0.06083265 0.04355590                    ## of the hessian-based SD estimates
### This time the relative variability among the hessian-based sd's is small
### for all three parameters, including alpha.

> round( rbind(empirical = apply(MLEarr3[,,1],2,sd),
               hessian.based = apply(MLEarr3[,,2],2,mean)), 5)
                 [,1]    [,2]    [,3]
empirical     0.03844 0.05007 0.09801
hessian.based 0.03805 0.05033 0.10042
### And the theoretical and empirical sd's of the MLE's are now fairly close
### for all three parameters, alpha included.

> hist(MLEarr3[,1,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,1,1]), mean(MLEarr3[,1,2])), add=T, col="red")
  ## This picture shows quite a good normal approximation !!
> hist(MLEarr3[,2,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,2,1]), sd(MLEarr3[,2,1])), add=T, col="red")
> hist(MLEarr3[,3,1], nclass=50, prob=T)
  curve(dnorm(x, mean(MLEarr3[,3,1]), sd(MLEarr3[,3,1])), add=T, col="red")
### The beta.hat and gamma.hat histograms also still look quite normal for n=150.
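The empirical-vs-normal cdf distance computed below (and earlier, for the first
model) can be wrapped in a small helper for reuse. This is just a convenience
sketch; the name cdfDist is hypothetical and does not appear in the original
transcript, and the inline lines below are equivalent to calling it on each
column of MLEarr3[,,1]:

    cdfDist = function(est) {
        s = sort(est)
        n = length(s)
        max( abs( (1:n)/n - pnorm(s, mean(s), sd(s)) ) )
    }
    ## e.g., cdfDist(MLEarr3[,1,1]) reproduces the alpha.hat value reported just below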
### As a way of measuring the distance between the empirical distribution of
### the MLE's and the normal distribution with the same mean and variance:

> alph1 = sort(MLEarr3[,1,1])
  max( abs( (1:1000)/1000 - pnorm(alph1, mean(alph1), sd(alph1)) ))
[1] 0.02952506
> beta1 = sort(MLEarr3[,2,1])
  max( abs( (1:1000)/1000 - pnorm(beta1, mean(beta1), sd(beta1)) ))
[1] 0.02153984
> gamma1 = sort(MLEarr[,3,1])
  max( abs( (1:1000)/1000 - pnorm(gamma1, mean(gamma1), sd(gamma1)) ))
[1] 0.03008211
### NOTE: the gamma1 line above should have used MLEarr3[,3,1]; as written it
### reuses MLEarr from the first model, so the printed value simply repeats the
### earlier gamma.hat discrepancy rather than giving the one for this model.
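Point (E) above also refers to the weaker normal approximation for this
better-behaved model at n=50. That run is not reproduced here, but a sketch of
how it could be carried out (no output shown; tmparr.small and MLEarr3small are
hypothetical names introduced only for this sketch) is:

    tmparr.small = array(0, c(50,2,1000))
    tmparr.small[,1,] = runif(5e4)
    tmparr.small[,2,] = rnorm(5e4, exp(1+2*tmparr.small[,1,]), exp(0.5*tmparr.small[,1,]))
    MLEarr3small = array(0, c(1000,3,2))
    for(i in 1:1000) {
       tmpfit = nlm(function(theta) NlogL3(theta, tmparr.small[,2,i], tmparr.small[,1,i]),
                    c(1,2,0.5), hessian=T)
       MLEarr3small[i,,] = c(tmpfit$est, sqrt(diag(solve(tmpfit$hess)))) }
    ## Histograms and the max |ECDF - normal cdf| distances would then be computed
    ## exactly as above, to compare the quality of the n=50 and n=150 approximations.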