Homework 11, Due Monday Oct 19, 2009.
=====================================

The purpose of this exercise is to verify "experimentally" the 
behavior of Maximum Likelihood Estimation in a simple setting.

Generate, once for this entire exercise, a fixed vector of 150 
X values from your favorite continuous distribution, and store 
it in a vector "xvec".

Next, fix scalar "parameter" values  a, b  in such a way that  
a + b*xvec  has mean value near 0 and quartiles of roughly 
Q1 = -1 and Q3 = 1.
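
One possible way to set this up is sketched below. The normal 
distribution for X, its mean and sd, and the seed are arbitrary 
choices made only for illustration; any continuous distribution 
will do. The values a, b are solved from the sample quartiles of 
xvec, which puts the quartiles of a + b*xvec at -1 and 1 and keeps 
the mean near 0 when the X distribution is roughly symmetric.

```r
set.seed(2009)                       # arbitrary seed, for reproducibility only
n    <- 150
xvec <- rnorm(n, mean = 3, sd = 2)   # assumed choice of continuous distribution

q <- quantile(xvec, c(0.25, 0.75))   # sample quartiles of xvec
b <- 2 / (q[[2]] - q[[1]])           # rescales the interquartile range to 2
a <- -b * (q[[1]] + q[[2]]) / 2      # centers the two quartiles around 0

eta <- a + b * xvec                  # linear predictor used in the model
summary(eta)                         # check the mean, Q1 and Q3
```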

Then generate 1000 batches of binary Y data values according 
to the "probit regression" model

Y_i ~ Binom(1, pnorm(a + b * X_i)),   independently for i=1,..,150
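
As a minimal sketch of this step, all 1000 batches can be drawn 
in one call to "rbinom" and stored as the columns of a matrix. 
The name "ymat" and the placeholder values of xvec, a, b below are 
assumptions made so that the snippet runs on its own; substitute 
your own fixed xvec and parameters.

```r
set.seed(2009)                  # arbitrary seed
n      <- 150
nbatch <- 1000
xvec   <- rnorm(n)              # placeholder X values
a <- 0; b <- 1.5                # placeholder parameter values

p <- pnorm(a + b * xvec)        # success probabilities, one per X_i

# rbinom recycles p every n draws, so with column-major filling
# column j of ymat holds batch j, and row i always uses p[i].
ymat <- matrix(rbinom(n * nbatch, size = 1, prob = p),
               nrow = n, ncol = nbatch)
dim(ymat)
```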

In your simulation, each batch of data  (X_i,Y_i): i=1,..,150 
is to be used to calculate MLE's  a^ , b^ ,  using either your 
own likelihood maximization routine in R or the function "glm".
Note that if you use "glm", you need to specify the binary 
responses in yvec using a "formula" and a "binomial-family" 
with "link=probit", using syntax like:

  glm(cbind(yvec,1-yvec) ~ xvec, family=binomial(link="probit"))
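
A sketch of how the fitting loop might look with "glm" follows. 
The helper "fit_one", the matrix names, and the placeholder 
xvec, a, b are all hypothetical. Recording glm's "converged" 
flag for each batch is one simple convergence check.

```r
set.seed(2009)                  # arbitrary seed
n <- 150; nbatch <- 1000
xvec <- rnorm(n)                # placeholder X values
a <- 0; b <- 1.5                # placeholder parameter values
ymat <- matrix(rbinom(n * nbatch, 1, pnorm(a + b * xvec)), nrow = n)

# Fit one batch; return the two MLEs and glm's convergence flag.
fit_one <- function(yvec, xvec) {
  fit <- glm(cbind(yvec, 1 - yvec) ~ xvec,
             family = binomial(link = "probit"))
  c(ahat = unname(coef(fit)[1]),
    bhat = unname(coef(fit)[2]),
    conv = fit$converged)
}

est <- t(apply(ymat, 2, fit_one, xvec = xvec))   # nbatch x 3 matrix
colMeans(est)                   # average a^, average b^, fraction converged
```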

The exercise task is concluded by checking that the resulting 
sets of  1000 a^ values and 1000 b^ values have approximately 
the distribution predicted by MLE theory.
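
One way to make this comparison concrete is sketched here, under 
the assumption that "MLE theory" means the usual large-sample 
result that (a^, b^) is approximately normal with mean (a, b) and 
covariance equal to the inverse Fisher information. For the probit 
model the information is the sum over i of 
dnorm(eta_i)^2 / (p_i*(1-p_i)) times the outer product of (1, X_i) 
with itself, where eta_i = a + b*X_i and p_i = pnorm(eta_i). The 
placeholder data and all variable names are illustrative only.

```r
set.seed(2009)                  # arbitrary seed
n <- 150; nbatch <- 1000
xvec <- rnorm(n)                # placeholder X values
a <- 0; b <- 1.5                # placeholder parameter values
eta <- a + b * xvec
p   <- pnorm(eta)

# Fisher information at the true (a, b), and the implied standard errors
w    <- dnorm(eta)^2 / (p * (1 - p))
X    <- cbind(1, xvec)
info <- t(X) %*% (w * X)        # w * X scales row i of X by w[i]
se_theory <- sqrt(diag(solve(info)))

# Simulated MLEs, one row per batch
est <- t(replicate(nbatch, {
  yvec <- rbinom(n, 1, p)
  coef(glm(cbind(yvec, 1 - yvec) ~ xvec,
           family = binomial(link = "probit")))
}))

# Theoretical versus empirical standard deviations of a^ and b^
rbind(theory = se_theory, simulated = apply(est, 2, sd))

# Standardized b^ values: compare sample quantiles with N(0,1) quantiles
z_b   <- (est[, 2] - b) / se_theory[2]
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
rbind(simulated = quantile(z_b, probs), normal = qnorm(probs))
```

A normal quantile plot of z_b (for example with "qqnorm") or a 
histogram with the normal density overlaid gives a graphical 
version of the same check.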

But 150 is not a large sample for this kind of problem: do not 
be too surprised if there are noticeable discrepancies between 
the actual empirical distributions based on the 1000 simulation 
iterations versus the predicted normal distribution from MLE theory.

ALSO: it is not guaranteed by theory that the MLE is actually the 
only relative maximum for the log-Likelihood in probit regression. 
So you should perform some kind of check [via convergence codes 
and/or multiple starting-points for the maximization, or maybe 
checking a maximization by "optim" against that by "glm"] that 
your maximum likelihood is the genuine global maximum.
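
A minimal sketch of such a check on a single batch is given below: 
the negative log-likelihood is minimized by "optim" from several 
starting points and the results are compared with "glm". The 
function name "negloglik", the particular starting points, and the 
placeholder data are all assumptions for illustration.

```r
set.seed(2009)                  # arbitrary seed
n <- 150
xvec <- rnorm(n)                # placeholder X values
a <- 0; b <- 1.5                # placeholder parameter values
yvec <- rbinom(n, 1, pnorm(a + b * xvec))

# Negative probit log-likelihood; log.p = TRUE avoids log(0) in the tails.
negloglik <- function(theta) {
  eta <- theta[1] + theta[2] * xvec
  -sum(yvec * pnorm(eta, log.p = TRUE) +
       (1 - yvec) * pnorm(-eta, log.p = TRUE))
}

fit <- glm(cbind(yvec, 1 - yvec) ~ xvec,
           family = binomial(link = "probit"))

# Arbitrary starting points for the numerical maximization
starts <- list(c(0, 0), c(-2, 3), c(2, 0.5), c(1, -1))
opt <- lapply(starts, function(s) optim(s, negloglik, method = "BFGS"))

# One column per start: estimates, minimized value, convergence code
sapply(opt, function(o) c(o$par, value = o$value, code = o$convergence))
coef(fit)                       # glm estimates for comparison
-as.numeric(logLik(fit))        # glm's minimized negative log-likelihood
```

If every starting point returns convergence code 0 and essentially 
the same estimates and likelihood value as "glm", that is 
reasonable evidence that the reported maximum is the global one 
for that batch.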

Once you have your code running for this example, you might want 
to re-run it with sample sizes 75 (where MLE theory will almost 
certainly fail) and 500 (where MLE theory should be clearly 
confirmed by your simulation).

YOU ARE NOT REQUIRED TO RE-RUN THE CODE AT THESE ALTERNATIVE 
SAMPLE SIZES: BUT YOU MAY FIND IT USEFUL IN DECIDING HOW TO 
INTERPRET YOUR RESULTS FOR SAMPLE SIZE OF 150.