Homework 11, Due Monday Oct 19, 2009.
=====================================

The purpose of this exercise is to verify "experimentally" the behavior of Maximum Likelihood Estimation in a simple setting.

Generate once for this entire exercise a fixed vector of 150 (continuously distributed) X values from your favorite continuous distribution, stored in a vector "xvec". Next fix scalar "parameter" values a, b in such a way that a + b*xvec has mean value near 0 and quartiles of roughly Q1 = -1 and Q3 = 1. Then generate 1000 batches of binary Y data values according to the "probit regression" model

    Y_i ~ Binom(1, pnorm(a + b * X_i)),   i=1,..,150

In your simulation, each batch of data (X_i,Y_i), i=1,..,150, is to be used to calculate the MLEs a^, b^, using either your own likelihood maximization routine in R or the function "glm". (But note that if you use "glm", you need to specify the binary responses in yvec using a "formula" and a "binomial-family" with "link=probit", using syntax like:

    glm(cbind(yvec,1-yvec) ~ xvec, family=binomial(link="probit"))

.)

The exercise task is concluded by checking that the resulting sets of 1000 a^ values and 1000 b^ values each have approximately the normal distribution predicted by MLE theory. But 150 is not a large sample for this kind of problem: do not be too surprised if there are noticeable discrepancies between the actual empirical distributions based on the 1000 simulation iterations and the predicted normal distribution from MLE theory.

ALSO: it is not guaranteed by theory that the MLE is actually the only relative maximum for the log-Likelihood in probit regression. So you should perform some kind of check [via convergence codes and/or multiple starting-points for the maximization, or maybe checking a maximization by "optim" against that by "glm"] that your maximum likelihood is the genuine global maximum.
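
One possible way to organize the simulation described above is sketched here. It is only an illustration under stated assumptions, not a prescribed solution: the standard normal choice for xvec, the seed, the rule used to pick a and b, the BFGS starting points, and all object names other than xvec and yvec are assumptions introduced for this sketch. The "theory" standard deviations come from the inverse Fisher information of the probit model evaluated at the true (a, b).

```r
set.seed(20091019)                 # arbitrary seed, for reproducibility only

n    <- 150                        # sample size from the assignment
nrep <- 1000                       # number of simulated batches
xvec <- rnorm(n)                   # ASSUMPTION: standard normal X values

# ASSUMPTION: scale so the interquartile range of a + b*xvec is exactly 2
# and its mean is exactly 0, which puts the quartiles near -1 and 1.
b   <- 2 / IQR(xvec)
a   <- -b * mean(xvec)
eta <- a + b * xvec
p   <- pnorm(eta)                  # true success probabilities

# Fit the probit model by glm on each simulated batch.
est <- t(replicate(nrep, {
  yvec <- rbinom(n, 1, p)
  fit  <- glm(yvec ~ xvec, family = binomial(link = "probit"))
  c(coef(fit), conv = fit$converged)
}))
colnames(est) <- c("a.hat", "b.hat", "converged")

# Standard deviations predicted by MLE theory: square roots of the diagonal
# of the inverse Fisher information X' W X, W = diag(dnorm^2 / (p * (1 - p))).
X         <- cbind(1, xvec)
w         <- dnorm(eta)^2 / (p * pnorm(-eta))
theory.sd <- sqrt(diag(solve(t(X) %*% (w * X))))
emp.sd    <- apply(est[, c("a.hat", "b.hat")], 2, sd)
print(rbind(empirical = emp.sd, theory = theory.sd))

# Cross-check on one batch: optim from several starting points against glm.
negll <- function(th, y, x) {
  e <- th[1] + th[2] * x
  -sum(y * pnorm(e, log.p = TRUE) + (1 - y) * pnorm(-e, log.p = TRUE))
}
yvec   <- rbinom(n, 1, p)
fit    <- glm(yvec ~ xvec, family = binomial(link = "probit"))
starts <- list(c(0, 0), c(-2, 3), c(2, -1))   # hypothetical starting points
opt    <- t(sapply(starts, function(s)
  optim(s, negll, y = yvec, x = xvec, method = "BFGS")$par))
max.diff <- max(abs(sweep(opt, 2, coef(fit))))  # largest gap from glm's MLE
print(max.diff)
```

Normal quantile plots such as qqnorm((est[, "b.hat"] - b) / theory.sd[2]) give a visual check of the normal approximation, and the cross-check would be repeated on more than one batch in a careful analysis.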
Once you have your code running for this example, you might want to re-run it with sample sizes 75 (where MLE theory will almost certainly fail) and 500 (where MLE theory should be clearly confirmed by your simulation). YOU ARE NOT REQUIRED TO RE-RUN THE CODE AT THESE ALTERNATIVE SAMPLE SIZES: BUT YOU MAY FIND IT USEFUL IN DECIDING HOW TO INTERPRET YOUR RESULTS FOR SAMPLE SIZE OF 150.
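
A rough sketch of how the comparison across sample sizes might be organized follows. The function name sim_probit, the standard normal X values (drawn afresh for each sample size), the seed, and the rule for choosing a and b are all assumptions made for illustration. For each n it reports the ratio of the empirical standard deviation of a^ and b^ to the value predicted by the inverse Fisher information; ratios near 1 are consistent with MLE theory.

```r
set.seed(75150500)                 # arbitrary seed, for reproducibility only

# Hypothetical helper: run the probit simulation at sample size n and return
# empirical-to-theoretical standard deviation ratios for a^ and b^.
sim_probit <- function(n, nrep = 1000) {
  xvec <- rnorm(n)                 # ASSUMPTION: standard normal X values
  b    <- 2 / IQR(xvec)            # ASSUMPTION: quartiles of a + b*xvec near -1, 1
  a    <- -b * mean(xvec)
  eta  <- a + b * xvec
  p    <- pnorm(eta)

  # 2 x nrep matrix of fitted coefficients, one column per batch.
  est <- replicate(nrep, {
    yvec <- rbinom(n, 1, p)
    coef(suppressWarnings(
      glm(yvec ~ xvec, family = binomial(link = "probit"))))
  })

  # Theoretical standard deviations from the inverse Fisher information.
  X         <- cbind(1, xvec)
  w         <- dnorm(eta)^2 / (p * pnorm(-eta))
  theory.sd <- sqrt(diag(solve(t(X) %*% (w * X))))
  emp.sd    <- apply(est, 1, sd)

  c(n = n,
    sd.ratio.a = emp.sd[[1]] / theory.sd[[1]],
    sd.ratio.b = emp.sd[[2]] / theory.sd[[2]])
}

res <- t(sapply(c(75, 150, 500), sim_probit))
print(res)
```

The ratios alone do not show skewness or heavy tails, so histograms or normal quantile plots of the estimates at each n would still be worth examining.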