Homework Problem 10, STAT 705, Fall 2015.

Assigned 10/14/2015, Due Monday 10/26/2015


(1) Generate a single dataset of values (X_{ij}, Y_{ij}), i=1,...,1000, 
j=1,...,40, according to the following distributional rules:

(*)               epsilon_{ij} are iid ~ N(0,1),  

and independent of the epsilon's,

        X_{ij} = sqrt(1+0.2*j) * U_{ij},  with U_{ij} iid ~ Unif[-2,2]

and
        Y_{ij} = 0.4 + 0.1*j + X_{ij} + epsilon_{ij}

Put your dataset Data1  (the X's and Y's) into a 1000 X 40 X 2 array.

    Generate a second 1000 X 40 X 2 dataset Data2 according to exactly the 
same method except that the values epsilon_{ij} are iid distributed 
according to t_5, Student's t distribution with 5 degrees of freedom.
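
A minimal R sketch of the generation step in (1), offered as one possible 
reading rather than a model solution. The seed value, the helper name 
gen_data, and the choice to store the X's in slice 1 and the Y's in slice 2 
of the array are assumptions, not part of the assignment.

```r
set.seed(705)                       # hypothetical seed; record whichever you use
n <- 1000                           # number of i-indices
J <- 40                             # number of j-indices (clusters)

# Hypothetical helper: rerr(m) must return m iid error draws.
gen_data <- function(rerr) {
  jmat <- matrix(1:J, nrow = n, ncol = J, byrow = TRUE)    # jmat[i, j] = j
  X <- matrix(runif(n * J, -2, 2), n, J) * sqrt(1 + 0.2 * jmat)
  eps <- matrix(rerr(n * J), n, J)                         # drawn independently of X
  Y <- 0.4 + 0.1 * jmat + X + eps
  # c(X, Y) is column-major, so slice [, , 1] holds X and [, , 2] holds Y
  array(c(X, Y), dim = c(n, J, 2),
        dimnames = list(NULL, NULL, c("X", "Y")))
}

Data1 <- gen_data(rnorm)                          # N(0,1) errors
Data2 <- gen_data(function(m) rt(m, df = 5))      # t_5 errors

dim(Data1)                          # check: 1000 40 2
range(Data1[, 40, "X"])             # check: within +/- 2*sqrt(1 + 0.2*40) = +/- 6
```

Calling set.seed once before both calls makes the pair of datasets 
reproducible as long as they are generated in the same order.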

(2) For both of your datasets, view the j indices as "cluster" or "stratum" 
labels, and exhibit your empirically estimated cluster means and standard 
deviations -- compared across the two datasets -- in two informative graphs 
with x-axis corresponding to the j-index.
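
One way the summaries in (2) might be computed and plotted is sketched 
below. It is only an illustration: the stand-in generator sim, the helper 
cluster_stat, the seed, and the layout of the two panels are all 
assumptions, and other graph designs could be at least as informative.

```r
# Stand-in data so that this block runs on its own; substitute your own
# Data1 and Data2 from part (1). Seed and helper names are hypothetical.
set.seed(1)
sim <- function(rerr, n = 1000, J = 40) {
  j <- rep(1:J, each = n)                 # column-major: each j repeated n times
  X <- runif(n * J, -2, 2) * sqrt(1 + 0.2 * j)
  Y <- 0.4 + 0.1 * j + X + rerr(n * J)
  array(c(X, Y), dim = c(n, J, 2),
        dimnames = list(NULL, NULL, c("X", "Y")))
}
Data1 <- sim(rnorm)
Data2 <- sim(function(m) rt(m, df = 5))

# Per-cluster summaries: each result is a 40 x 2 matrix with columns X, Y
cluster_stat <- function(D, f) apply(D, c(2, 3), f)
m1 <- cluster_stat(Data1, mean); m2 <- cluster_stat(Data2, mean)
s1 <- cluster_stat(Data1, sd);   s2 <- cluster_stat(Data2, sd)

j <- 1:40
labs <- c("X, Data1", "Y, Data1", "X, Data2", "Y, Data2")
op <- par(mfrow = c(1, 2))
matplot(j, cbind(m1, m2), type = "l", lty = c(1, 1, 2, 2), col = c(1, 2, 1, 2),
        xlab = "cluster index j", ylab = "cluster mean", main = "Cluster means")
legend("topleft", labs, lty = c(1, 1, 2, 2), col = c(1, 2, 1, 2), bty = "n")
matplot(j, cbind(s1, s2), type = "l", lty = c(1, 1, 2, 2), col = c(1, 2, 1, 2),
        xlab = "cluster index j", ylab = "cluster SD", main = "Cluster SDs")
legend("topleft", labs, lty = c(1, 1, 2, 2), col = c(1, 2, 1, 2), bty = "n")
par(op)
```

As a check, the rules in (1) imply E(X_{ij}) = 0, E(Y_{ij}) = 0.4 + 0.1*j, 
SD(X_{ij}) = sqrt((4/3)*(1+0.2*j)), and 
SD(Y_{ij}) = sqrt((4/3)*(1+0.2*j) + Var(epsilon)), where Var(epsilon) is 1 
under N(0,1) and 5/3 under t_5. These curves could be overlaid on the plots.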

(3) For both of your datasets simulated in (1), maximize the likelihood 
for the model
                 Y_{ij} = a + mu*j + b*X_{ij} + sigma*Z_{ij}

where X_{ij} as generated above ARE observed and part of your dataset, and
where Z_{ij} ~ N(0,1) are NOT observed in your dataset. Here the unknown 
4-dimensional statistical parameter is  (a,mu,b,sigma).

In your output, also give the estimated variance-covariance matrix for the 
jointly estimated parameters, and provide some indication that your 
likelihood maximization has converged.
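
The following is a hedged sketch of one way to carry out (3) numerically, 
here with optim and an analytic gradient. The stand-in data, the seed, the 
starting values, and the names negloglik, grad and vcov_hat are 
assumptions; other optimizers such as nlm would serve equally well.

```r
# Stand-in data in long (vector) form so the block runs on its own;
# with your array, use x <- c(Data1[, , 1]) and y <- c(Data1[, , 2]).
set.seed(2)                               # hypothetical seed
n <- 1000; J <- 40
j <- rep(1:J, each = n)                   # cluster index of each observation
x <- runif(n * J, -2, 2) * sqrt(1 + 0.2 * j)
y <- 0.4 + 0.1 * j + x + rnorm(n * J)

# Negative log-likelihood in theta = (a, mu, b, sigma)
negloglik <- function(theta) {
  if (theta[4] <= 0) return(Inf)          # sigma must be positive
  -sum(dnorm(y, mean = theta[1] + theta[2] * j + theta[3] * x,
             sd = theta[4], log = TRUE))
}

# Analytic gradient of the negative log-likelihood
grad <- function(theta) {
  r <- y - theta[1] - theta[2] * j - theta[3] * x    # residuals
  s <- theta[4]
  c(-sum(r) / s^2, -sum(r * j) / s^2, -sum(r * x) / s^2,
    length(y) / s - sum(r^2) / s^3)
}

start <- c(a = 0, mu = 0, b = 0, sigma = sd(y))      # crude starting values
fit <- optim(start, negloglik, gr = grad, method = "BFGS",
             hessian = TRUE, control = list(maxit = 1000))

# Inverse observed information as the estimated variance-covariance matrix
vcov_hat <- solve(fit$hessian)

fit$par                                   # MLE of (a, mu, b, sigma)
vcov_hat
sqrt(diag(vcov_hat))                      # standard errors

# Convergence indications: optim's code (0 = success), evaluation counts,
# and the size of the gradient at the reported maximizer
fit$convergence
fit$counts
max(abs(grad(fit$par)))

# Cross-check against the closed form: least squares for (a, mu, b) and
# sqrt(RSS / N) for sigma
ols <- lm(y ~ j + x)
cbind(optim = fit$par,
      closed_form = c(coef(ols), sqrt(mean(resid(ols)^2))))
```

Because this model is a normal-errors linear regression, agreement between 
the two columns of the last table is itself evidence that the numerical 
maximization reached the maximum. Note that the inverse-Hessian covariance 
is derived under the normal model, which Data2 does not satisfy.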

STORE YOUR SEEDS OR YOUR DATASETS FOR FUTURE REFERENCE: WE WILL ANALYZE THE 
SAME DATA BY A DIFFERENT METHOD IN AT LEAST ONE FUTURE PROBLEM SET.
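
A short sketch of one way to keep both the seed and the datasets, assuming 
a file in the working directory. The seed value, the file name 
hw10_datasets.rds, and the stand-in arrays are hypothetical.

```r
seed <- 705                               # hypothetical seed value
out_file <- "hw10_datasets.rds"           # hypothetical file name
rng <- RNGkind()                          # record the generator settings too

set.seed(seed)
# Stand-in arrays; substitute your Data1 and Data2 from part (1).
Data1 <- array(rnorm(1000 * 40 * 2), dim = c(1000, 40, 2))
Data2 <- array(rt(1000 * 40 * 2, df = 5), dim = c(1000, 40, 2))

# Store the seed, the RNG settings, and the arrays in a single file
saveRDS(list(seed = seed, rng = rng, Data1 = Data1, Data2 = Data2), out_file)

# Round-trip check: the restored arrays must match the originals exactly
restored <- readRDS(out_file)
stopifnot(identical(restored$Data1, Data1),
          identical(restored$Data2, Data2))
```

Saving the arrays themselves guards against changes in R's default random 
number generator between versions, which a stored seed alone would not.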

Provide your R code for all 3 parts, explaining (and where suitable, 
checking) what it does and explaining your outputs and what they mean.