HW 16, Stat 705, Fall 2015
Assigned Monday 11/23/15; DUE Friday 12/4/15

Use (repeated, randomized) data splitting to assess the RMSE of the final model you fitted to the Concrete data in HW13. [If you did not arrive at a final model you want to continue with, use the one I supplied in my solution to HW13.]

Obtain the RMSE another way, by a parametric-bootstrap simulation. (For this method, regard the predictor X variables as fixed, and simulate new Y's given X using the parametric bootstrap under your fitted model.) Pay some attention to the sampling variability (simulation "error") in each method when deciding how many replications you need for each.

Which of the two methods (cross-validation by data splitting, or parametric bootstrap) should better represent what you might see in a future data set with a data-generating mechanism very similar to the one that underlies the Concrete data? What assumptions are you making for the validity of each method?

Can you think of any other way to assess the mean-squared prediction error of analysis by this specific model in this data setting? (Hint: you might try a nonparametric bootstrap.) What assumptions does it require in order to give a large-sample-valid estimate of the RMSE?
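A minimal sketch of the data-splitting estimate, assuming the Concrete data have already been loaded into a NumPy predictor matrix X and response vector y, and using ordinary linear regression as a stand-in for your final model (the function name, test fraction, and replication count are illustrative choices, not prescribed by the assignment):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def split_rmse(X, y, n_splits=200, test_frac=0.25, seed=0):
    """Repeated randomized data splitting: fit on the training part,
    compute RMSE of predictions on the held-out part, and average."""
    rng = np.random.RandomState(seed)
    rmses = np.empty(n_splits)
    for b in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_frac, random_state=rng)
        fit = LinearRegression().fit(X_tr, y_tr)          # substitute your model
        resid = y_te - fit.predict(X_te)
        rmses[b] = np.sqrt(np.mean(resid ** 2))
    # Crude Monte Carlo standard error of the averaged RMSE; the splits
    # overlap, so treat this as a rough guide when choosing n_splits.
    return rmses.mean(), rmses.std(ddof=1) / np.sqrt(n_splits)
```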
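For the parametric bootstrap, one sketch under a Gaussian linear-model assumption: hold the design X fixed, simulate new responses Y* = X beta-hat + sigma-hat * epsilon, refit the model to one simulated response vector, and score its predictions against an independently simulated one. Scoring against an independent simulated "future" response at the same X values is one reasonable choice here, not the only one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def parametric_boot_rmse(X, y, n_boot=500, seed=1):
    """Parametric bootstrap with X held fixed: simulate new Y's from the
    fitted Gaussian linear model, refit, and score against an
    independently simulated response vector."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    mu = fit.predict(X)                                   # fitted means
    sigma = np.sqrt(np.sum((y - mu) ** 2) / (n - p - 1))  # residual SD
    rmses = np.empty(n_boot)
    for b in range(n_boot):
        y_train = mu + sigma * rng.standard_normal(n)     # Y* used to refit
        y_new = mu + sigma * rng.standard_normal(n)       # independent future Y's
        refit = LinearRegression().fit(X, y_train)
        rmses[b] = np.sqrt(np.mean((y_new - refit.predict(X)) ** 2))
    # Monte Carlo standard error, useful for deciding how large n_boot must be.
    return rmses.mean(), rmses.std(ddof=1) / np.sqrt(n_boot)
```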
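And one version of the nonparametric bootstrap hinted at above: resample (X, Y) pairs with replacement, refit on the bootstrap sample, and score the refitted model on the out-of-bag cases. Note that resampling pairs treats X as random rather than fixed, which bears directly on the assumptions the last question asks about:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def nonparam_boot_rmse(X, y, n_boot=500, seed=2):
    """Nonparametric (pairs) bootstrap: resample cases with replacement,
    refit, and score on the out-of-bag cases not drawn into the sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap case indices
        oob = np.setdiff1d(np.arange(n), idx)       # out-of-bag cases
        if oob.size == 0:
            continue
        refit = LinearRegression().fit(X[idx], y[idx])
        resid = y[oob] - refit.predict(X[oob])
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    rmses = np.asarray(rmses)
    return rmses.mean(), rmses.std(ddof=1) / np.sqrt(len(rmses))
```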