HW 15 Stat 705 Fall 2015 Assigned Wednesday 11/18/15 DUE Monday 11/30/15 (A) Read in the dataset "HW15data.txt" in the Data directory of the course web-page. It consists of 200 pairs of X_i,Y_i values, with -2 < X_i < 2. Fit the following curves y = g(x) to these data: (i) a polynomial regression of degree 4, (ii) a cubic smoothing spline with all points as knots, (iii) a B-spline regression with degree 3 and knots at -1,1 (iv) a local polynomial regression of degree 3. You may use the defaults in the R functions you call to do the fitting, but you should say what they are on these data. ALSO: find a way to display the raw scatterplot along with the 4 fitted curves in one or two pictures. Hand in the R code for your curve fits and the pictures. (B) Show how to simulate arbitrarily many pseudo-random variable values from the density (on the whole real line) given by f(x) = C *( exp(-x^2) + 2*exp(-2*abs(x-1)))*(1+x^2)^2 where C is a constant for you to find. Do the simulation by numerically approximating the inverse. Show in some way that your code actually does generate data from the desired density, and time a run in which you generate 10^5 simulated random variable values from this density. ......................Solutions........................... (A) ## Data were generated as follows: xtmp = rbeta(200,2,2)*4-2 ytmp = log(1+2*xtmp^2-3*xtmp^3+4*xtmp^4) + 2*rnorm(200) res = lm(ytmp ~ xtmp + I(xtmp^2))$resid Datatmp = cbind(x=xtmp, y=res) write.table(Datatmp, "c:/Deptstuff/CoursesSt705/HomeworkF15/HW15data.txt", row.names=F) plot(Datatmp, xlab="X", ylab="Y", ylim=c(-2.2,1.5), xlim=c(-2,2.4), main="Y on X scatterplot and overplotted curves") ordx = order(Datatmp$x) ## (a) > auxfr = cbind.data.frame(y=Datatmp$y, outer(Datatmp$x, 0:4, function(x,y) x^y)) names(auxfr)[2:6] = c("Ones", paste("xpow",1:4,sep="")) names(auxfr) [1] "y" "Ones" "xpow1" "xpow2" "xpow3" "xpow4" lines(Datatmp$x[ordx], lm(y~., data=auxfr)$fitted[ordx], col="black") ## (b) > lines(smooth.spline(Datatmp$x, Datatmp$y),col="blue") ## (c) > library(splines) auxfr2 = cbind.data.frame(y=Datatmp$y, bs(Datatmp$x, knots=c(-1,1))) lines(Datatmp$x[ordx], lm(y~., data=auxfr2)$fitted[ordx], col="red") ## (d) > library(KernSmooth) lines(locpoly(Datatmp$x, Datatmp$y, degree=3, bandwidth = 0.4), col="orange") > legend(locator(), legend=c("(a) poly.reg","(b) smoothing spline", "(c) B-spline", "(d) loc.poly.reg."), lty=rep(1,4), col=c("black","blue","red","orange")) ## Saved picture as "OverPltHW15.pdf" (B) > C = 1/integrate(function(x) (exp(-x^2)+2*exp(-2*abs(x-1)))* (1+x^2)^2, -Inf,Inf)$val [1] 0.04188614 ## First step is to generate lookup table for log df values at ## finely spaced points > unix.time({ xvals = c((-10):(-3), seq(-3+.2, 6, by=0.2), 7:10) fdens = function(x) (exp(-x^2) + 2*exp(-2*abs(x-1)))*(1+x^2)^2 dfval = numeric(length(xvals)) for(i in 2:length(xvals)) dfval[i] = integrate(fdens,xvals[i-1],xvals[i])$val dfval[1] = integrate(fdens,-Inf,xvals[1])$val dfval = C*cumsum(dfval) logdf = log(dfval) finvappr = smooth.spline(logdf,xvals) }) user system elapsed 0.01 0.00 0.02 ### fairly trivial time unix.time({ ranvec = predict(finvappr, log(runif(1e5)))$y }) user system elapsed 0.12 0.02 0.14 > hist(ranvec, nclass=80, prob=T) curve(C*fdens(x), -5,9, add=T, col="red") #### visually perfect agreement