SOLUTION to HW Problem 4, STAT 705, Fall 2015. Assigned 9/16/2015, Due Monday 9/28/2015 (a) Write a function "SimData" with the following inputs and outputs: INPUTS: xvec = either NULL (the default) or a fixed numeric vector (of arbitrary length) simx = vector of (mean,sd,length) parameters for a normally distributed SIMULATED vector xvec , to be used in case xvec is NULL errscal = error standard deviation to be used in simulating yvec coef = vector with 2 components (intercept,slope) to be used in simulating yvec OUTPUT: a list containing all the (named) components xvec, simx, errscal, coef from the INPUTs, together with a component yvec which is a vector of the same length as xvec , simulated to follow an Exponential distribution with hazard rate-parameter lambda[i] (reciprocal of mean) of yvec[i] equal to log(lambda[i]) = coef[1] + coef[2]*xvec[i] + errscal*epsilon[i] where epsilon is a vector with pseudo-random independently identically logistic-distributed entries vector of the same length as xvec, with mean 0 and scale-parameter 1. #------ Solution R-code, part (a) ---------------------------- SimData = function(xvec=NULL, simx, errscal, coef) { if(is.null(xvec)) xvec = rnorm(simx[3], simx[1], simx[2]) m = length(xvec) lambda = exp( coef[1]+coef[2]*xvec+ errscal*rlogis(m)) yvec = rexp(m)/lambda ### NOTE: there is no harm in making lambda an output list component, ### and this also fixes the epsilon's for purposes of checking list(xvec=xvec, simx=simx, errscal=errscal, coef=coef, lambda=lambda, yvec=yvec) } #----------------------------------------------------------- (b) Test your function in creating some datasets of moderately large size (say, 300), and show that it is working properly by calculating some appropriate summary statistics (means, covariances, residuals, histograms or whatever: the choice is yours). #------------ Solution R-code for part (b) ----------------- > tmp = SimData(NULL, c(4,1,300), 2, c(-2.2, 0.5)) > sapply(tmp,length) xvec simx errscal coef lambda yvec 300 3 1 2 300 300 > aux = hist(tmp$xvec, prob=T, nclass=20) curve(dnorm(x,tmp$simx[1],tmp$simx[2]), add=T, col="red") ### or ### mean(abs( aux$dens- dnorm(aux$mids,tmp$simx[1],tmp$simx[2]) )) [1] 0.02468499 ### using 12 histogram bars > eps = log(tmp$lambda/exp(tmp$coef[1]+tmp$coef[2]*tmp$xvec))/tmp$errscal aux = hist(eps, nclass=15, prob=T) curve(dlogis(x), add=T, col="blue") ### or ### mean(abs( aux$dens - dlogis(aux$mids) )) [1] 0.009316125 > aux = hist( tmp$yvec*tmp$lambda, class=16, prob=T) curve(dexp(x), add=T, col="green") ### or ### mean(abs( aux$dens - dexp(aux$mids) )) [1] 0.02576907 > round(cbind(obs.mean=c(eps=mean(eps), x=mean(tmp$xvec), lam.y=mean(tmp$yvec*tmp$lambda)), theor.mean = c(0, tmp$simx[1], 1), obs.2ndMom= c(mean(eps^2), mean(tmp$xvec^2), mean((tmp$yvec*tmp$lambda)^2)), theor.2ndMom = c(pi^2/3, tmp$simx[1]^2+tmp$simx[2]^2, 2)) ,5) obs.mean theor.mean obs.2ndMom theor.2ndMom eps -0.07757 0 2.75554 3.28987 x 3.89667 4 16.05515 17.00000 lam.y 1.04417 1 2.06574 2.00000 ### NOTE: > integrate(function(x) x^2*dlogis(x),-Inf,Inf)$val [1] 3.289868 > pi^2/3 [1] 3.289868 ### ### Could also do formal goodness of fit tests, but that is not always ### necessary for persuasive checking.