SOLUTION to HW Problem 4, STAT 705, Fall 2015. 
Assigned 9/16/2015,  Due Monday 9/28/2015

(a) Write a function  "SimData" with the following 
inputs and outputs:

       INPUTS:
xvec = either NULL (the default) or a fixed numeric vector 
          (of arbitrary length)
simx = vector of (mean,sd,length) parameters for a normally 
          distributed SIMULATED vector   xvec , to be used 
	  in case  xvec  is NULL
errscal = error standard deviation to be used in simulating yvec
coef  = vector with 2 components (intercept,slope) to be used 
	  in simulating yvec

       OUTPUT: a list containing all the (named) components xvec,
simx, errscal, coef from the INPUTs, together with a component yvec
which is a vector of the same length as xvec , simulated to 
follow an Exponential distribution with hazard rate-parameter lambda[i] 
(reciprocal of mean) of yvec[i] equal to
       log(lambda[i]) = coef[1] + coef[2]*xvec[i] + errscal*epsilon[i] 
where epsilon is a vector with pseudo-random independently identically
logistic-distributed entries vector of the same length as xvec, with 
mean 0 and scale-parameter 1.

#------ Solution R-code, part (a) ----------------------------

SimData = function(xvec=NULL, simx, errscal, coef) {
     if(is.null(xvec)) xvec = rnorm(simx[3], simx[1], simx[2])
     m = length(xvec)
     lambda = exp( coef[1]+coef[2]*xvec+ errscal*rlogis(m))
     yvec = rexp(m)/lambda
  ### NOTE: there is no harm in making lambda an output list component,
  ###   and this also fixes the epsilon's for purposes of checking
     list(xvec=xvec, simx=simx, errscal=errscal, coef=coef, 
          lambda=lambda, yvec=yvec) }

#-----------------------------------------------------------

(b) Test your function in creating some datasets of moderately
large size (say, 300), and show that it is working properly by
calculating some appropriate summary statistics (means, covariances, 
residuals, histograms or whatever: the choice is yours).

#------------ Solution R-code for part (b) -----------------

> tmp = SimData(NULL, c(4,1,300), 2, c(-2.2, 0.5))
> sapply(tmp,length)
   xvec    simx errscal    coef  lambda    yvec 
    300       3       1       2     300     300 

> aux = hist(tmp$xvec, prob=T, nclass=20)
  curve(dnorm(x,tmp$simx[1],tmp$simx[2]), add=T, col="red") ### or ###
  mean(abs( aux$dens- dnorm(aux$mids,tmp$simx[1],tmp$simx[2]) ))
[1] 0.02468499    ### using 12 histogram bars

> eps = log(tmp$lambda/exp(tmp$coef[1]+tmp$coef[2]*tmp$xvec))/tmp$errscal
  aux = hist(eps, nclass=15, prob=T)
  curve(dlogis(x), add=T, col="blue") ### or ###
  mean(abs( aux$dens - dlogis(aux$mids) ))
[1] 0.009316125

> aux = hist( tmp$yvec*tmp$lambda, class=16, prob=T)
  curve(dexp(x), add=T, col="green") ### or ###
  mean(abs( aux$dens - dexp(aux$mids) ))
[1] 0.02576907

> round(cbind(obs.mean=c(eps=mean(eps), x=mean(tmp$xvec), 
     lam.y=mean(tmp$yvec*tmp$lambda)), theor.mean = c(0, tmp$simx[1], 1),
       obs.2ndMom= c(mean(eps^2), mean(tmp$xvec^2), mean((tmp$yvec*tmp$lambda)^2)),
       theor.2ndMom = c(pi^2/3, tmp$simx[1]^2+tmp$simx[2]^2, 2)) ,5)
      obs.mean theor.mean obs.2ndMom theor.2ndMom
eps   -0.07757          0    2.75554      3.28987
x      3.89667          4   16.05515     17.00000
lam.y  1.04417          1    2.06574      2.00000

### NOTE:
> integrate(function(x) x^2*dlogis(x),-Inf,Inf)$val
[1] 3.289868
> pi^2/3
[1] 3.289868
###


### Could also do formal goodness of fit tests, but that is not always 
###    necessary for persuasive checking.