Homework Solutions to Problems 9--10. =================================== [NB solutions are in Splus 6.0] (9) (a) The region of interest is the set in the place with x-coordinates ranging from -5 to 5, and with y ranging from 0 to min((10 +2x)/3, 5 - x). It is easy to see that the density of X is not uniform but rather is proportional to the height (max y-coordinate) of the triangle for each value x in [-5,5]. The constant of proportionality is the reciprocal of the area of the triangle, ie fX(x) = .05*min((10+2x)/3, 5 - x). The corresponding df is FX(x) = (.05/3)*(x+5)^2 for x<1, and = 1 - .025*(5-x)^2 for 1<= x < 5. Thus: > XYfcn <- function(N) { Xval <- numeric(N) U1val <- runif(N) ind1 <- U1val <= .6 Xval[ind1] <- sqrt(60*U1val[ind1])-5 Xval[!ind1] <- 5-sqrt(40*(1-U1val[!ind1])) Yval <- pmin(10/3 + (2/3)*Xval, 5 - Xval)*runif(N) list(Xval=Xval, Yval=Yval) } An alternate method here is to generate distributions in the rectangle [-5,5] x [0,4], and reflect observations falling outside the triangle across the lines y=10/3 + (2/3)*x if x<1 and across y=5-x if x>1, to obtain points in the tirangle in every case: > XYfcn2 <- function(N) { Xmat <- cbind(-5 + 10*runif(N), 4*runif(N)) inds1 <- Xmat[,1] < 1 & Xmat[,2] > 10/3 + (2/3)*Xmat[,1] Xmat[inds1,] <- outer(rep(2,sum(inds1)), c(-2,2),"*") - Xmat[inds1,] inds2 <- Xmat[,1] >= 1 & Xmat[,2] > 5-Xmat[,1] Xmat[inds2,] <- outer(rep(2,sum(inds2)), c(3,2),"*") - Xmat[inds2,] list(Xval=Xmat[,1], Yval = Xmat[,2]) } tmp2 <- XYfcn2(1000) When plotted, both of the two 1000-vectors look uniform over the triangle. For a more precise test, we do a chi-square with cells defined by x-values in intervals [-5,-3], [-3,-1], [-1,1], [1,3], [3,5] and with y-values in intervals [0,2] and [2,4]. > t1 <- table((tmp$Xval+1) %/% 2 , (tmp$Yval %/% 2)) > t1 0 1 -2 58 0 -1 180 18 0 196 129 1 218 110 2 91 0 > t2 <- table((tmp2$Xval+1) %/% 2 , (tmp2$Yval %/% 2)) t2 0 1 -2 63 0 -1 181 14 0 184 146 1 195 106 2 111 0 ** The theoretical areas of the intersections of the desired triangle with the rectangular cells, expressed as a fraction of the area 20 of the whole triangle, are as follows: > ea <- round(.05*matrix(c(2*.5*(4/3),0,2*(.5*(5/3)+.5*(6/3)), 1/3, 4, 8/3, 4, 2, 2, 0), byrow=T, ncol=2)*1000,2) > ea [,1] [,2] [1,] 66.67 0.00 [2,] 183.33 16.67 [3,] 200.00 133.33 [4,] 200.00 100.00 [5,] 100.00 0.00 ### Now for the 2 chi-square statistics on 7 df : > c(sum(((c(t1)-c(ea))^2/c(ea))[c(ea)>0]), sum(((c(t2)-c(ea))^2/c(ea))[c(ea)>0])) [1] 4.944696 4.838281 #### OK !!! ### The degrees of freedom are 7 because the ### geometry of the triangle is such that there are ### exactly 8 cells with non-zero probabilities. (b) The rejection method is easier to program. Note that since we should reject about half of the random numbers, to get 1000 pairs it should suffice to generate 2200. (To check this, note that > pbinom(2000,2200,.5) ### = 1 More generally, we propose to generate 400 + 2.2*N pairs. > XYfcn3 <- function(N) { naux <- ceiling(400+2.2*N) Xmat <- cbind(-5+10*runif(naux), 4*runif(naux)) inds <- (1:naux)[(Xmat[,1] < 1 & Xmat[,2] < 10/3+ (2/3)*Xmat[,1]) | (Xmat[,1] >= 1 & Xmat[,2] < 5-Xmat[,1])] list(Xval=Xmat[inds[1:N],1], Yval=Xmat[inds[1:N],2]) } tmp3 <- XYfcn3(1000) > sum(((c(table((tmp3$Xval+1) %/% 2 , (tmp3$Yval %/% 2))) - c(ea))^2/c(ea))[c(ea)>0]) ### now chisq = 5.727582 (10) Here we do not have the luxury of analytically known Finv or even F . First evaluate numerical integral via Simpson's rule, then invert using the smoothing-spline trick. ### First tabulate cdf at points k/10, k=0:100 > xpt <- (0:200)/10 fpt <- 0.3851565*log(1+xpt)*xpt^2*exp(-xpt) fpt[1] <- 0 Fpt <- pmin((6/60)*cumsum(fpt)[seq(1,201,2)],1) xpt <- xpt[seq(1,201,2)] tspl <- smooth.spline(Fpt,xpt, spar=1.e-6) ### Now with this pre-processing done: to generate 1000 desired ### points: > Xvec <- predict.smooth.spline(tspl, runif(1000))$y ### density looks bimodal!! But there is a unique root for its ### derivative, near 2. > summary(Xvec) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2057 2.0470 3.0320 3.3280 4.2550 11.9500 > hist(Xvec,nclass=36, prob=T) > lines((0:200)/10, fpt) ### very nice fit !! ## To solve part (a) as requested, in a function, can take > ftmp1 <- function(uvec) predict.smooth.spline(tspl, uvec)$y > Xvec2 <- predict.smooth.spline(tspl, runif(1000))$y > summary(Xvec2) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2039 2.1330 3.1960 3.5330 4.5120 12.9600 ### looks very much the same ### For both cases, let's check that we get the right relative ### frequency on (3.5,7): > integrate(function(x) 0.3851565*log(1+x)*x^2*exp(-x), 1.e-6,7)$integral ### = 0.9493278 > integrate(function(x) 0.3851565*log(1+x)*x^2*exp(-x), 1.e-6,3.5)$ value integral ### = 0.5621234 > 0.9493278-0.5621234 [1] 0.3872044 > c(sum(Xvec < 7 & Xvec > 3.5), sum(Xvec2 < 7 & Xvec2 > 3.5))/1000 [1] [1] 0.355 0.399 ### close enough to the theoretical value 0.387 ### For accept/reject method, we must find a density g(x) from which we know how to simulate, such that f(x) <= k*g(x) for a not-too-large constant k. The simplest way is to use the inequality log(1+x) < x to take g(x) proportional to x^3*exp(-x), i.e., Gamma(4,1). ### So g(x) = (1/6)*x^3*exp(-x), and f(x) <= 6c g(x), and k=6*c = 2.310939. > XsimAcc <- function(N, fac=4) { ### Simulate fac*N + 300 variates naux <- ceiling(fac*N+300) Xaux <- rgamma(naux,4) Uaux <- runif(naux)*2.310939 Xaux[(1:naux)[Uaux < 2.310939*log(1+Xaux)/Xaux]][1:N] } > Xvec3 <- XsimAcc(1000, 5) summary(Xvec3) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.3199 2.2220 3.1540 3.4660 4.4400 13.3500 > integrate(function(x) 0.3851565*log(1+x)*x^2*exp(-x), 1.e-6,2)$integral [1] 0.2051833 > integrate(function(x) 0.2600396*(1+log(x))*x^2*exp(-x), 1.e-6,4)$integral [1] 0.661278 > integrate(function(x) 0.2600396*(1+log(x))*x^2*exp(-x), 1.e-6,8)$integral [1] 0.9754157 ### So let's do chisquare (with 3df) on all three generated ## X-vectors. > evec <- diff(c(0,0.2051833 , 0.661278, 0.9754157, 1))*1000 > evec [1] 205.1833 456.0947 314.1377 24.5843 > sum((hist(Xvec, breaks=c(0,2,4,8,20), plot=F)$counts-evec)^2/evec) [1] 9.732789 ### NB p-value = 0.0210 > sum((hist(Xvec2, breaks=c(0,2,4,8,20), plot=F)$counts-evec)^2/evec) [1] 5.10709 > sum((hist(Xvec3, breaks=c(0,2,4,8,20), plot=F)$counts-evec)^2/evec) [1] 4.73794 > hist(Xvec3, breaks=c(0,2,4,8,20), plot=F)$counts [1] 191 487 303 19 ### For the timing of the methods, let's generate 1e6 variates: > unix.time(xtmp <- predict.smooth.spline(tspl, runif(1e6))$y) [1] 3.07 0.39 5.75 0.00 0.00 > unix.time(xtmp2 <- XsimAcc(1e6, 2.5)) ### This should be a large enough factor for such a large ### number of variates: ### by LLN, any factor > k =2.310939 should work! [1] 10.35 0.97 27.61 0.00 0.00 ### The accept-reject scheme is much less efficient, both because of the many extra random numbers being generated and because of the mathematical operations like log's.(The predict.spline.smooth evaluation is just linear algebra. NOTE that the elapsed-time value (the third of the times produced) is not a good measure of efficiency when you are using a shared-resource network. But the CPU time IS.