Homework Problem 8, STAT 705, Fall 2015. Assigned 10/7/2015, Due Friday 10/16/2015 Give R code for a function that generates (1) a random point from the parallelogram in the plane bounded by the lines y = 3 + 2x, y = 7 + 2x and y= 1, y=5. ----------- Solution to part (1) ------------------------- You can do this by viewing the parallelogram as the union of the square the two back-to-back right triangles with respective vertices (-3,1), (-1,1), (-1,5) and (-1,1), (-1,5), (1,5) and note that if the second of these triangles is translated to the left by precisely 2 units in the x direction, that triangle is mapped with preservation of areas to the triangle with vertices (-3,1), (-3,5), (-1,5). The union of the first triangle and this mapped second triangle is now the rectangle [-3,-1] X [1,5]. So the algorithm generating a uniformly distributed point in the original parallelogram runs this backward; in R code > X0 = runif(1,-3,-1) Y = runif(1,1,5) X = if(Y > 7+2*X0) X0+2 else X0 or, in R functional form putting K (say 1000) iid generated X,Y pairs in the rows of a matrix, > K=1000 XYmat = cbind(X= runif(K,-3,-1), Y=runif(K,1,5)) XYmat[,1] = XYmat[,1] + 2*(c(XYmat %*% c(-2,1)) >7) A more concise way to understand this is that the parallelogram is the set {(x,y): 1<=y<=5, (y-3)/2 >= x >= (y-7)/2} which is the image of the rectangle [-3,-1] X [1,5] under the linear mapping (x,y) -> (x+(y-1)/2, y), i.e. the mapping resulting from multiplication by the matrix ( 1 1/2 ) and adding (-1/2) ( 0 1 ) ( 0 ) which has Jacobian determinant 1. So the one-line generating algorithm is > XYmat = cbind(runif(K,-3,-1),runif(K,1,5)) %*% array(c(1,1/2,0,1), dim=c(2,2)) + outer(rep(1,K),c(-1/2,0),"*") Finally, here is a third method which is easier to find (involves less guesswork). First generate Y uniformly in [1,5], then given Y, generate X uniformly as Y/2 plus a uniform(-7/2,-3/2) r.v., or > Ycol = runif(K,1,5) XYmat = cbind(X=Ycol/2 + runif(K,-7/2,-3/2), Y=Ycol) Whichever method you choose, the easiest way to check it is visually: > plot(XYmat, xlab="X variate", ylab="Y variate", main= "Scatterplot of jointly generated (X,Y) random pairs") (Of course you could compare moments or other expectations to do the checking, but in this case the picture is persuasive.) -------------- HW Problem Part (2) ---------------------------------- (2) Give R code to generate a 3-dimensional random vector (X,Y,Z) with density f(x,y,z) = C x y^3 exp(-x*y) (z/(y+z))^2 , x >0, 0 < y < 1, 0 < z < 1 for a constant C that you must find. --------------- Solution to part (2) ------------------------------- METHOD 1. [Successive univariate density and conditional densities] The first thing to notice is that in this density, if Y=y is fixed, then X and Z are conditionally independent, i.e. given Y=y, X ~ Gamma(2,y) with f_{X|Y}(x|y) = y^2*x*exp(-x*y) for x>0 and Z has conditional density of the form K(y)*(1-y/(y+z))^2 for 0 C = 1/integrate(function(y) y*(1 + y/(1+y)-2*y*log(1+1/y)),0,1)$val > C [1] 9.776674 The order of random variate generation will now be: first Y generate from its marginal density > fY = function(y) C*y*(1 + y/(1+y) - 2*y*log(1+1/y)) then X ~ Gamma(2,Y), and then Z from its conditional density > fZ.Y = function(z,y) (1-y/(y+z))^2 / { 1 + y/(1+y) - 2*y*log(1+1/y) } which is bounded above on (0,1) by B2 = 8.79 on [0,1] based on > falt = function(y) (1+y/(1+y)-2*y*log(1+1/y)) max(1/falt(runif(1000))) [1] 8.786902 while fY is a very smooth density easily checked to be <= B1 = 1.163 on all of (0,1), since > max(fY(runif(1000))) [1] 1.162473 It turns out to be convenient to notice in this problem that the conditional density of W = Z/Y given Y=y is bounded by B2*y*I[w < 1/y] The R operations to do the simulations use accept-reject on BOTH Y (with upper bound B1 = 1.163 on its density) AND W = Y/X (with conditional upper bound B2=8.79) are XYZgen = function(N, B1=1.163, B2=8.79, fac=20) { ### generate a random number <= N of observations ### based on 2 successive conditional accept-reject steps U12 = array(runif(2*N), c(N,2)) Yvec = U12[,1][ U12[,2] < fY(U12[,1])/B1 ] NY = length(Yvec) U34 = array(runif(2*fac*NY), c(NY,fac,2)) ### Since Z accept-reject is wasteful, generate fac*NY proposed values condmat = 1.*(U34[,,2] < fZ.Y(U34[,,1],outer(Yvec,rep(1,fac)))/B2 ) inds = apply(condmat,1,function(row) match(1,row)) aux = !is.na(inds) Yvec = Yvec[aux] NZ = sum(aux) Xvec = rgamma(NZ,2)/Yvec Zvec = U34[cbind(1:NZ,inds[aux],1)] cbind(X=Xvec,Y=Yvec,Z=Zvec) } > tmp=XYZgen(1000) > dim(tmp) [1] 799 3 ### Check that the Y's behave correctly marginally and that Y*X ~ Gamma(2,1) > hist(tmp[,"Y"], nclass=20, prob=T) curve(fY(x),0,1, add=T, col="red") ### agreement roughly OK > hist(tmp[,"Y"]*tmp[,"X"], nclass=20, prob=T) curve(dgamma(x,2), add=T, col="blue") ### very nice agreement METHOD 2. Direct Accept-Reject for the pair (Y,Z). The joint fYZ density > fYZ = function(y,z) C*y*z^2/(y+z)^2 is bounded on [0,1]X[0,1] by C/4 = 2.4442. > finner = function(y) { m = length(y); out=numeric(m) for (k in 1:m) out[k] = integrate(function(z,.y) fYZ(.y,z),0,1,.y=y[k])$val out } > curve(fY(x),0,1) > curve(finner(x),0,1, add=T, col="blue") ### perfectly overlaid curves So a direct 2-variate accept reject works here, and is actually less wasteful than the two 1-variate accept-reject methods in METHOD 1. XYZgen2 = function(N) { UV = array(runif(3*N),c(N,3), dimnames=list(NULL,c("Y","Z",""))) YZmat = UV[,1:2] [ UV[,3] < fYZ(UV[,1],UV[,2])/2.4442, ] Xvec = rgamma(nrow(YZmat),2,YZmat[,1]) cbind(X=Xvec, YZmat) } > tmp = XYZgen2(1000) > dim(tmp) [1] 420 3 ### so only 420 of the accept-reject steps "worked" ### Check that the Y's behave correctly marginally and that Y*X ~ Gamma(2,1) > hist(tmp[,"Y"], nclass=20, prob=T) curve(fY(x),0,1, add=T, col="red") ### agreement roughly OK > hist(tmp[,"Y"]*tmp[,"X"], nclass=20, prob=T) curve(dgamma(x,2), add=T, col="blue") ### very nice agreement > hist(XYZgen2(1e5)[,"Y"], nclass=32, prob=T) curve(fY(x),0,1, add=T, col="red") ### agreement much better !!