Homework Problem 8, STAT 705, Fall 2015.

Assigned 10/7/2015,  Due Friday 10/16/2015

Give R code for a function that generates

(1) a random point from the parallelogram in the plane bounded by 
the lines

     y = 3 + 2x,     y = 7 + 2x

and  y= 1,    y=5.

----------- Solution to part (1) -------------------------

You can do this by viewing the parallelogram as the union of the square the two 
back-to-back right triangles with respective vertices (-3,1), (-1,1), (-1,5)
and (-1,1), (-1,5), (1,5) and note that if the second of these triangles is 
translated to the left by precisely 2 units in the x direction, that triangle 
is mapped with preservation of areas to the triangle with vertices (-3,1), 
(-3,5), (-1,5). The union of the  first triangle and this mapped second 
triangle is now the rectangle [-3,-1] X [1,5]. So the algorithm generating 
a uniformly distributed point in the original parallelogram runs this backward; 
in R code

> X0 =  runif(1,-3,-1)
  Y = runif(1,1,5)
  X = if(Y > 7+2*X0) X0+2 else X0

or, in R functional form putting K (say 1000) iid generated X,Y pairs in 
the rows of a matrix,

> K=1000
  XYmat = cbind(X= runif(K,-3,-1), Y=runif(K,1,5))
  XYmat[,1] = XYmat[,1] + 2*(c(XYmat %*% c(-2,1)) >7)

A more concise way to understand this is that the parallelogram is the set 

{(x,y): 1<=y<=5, (y-3)/2 >= x >= (y-7)/2}

which is the image of the rectangle [-3,-1] X [1,5] under the linear mapping

(x,y) -> (x+(y-1)/2, y), i.e. the mapping resulting from multiplication 
by the matrix ( 1   1/2 ) and adding (-1/2)
              ( 0   1   )            ( 0  )

which has Jacobian determinant 1. So the one-line generating algorithm is

> XYmat = cbind(runif(K,-3,-1),runif(K,1,5)) %*% array(c(1,1/2,0,1), 
              dim=c(2,2)) + outer(rep(1,K),c(-1/2,0),"*")

Finally, here is a third method which is easier to find (involves less guesswork).
First generate Y uniformly in [1,5], then given Y, generate X uniformly as Y/2 
plus a uniform(-7/2,-3/2) r.v., or 

> Ycol = runif(K,1,5)
  XYmat = cbind(X=Ycol/2 + runif(K,-7/2,-3/2), Y=Ycol) 

Whichever method you choose, the easiest way to check it is visually:

> plot(XYmat, xlab="X variate", ylab="Y variate", main=
   "Scatterplot of jointly generated (X,Y) random pairs")

(Of course you could compare moments or other expectations to do the checking,
but in this case the picture is persuasive.)

-------------- HW Problem Part (2) ----------------------------------

(2) Give R code to generate a 3-dimensional random vector (X,Y,Z) with density

f(x,y,z) =  C x y^3 exp(-x*y) (z/(y+z))^2 , x >0, 0 < y < 1, 0 < z < 1

for a constant C that you must find.

--------------- Solution to part (2) -------------------------------

METHOD 1. [Successive univariate density and conditional densities]

The first thing to notice is that in this density, if Y=y is fixed, then X and 
Z are conditionally independent, i.e. given Y=y,

X ~ Gamma(2,y)  with f_{X|Y}(x|y) = y^2*x*exp(-x*y)   for  x>0

and 

Z  has conditional density of the form  K(y)*(1-y/(y+z))^2  for 0<z<1

and we easily check (by making this conditional density integrate to 1 in z) that

K(y) = 1/ { 1 + y/(1+y) - 2*y*log(1+1/y) }

Now numerical integration gives C = 1/ int_0^1 y { 1 + y/(1+y) - 2*y*log(1+1/y) } dy

> C = 1/integrate(function(y) y*(1 + y/(1+y)-2*y*log(1+1/y)),0,1)$val
> C
[1] 9.776674

The order of random variate generation will now be: first Y generate from its 
marginal density

> fY = function(y) C*y*(1 + y/(1+y) - 2*y*log(1+1/y))

then X ~ Gamma(2,Y), and then Z from its conditional density

> fZ.Y = function(z,y) (1-y/(y+z))^2 / { 1 + y/(1+y) - 2*y*log(1+1/y) } 

which is bounded above on (0,1) by B2 = 8.79  on [0,1] based on 
> falt = function(y) (1+y/(1+y)-2*y*log(1+1/y))
  max(1/falt(runif(1000)))
[1] 8.786902
while  fY is a very smooth density easily checked to be <= B1 = 1.163 on all of (0,1),
since
> max(fY(runif(1000)))
[1] 1.162473

It turns out to be convenient to notice in this problem that the conditional density 
of W = Z/Y given Y=y is bounded by B2*y*I[w < 1/y]

The R operations to do the simulations use accept-reject on BOTH Y (with upper 
bound B1 = 1.163 on its density) AND W = Y/X (with conditional upper bound B2=8.79) 
are

XYZgen = function(N, B1=1.163, B2=8.79, fac=20) {
    ### generate a random number <= N of observations 
    ###  based on 2 successive conditional accept-reject steps
    U12 = array(runif(2*N), c(N,2))
    Yvec = U12[,1][  U12[,2] < fY(U12[,1])/B1 ]
    NY = length(Yvec)
    U34 = array(runif(2*fac*NY), c(NY,fac,2))
    ### Since Z accept-reject is wasteful, generate fac*NY proposed values
    condmat = 1.*(U34[,,2] < fZ.Y(U34[,,1],outer(Yvec,rep(1,fac)))/B2 )
    inds = apply(condmat,1,function(row) match(1,row))
    aux = !is.na(inds)
    Yvec = Yvec[aux]
    NZ = sum(aux)
    Xvec = rgamma(NZ,2)/Yvec
    Zvec = U34[cbind(1:NZ,inds[aux],1)]
    cbind(X=Xvec,Y=Yvec,Z=Zvec) 
    }

> tmp=XYZgen(1000)
> dim(tmp)
[1] 799   3

### Check that the Y's behave correctly marginally and that Y*X ~ Gamma(2,1)

> hist(tmp[,"Y"], nclass=20, prob=T)
  curve(fY(x),0,1, add=T, col="red")     ### agreement roughly OK

> hist(tmp[,"Y"]*tmp[,"X"], nclass=20, prob=T)
  curve(dgamma(x,2), add=T, col="blue")      ### very nice agreement 


METHOD 2. Direct Accept-Reject for the pair (Y,Z).

The joint fYZ density

> fYZ = function(y,z) C*y*z^2/(y+z)^2  is bounded on [0,1]X[0,1] by C/4 = 2.4442.     


> finner = function(y) {
       m = length(y); out=numeric(m)
       for (k in 1:m) out[k] = 
           integrate(function(z,.y) fYZ(.y,z),0,1,.y=y[k])$val
       out  }
> curve(fY(x),0,1)
> curve(finner(x),0,1, add=T, col="blue")   ### perfectly overlaid curves


So a direct 2-variate accept reject works here, and is actually less wasteful 
than the two 1-variate accept-reject methods in METHOD 1.

XYZgen2 = function(N) {
    UV = array(runif(3*N),c(N,3), dimnames=list(NULL,c("Y","Z","")))
    YZmat = UV[,1:2] [ UV[,3] < fYZ(UV[,1],UV[,2])/2.4442, ]
    Xvec = rgamma(nrow(YZmat),2,YZmat[,1])
    cbind(X=Xvec, YZmat)
    }

>  tmp = XYZgen2(1000)
> dim(tmp)
[1] 420   3        ### so only 420 of the accept-reject steps "worked"

### Check that the Y's behave correctly marginally and that Y*X ~ Gamma(2,1)

> hist(tmp[,"Y"], nclass=20, prob=T)
  curve(fY(x),0,1, add=T, col="red")     ### agreement roughly OK

> hist(tmp[,"Y"]*tmp[,"X"], nclass=20, prob=T)
  curve(dgamma(x,2), add=T, col="blue")      ### very nice agreement 

> hist(XYZgen2(1e5)[,"Y"], nclass=32, prob=T)
  curve(fY(x),0,1, add=T, col="red")     ### agreement much better !!