Homework 17 Solution
====================

(a) Here are two methods. The simpler one is to generate iid triples by
simulating first n and y independently (using the specified distributions)
and then x ~ Binom(n,y).

Xsim1 = function(R, alpha, beta, lambda) {
   nv = rpois(R, lambda)
   yv = rbeta(R, alpha, beta)
   xv = rbinom(R, nv, yv)
   list(X = xv, Y = yv, N = nv)
}

> X1 = hist(Xsim1(1000,1,1,10)$X, breaks = -.1+(0:25), ylim=c(0,120))
> X1$count
 [1] 104  94  99  91  95  78  98  96  62  58  42  29  19  13  11   6   3   2   0
[20]   0   0   0   0   0   0
### 18 was the largest observed value

### Now do it the way suggested in the handout:
> 1 - ppois(25,10)
[1] 1.768027e-05
### So P(X > 25) <= P(N > 25) < 2e-5 when lambda <= 10.
### So we can get a very good approximation to the pmf by summing
### the series at the bottom of p.1 from x to 25.
### Use log-gamma and log-beta functions to maintain accuracy.

> Xpmf = numeric(26)
  for(x in 0:25) {
     ltmp = x*log(10) - lbeta(1,1)
     irng = 0:(25-x)
     Xpmf[x+1] = exp(ltmp) * sum(dpois(irng,10) *
                      exp(lgamma(1+irng) - lgamma(2+x+irng)))
  }
### This is the pmf: now we make it cumulative, noting the sum is .999982,
### so we make 1 - sum into a pmf value for 26.
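### Side check (a sketch, not part of the assigned solution): with
### alpha = beta = 1, Y ~ Unif(0,1), so X | N = n is discrete-uniform
### on {0,...,n}, and each term of the series collapses to
### dpois(n, lambda)/(n+1) with n = x + i. The simple form should agree
### with the log-gamma form to machine precision.

```r
lambda <- 10

## simple form: P(X = x) = sum_{n >= x} dpois(n, lambda) / (n + 1)
pmf.simple <- sapply(0:25, function(x) {
  n <- x:200                        # truncate the Poisson tail far out
  sum(dpois(n, lambda) / (n + 1))
})

## log-gamma form, as in the series code, same truncation point
pmf.lgamma <- sapply(0:25, function(x) {
  irng <- 0:(200 - x)
  exp(x * log(lambda) - lbeta(1, 1)) *
    sum(dpois(irng, lambda) * exp(lgamma(1 + irng) - lgamma(2 + x + irng)))
})

max(abs(pmf.simple - pmf.lgamma))   # agreement to floating-point error
sum(pmf.simple)                     # very close to 1 (tail beyond 25 is tiny)
```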
> Xcum = c(0, cumsum(Xpmf), 1)
### Now simulation is easy because all we need to do is tally
### the X values in the various intervals:
> Xsim2 = hist(runif(1000), breaks=Xcum, plot=F)$count
> Xsim2
 [1] 109 106 102 105  93  90  86  73  60  53  41  30  20  13   7   7   2   2   1
[20]   0   0   0   0   0   0   0   0
> points((0:18)+0.5, Xsim2[1:19])
### very close correspondence with the previously plotted histogram
### 18 was the largest observed value also in this run by method 2

(b) ### Now do it all by Gibbs Sampling

Gsamp = function(R, alpha, beta, lambda, burnin=100) {
   tmpX = numeric(R+burnin)
   n = rpois(1, lambda)
   y = rbeta(1, alpha, beta)
   x = rbinom(1, n, y)
   for(i in 1:(R+burnin)) {   ### NB parentheses: 1:R+burnin parses as
                              ### (1:R)+burnin and would skip the burn-in
      n = x + rpois(1, lambda*(1-y))
      y = rbeta(1, alpha+x, beta+n-x)
      x = rbinom(1, n, y)
      tmpX[i] = x
   }
   tmpX[burnin + (1:R)]
}

X3 = hist(Gsamp(1000,1,1,10), plot=F, breaks=-.1+(0:25))$count
> max((0:24)[X3 > 0])
[1] 15                       #### already a little suspicious
points((0:15)+0.5, X3[1:16], pch=18)
### Not very close at all !!!

### Now re-do the Gibbs simulation with much larger R and burn-in
X3 = hist(Gsamp(1e4,1,1,10, burnin=3000), plot=F, breaks=-.1+(0:25))$count
> max((0:24)[X3 > 0])        ### = 20, much better
points((0:20)+0.5, X3[1:21]/10, pch=5, col="red")
### much closer, but maybe still not perfect

cbind(X1 = X1$count[1:20], X2 = Xsim2[1:20], X3 = X3[1:20]/10)
       X1  X2    X3
 [1,] 104 109  95.8
 [2,]  94 106 100.2
 [3,]  99 102 107.3
 [4,]  91 105  95.1
 [5,]  95  93  99.2
 [6,]  78  90  94.2
 [7,]  98  86  92.8
 [8,]  96  73  76.6
 [9,]  62  60  71.1
[10,]  58  53  49.5
[11,]  42  41  39.1
[12,]  29  30  28.3
[13,]  19  20  23.3
[14,]  13  13  13.9
[15,]  11   7   8.2
[16,]   6   7   2.9
[17,]   3   2   1.2
[18,]   2   2   0.7
[19,]   0   1   0.4
[20,]   0   0   0.1

### Just to test carefully:
tmp = table(Xsim1(1e4,1,1,10)$X)[1:20]/1e4
cbind(x = 0:19, Naive = tmp,
      Gibbs = table(Gsamp(1e4,1,1,10, burnin=10000))[1:20]/1e4,
      SE = round(sqrt(tmp*(1-tmp)/1e4), 4))
    x  Naive  Gibbs     SE
0   0 0.1010 0.0970 0.0030
1   1 0.1005 0.0957 0.0030
2   2 0.0932 0.0935 0.0029
3   3 0.0965 0.0993 0.0030
4   4 0.0940 0.0984 0.0029
5   5 0.0981 0.0976 0.0030
6   6 0.0890 0.0858 0.0028
7   7 0.0815 0.0741 0.0027
8   8 0.0696 0.0706 0.0025
9   9 0.0521 0.0534 0.0022
10 10 0.0408 0.0444 0.0020
11 11 0.0293 0.0304 0.0017
12 12 0.0215 0.0261 0.0015   *
13 13 0.0152 0.0149 0.0012
14 14 0.0073 0.0111 0.0009   *
15 15 0.0043 0.0029 0.0007
16 16 0.0029 0.0037 0.0005
17 17 0.0014 0.0004 0.0004
18 18 0.0009 0.0003 0.0003
19 19 0.0005 0.0003 0.0002
### The Naive and Gibbs values are not always quite within tolerance;
### but, as far out as the normal CI approximation is any good, there
### are no significant differences with this number of burn-in steps !!

X4 = Gsamp(1e4,1,1,10, burnin=10000)
> acf(X4)
### This is a VERY interesting picture: after lag 30, the autocorrelations
### go negative and stay significantly negative for a while, then turn
### significantly positive again !!! Independence does not seem
### believable until much larger lags !!

> Xout = Gsamp(1e4,1,1,10)
> plot(acf(Xout, lag.max=200), ci=.99)
### Seems to indicate that lag 50 is about long enough for approximate
### independence, as shown by the autocorrelation.

### Now we try to select an "iid" sequence that way
### and then test it for "iid"-ness
> Xout = Gsamp(5e4,1,1,10, burnin=1500)[(1:1000)*50]
> acf(Xout, lag.max=200)     ### looks good
> spectrum(Xout)             ### looks pretty flat, as an iid series should
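### A further check (a sketch, going slightly beyond the graphical
### diagnostics above): the Ljung-Box portmanteau test gives a formal
### p-value for remaining autocorrelation in the thinned chain. Gsamp is
### reproduced here, with the loop bounds written as 1:(R+burnin), so the
### block is self-contained.

```r
Gsamp <- function(R, alpha, beta, lambda, burnin = 100) {
  tmpX <- numeric(R + burnin)
  n <- rpois(1, lambda)
  y <- rbeta(1, alpha, beta)
  x <- rbinom(1, n, y)
  for (i in 1:(R + burnin)) {
    n <- x + rpois(1, lambda * (1 - y))       # N - X | X, Y ~ Pois(lambda(1-y))
    y <- rbeta(1, alpha + x, beta + n - x)    # Y | X, N ~ Beta(alpha+x, beta+n-x)
    x <- rbinom(1, n, y)                      # X | N, Y ~ Binom(n, y)
    tmpX[i] <- x
  }
  tmpX[burnin + (1:R)]
}

set.seed(1)
Xout <- Gsamp(5e4, 1, 1, 10, burnin = 1500)[(1:1000) * 50]  # thin by 50
Box.test(Xout, lag = 20, type = "Ljung-Box")
## a small p-value would flag autocorrelation surviving the thinning;
## after thinning by 50 we expect no strong evidence against "iid"-ness
```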