R Log to Compare Coverage for (One-sided) CI's
for a Binomial Proportion p
================================================= 3/3/10
Suppose that X ~ Binom(n,p) is the observed number of successes
in n Bernoulli(p) trials with unknown p.
The three methods we compare are: the one based on the simplest
or "naive" normal-approximation pivot (X-np)/sqrt(X*(1-X/n)); the one
based on solving quadratic inequalities to find the interval based on
the pivot (X-np)/sqrt(np(1-p)); and the "test-based" method.
For the first two, see Example 4.4.3 in Bickel-Doksum, specifically
(4.4.7) and (4.4.3)-(4.4.4) on pp.237-8. For the "test-based" method,
see the discussion on pp.245-6. The erratic behavior of these CI's is
discussed in a sophisticated way in the Brown, Cai, and DasGupta
(2001) paper referenced on p.238.
-------- Note ----------
If B(k,n,p) denotes the binomial(n,p) cdf evaluated at k, and
Bbar(k,n,p)=1-B(k-1,n,p) is the probability of k or more successes,
then a straightforward differentiation shows that
(d/dt)Bbar(k,n,t) = n*b(k-1,n-1,t) > 0,
so that Bbar(k,n,t) = int_0^t dbeta(s,k,n-k+1) ds = pbeta(t,k,n-k+1)
is obtained (for k>0) as a beta integral.
-----------------------
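The identity in the Note is easy to check numerically; the particular values of k, n, and p below are arbitrary illustrative choices, not taken from the text.

```r
## Numerical check of the Note's identity Bbar(k,n,p) = pbeta(p,k,n-k+1),
## at illustrative values k=37, n=110, p=0.3.
k <- 37; n <- 110; p <- 0.3
Bbar <- 1 - pbinom(k - 1, n, p)     # P(X >= k) = Bbar(k,n,p)
stopifnot(abs(Bbar - pbeta(p, k, n - k + 1)) < 1e-12)
```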
Here are the computing formulas and computed results relating to the
performance of our confidence intervals.
(1) Normal approx -- naive
p <= U1 = (X + qnorm(1-alpha)*sqrt(X*(1-X/n)))/n
(2) Normal approx -- solving quadratic inequality
za2 = qnorm(1-alpha)^2
p <= U2 = (X+za2/2 + sqrt((1-X/n)*X*za2+za2^2/4))/(n+za2)
(3) Test-based
The upper confidence limit as given in Bickel and Doksum
solves B(X,n,U) = alpha or Bbar(X+1,n,U)=1-alpha, which means
that pbeta(U,X+1,n-X) = 1-alpha or U = qbeta(1-alpha,X+1,n-X).
p <= U3 = qbeta(1-alpha,X+1,n-X)
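Each of these three formulas can be verified to solve its defining equation; the sketch below does so at X=37, n=120, alpha=.05, the values used in the example that follows.

```r
## Check that each UCL solves its defining equation (X=37, n=120, alpha=.05).
X <- 37; n <- 120; alpha <- 0.05
z <- qnorm(1 - alpha); za2 <- z^2
U1 <- (X + z*sqrt(X*(1 - X/n)))/n
U2 <- (X + za2/2 + sqrt((1 - X/n)*X*za2 + za2^2/4))/(n + za2)
U3 <- qbeta(1 - alpha, X + 1, n - X)
stopifnot(abs((X - n*U1)/sqrt(X*(1 - X/n)) + z) < 1e-10)    # naive pivot = -z
stopifnot(abs((X - n*U2)/sqrt(n*U2*(1 - U2)) + z) < 1e-10)  # quadratic pivot = -z
stopifnot(abs(pbinom(X, n, U3) - alpha) < 1e-10)            # B(X,n,U3) = alpha
```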
>>>>>>>>>> R coding follows >>>>>>>>>>>>>>
BinUpCI = function(X,n,alpha) {
   ### Upper confidence limits (1)-(3) above for X successes in n trials
   U = numeric(3)
   U[1] = (X + qnorm(1-alpha)*sqrt(X*(1-X/n)))/n            # naive normal
   za2 = qnorm(1-alpha)^2
   U[2] = (X+za2/2 + sqrt((1-X/n)*X*za2+za2^2/4))/(n+za2)   # quadratic
   U[3] = qbeta(1-alpha,X+1,n-X)                            # test-based
   U }
> BinUpCI(37,120,.05)
[1] 0.3776752 0.3812626 0.3849185
(CI upper bounds for 37 successes in 120 trials)
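As a cross-check on the third entry (assuming, as the R sources indicate, that stats::binom.test constructs its one-sided interval by the Clopper-Pearson method), U[3] should agree with the upper limit that binom.test reports:

```r
## Cross-check of the test-based UCL against binom.test's one-sided
## upper limit; assumption: binom.test with alternative="less" returns
## the Clopper-Pearson bound qbeta(1-alpha, X+1, n-X).
X <- 37; n <- 120; alpha <- 0.05
U3 <- qbeta(1 - alpha, X + 1, n - X)
ct <- binom.test(X, n, alternative = "less", conf.level = 1 - alpha)
stopifnot(abs(ct$conf.int[2] - U3) < 1e-12)
```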
Parr = array(0, c(100,4))
Parr[1:100,1] = 5:104
for(i in 5:104) Parr[i-4,2:4] = BinUpCI(i,110,.05)
### The next calculations check that the defining equations of the CI
###   upper-limits have been solved correctly.
> summary((Parr[,1]-110*Parr[,2])/sqrt(Parr[,1]*(1-Parr[,1]/110)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.645 -1.645 -1.645 -1.645 -1.645 -1.645
> summary((Parr[,1]-110*Parr[,3])/sqrt(110*Parr[,3]*(1-Parr[,3])))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.645 -1.645 -1.645 -1.645 -1.645 -1.645
> summary(pbinom(Parr[,1],110,Parr[,4]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05 0.05 0.05 0.05 0.05 0.05
dimnames(Parr) = list(5:104,c("X","Norm.appr",
"Norm.quad","Test.based"))
### Now display UCL's for comparison.
### Here are the comparative upper bounds in the 3 types of CI's
###   with n=110 and alpha=.05, for various values of X.
Parr[seq(1,31,by=2),2:4]
 X Norm.appr  Norm.quad Test.based
 5 0.0781222 0.09043396 0.09319384
 7 0.1019194 0.11335598 0.11620209
9 0.1248035 0.13549345 0.13842981
11 0.1470492 0.15706467 0.16009232
13 0.1688104 0.17819761 0.18131639
15 0.1901839 0.19897501 0.20218420
17 0.2112352 0.21945401 0.22275262
19 0.2320110 0.23967604 0.24306294
21 0.2525463 0.25967223 0.26314621
23 0.2728676 0.27946665 0.28302646
25 0.2929958 0.29907829 0.30272267
27 0.3129476 0.31852246 0.32225010
29 0.3327366 0.33781161 0.34162121
31 0.3523740 0.35695600 0.36084626
33 0.3718688 0.37596419 0.37993378
35 0.3912289 0.39484330 0.39889088
### The differences across columns are not too large, but that is
###   because n=110 is fairly large here. Now do a similar exhibit for n=77.
Parr2 = array(0, c(65,4))
Parr2[1:65,1] = 5:69
for(i in 5:69) Parr2[i-4,2:4] = BinUpCI(i,77,.05)
dimnames(Parr2) = list(5:69,c("X","Norm.appr",
     "Norm.quad","Test.based"))
Parr2[seq(1,31,by=2),2:4]
X Norm.appr Norm.quad Test.based
5 0.1111245 0.1274433 0.1316908
7 0.1447967 0.1595506 0.1639779
9 0.1771067 0.1904921 0.1951004
11 0.2084504 0.2205804 0.2253674
13 0.2390499 0.2499986 0.2549607
15 0.2690443 0.2788647 0.2839980
17 0.2985276 0.3072601 0.3125603
19 0.3275664 0.3352430 0.3407057
21 0.3562097 0.3628570 0.3684778
23 0.3844943 0.3901350 0.3959094
25 0.4124487 0.4171024 0.4230259
27 0.4400948 0.4437792 0.4498471
29 0.4674495 0.4701808 0.4763882
31 0.4945261 0.4963191 0.5026612
33 0.5213344 0.5222032 0.5286748
35 0.5478816 0.5478397 0.5544354
## All of this shows that the naive normal-approximation UCL is
##   essentially the same as the test-based UCL in the first case (n=110)
##   but not in the 2nd (n=77). The three upper limits are (almost always)
##   progressively larger: naive, then quadratic, then test-based.
## We next show the exact coverage behavior of these CI's, first for
##   various choices of true p at a fixed sample size n, and then for a
##   sequence of sample sizes at a fixed p.
TruCovr = function(p, n, alpha) {
   ### Exact coverage probabilities P(p <= U(X)), X ~ Binom(n,p),
   ### for each of the three UCL methods and each element of the vector p.
   Parr = array(0, c(n+1,3), dimnames=list(0:n,
        c("Norm.appr","Norm.quad","Test.based")))
   for(i in 0:n) Parr[i+1,] = BinUpCI(i,n,alpha)
   CovProb = array(0, c(length(p),3))
   for(j in 1:length(p)) {
      Qarr = t(p[j] <= Parr)       # 3 x (n+1) coverage indicators
      CovProb[j,] = Qarr %*% dbinom(0:n,n,p[j]) }
   list(parr = Parr, cover = CovProb) }
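The matrix product in TruCovr is just a sum of binomial probabilities over the X-values whose UCL covers p. The self-contained recomputation below does that sum directly for the naive method at p=.12, n=45, alpha=.1, which reproduces the 0.8046628 appearing in the final table of this log.

```r
## Direct recomputation of one coverage value: naive-method coverage at
## p=.12, n=45, alpha=.1 (cf. the n=45 row of the final table, 0.8046628).
p <- 0.12; n <- 45; alpha <- 0.1
X <- 0:n
U1 <- (X + qnorm(1 - alpha) * sqrt(X * (1 - X/n))) / n
cover <- sum(dbinom(X, n, p)[p <= U1])   # P(p <= U1(X))
stopifnot(abs(cover - 0.8046628) < 1e-7)
```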
OutProbs77 = round(TruCovr(seq(.05,.44,by=.01),77,.05)$cover,4)
### Columns are coverage probabilities for level .95 1-sided CI's,
###   rows for the respective values p = .05, .06, .07, ..., .44.
> OutProbs77
       [,1]   [,2]   [,3]
 [1,] 0.9027 0.9807 0.9807 p=.05
[2,] 0.8479 0.9496 0.9915 .06
[3,] 0.9125 0.9746 0.9746 .07
[4,] 0.8731 0.9515 0.9515 ...
[5,] 0.9242 0.9739 0.9739
[6,] 0.8944 0.9562 0.9562
[7,] 0.9355 0.9755 0.9755
[8,] 0.9123 0.9618 0.9618
[9,] 0.8875 0.9456 0.9780
[10,] 0.9274 0.9673 0.9673
[11,] 0.9075 0.9546 0.9546
[12,] 0.9401 0.9401 0.9724
[13,] 0.9240 0.9623 0.9623
[14,] 0.9066 0.9508 0.9508
[15,] 0.9378 0.9690 0.9690
[16,] 0.9236 0.9597 0.9597
[17,] 0.9083 0.9493 0.9746
[18,] 0.9378 0.9672 0.9672
[19,] 0.9252 0.9589 0.9589
[20,] 0.9116 0.9495 0.9735
[21,] 0.9392 0.9392 0.9668
[22,] 0.9280 0.9593 0.9593
[23,] 0.9159 0.9509 0.9509
[24,] 0.9417 0.9417 0.9674
[25,] 0.9317 0.9606 0.9606
[26,] 0.9209 0.9531 0.9531
[27,] 0.9449 0.9449 0.9686
[28,] 0.9360 0.9626 0.9626
[29,] 0.9264 0.9559 0.9559
[30,] 0.9487 0.9487 0.9704
[31,] 0.9408 0.9408 0.9651
[32,] 0.9323 0.9592 0.9592
[33,] 0.9527 0.9527 0.9527
[34,] 0.9458 0.9458 0.9679
[35,] 0.9382 0.9627 0.9627
[36,] 0.9301 0.9570 0.9570
[37,] 0.9509 0.9509 0.9509
[38,] 0.9443 0.9443 0.9663
[39,] 0.9371 0.9614 0.9614 p=.43
[40,] 0.9561 0.9561 0.9561 p=.44
NOTE that it very often happens that the quadratic method and the
test-based method produce the same coverage; but in other cases
the test-based method is clearly conservative (UCL too large).
OutProbs100 = round(TruCovr(seq(.05,.44,by=.01),100,.1)$cover,4)
> summary(OutProbs100)
Norm.appr Norm.quad Test.based
Min. :0.8201 Min. :0.8817 Min. :0.9005
1st Qu.:0.8715 1st Qu.:0.8942 1st Qu.:0.9094
Median :0.8820 Median :0.9076 Median :0.9176
Mean :0.8808 Mean :0.9055 Mean :0.9206
3rd Qu.:0.8960 3rd Qu.:0.9152 3rd Qu.:0.9290
Max. :0.9098 Max. :0.9434 Max. :0.9629
### So according to these calculations, the coverage probability of
the "test-based" approach is systematically too large (although
often the same as the quadratic-inequality method) and the
quadratic-inequality method is by far the best !!! But even for
it, many of the coverage probabilities which should be 0.90 are
off by .01 or more (in either direction) !!
## One last exhibit is the sequence of coverage probabilities for the
same p for a succession of sample sizes:
Covprobs = function(p, n, alpha) {
   ### Coverage probabilities for a single p over a vector of sample sizes n
   Covers = array(0, c(length(n),3), dimnames=list(n,
        c("Norm.appr","Norm.quad","Test.based")))
   for(i in 1:length(n)) Covers[i,] = TruCovr(p,n[i],alpha)$cover
   Covers }
### All of these coverage probabilities should be .90. Look at how they
###   jump around when p=.12 is fixed and n ranges from 45 to 54, alpha=.1.
> Covprobs(.12, 45:54, .1)
   Norm.appr Norm.quad Test.based
45 0.8046628 0.9188976 0.9188976
46 0.8183710 0.9259111 0.9259111
47 0.8312758 0.9323635 0.9323635
48 0.8434063 0.9382939 0.9382939
49 0.8547928 0.9437397 0.9437397
50 0.8654664 0.8654664 0.9487358
51 0.8754588 0.8754588 0.9533156
52 0.8848016 0.8848016 0.9575104
53 0.8935266 0.8935266 0.9613494
54 0.7917925 0.9016654 0.9016654