Continuation of HANDOUT on Binomial Approximation and Quality
of Estimation in the Context of a Political Opinion Poll
============================================================
SAMPLE PROBLEM: for a voting population of N = 10^7 in a certain state,
suppose that the number D who prefer Bush to any likely Democrat is 52%
of N, i.e., D = 5.2e6, and suppose we draw a random sample of 400 from
the population. The question is: what is the probability that the poll
results on the 400 people sampled give the wrong answer, i.e., what is
the probability that the number X of the 400 who say they prefer Bush
is less than 50%, i.e., less than 0.5*400 = 200?
The first step is to say that sampling WITH or WITHOUT replacement from
such a large population (400 << 10^7) makes virtually no difference,
so that the probability that X <= 199, which is exactly a
Hypergeometric(1.e7, 5.2e6, 400) probability, is equal (to high
accuracy) to the corresponding Binom(400, 0.52) probability.
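The closeness of the two distributions can be checked numerically. The
following is a minimal sketch in Python (standard library only); the
function names hypergeom_cdf and binom_cdf are illustrative and are not
part of the original handout.

```python
from math import comb

def hypergeom_cdf(x, N, D, n):
    """P(X <= x) when n people are drawn WITHOUT replacement from N,
    of whom D are 'successes'."""
    # Exact integer arithmetic: count the favorable samples, divide once.
    favorable = sum(comb(D, k) * comb(N - D, n - k) for k in range(x + 1))
    return favorable / comb(N, n)

def binom_cdf(x, n, p):
    """P(X <= x) for n independent draws (WITH replacement), success prob p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

N, D, n = 10**7, 5_200_000, 400
p_hyper = hypergeom_cdf(199, N, D, n)
p_binom = binom_cdf(199, n, D / N)

print(f"Hypergeometric P(X <= 199) = {p_hyper:.5f}")
print(f"Binomial       P(X <= 199) = {p_binom:.5f}")
print(f"Absolute difference        = {abs(p_hyper - p_binom):.2e}")
```

The hypergeometric sum is done with exact integers, so any difference
printed reflects the two models and not rounding error.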
UP TO THIS POINT, THIS IS CLOSE TO THE SAME PROBLEM CONSIDERED IN THE
10/22/03 HANDOUT ON NORMAL APPROXIMATION TO BINOMIAL PROBABILITIES.
We found there that the probability that X <= 192 is actually
around 0.06, either exactly or via the normal approximation. Now
we can also say that the probability that X < 200 is roughly 0.197,
which is uncomfortably large!
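As a check on the two probabilities quoted above, here is a short
sketch comparing the exact Binomial(400, 0.52) values with a normal
approximation. The continuity correction of 0.5 is an assumption of
this sketch (it is the usual choice and is consistent with the figures
quoted), and the function names are illustrative only.

```python
from math import comb, erf, sqrt

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x), with continuity correction 0.5."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return normal_cdf((x + 0.5 - mean) / sd)

n, p = 400, 0.52
# For each cutoff x, store (exact value, normal approximation).
results = {x: (binom_cdf(x, n, p), normal_approx_cdf(x, n, p))
           for x in (192, 199)}

for x, (exact, approx) in results.items():
    print(f"P(X <= {x}): exact = {exact:.4f}, normal approx = {approx:.4f}")
```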
It is clear from this calculation that we are solving a problem about
the precision of the statistical sampling-based estimate X/n = X/400
for the population parameter D/N = 0.52. We will now vary the
sample size n and discuss how large it should be chosen for the
estimator X/n to achieve various levels of precision.
The whole topic is based on treating X/n as a random variable, using
X ~ Hypergeometric(10^7, 0.52*10^7, n), which is essentially the
same as Binomial(n, 0.52): when N is so much larger than n, a sample
drawn with replacement will almost certainly contain no repeated
individuals, so sampling with or without replacement gives
essentially the same distribution for X.
So approximately X ~ Normal(n*0.52, n*0.52*0.48), where the two
parameters are the mean and the variance, which means that
X/n ~ Normal(0.52, 0.2496/n). We quote the variability of this
estimator by saying
THE STANDARD ERROR OF X/n IS SQRT(.2496/n)
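To get a feel for the size of this quantity, the following small sketch
evaluates the standard error for a few sample sizes. The function name
standard_error and the particular values of n are illustrative choices.

```python
from math import sqrt

def standard_error(n, p=0.52):
    """Standard error of X/n when X ~ Binomial(n, p): sqrt(p*(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(f"n = {n:5d}   standard error of X/n = {standard_error(n):.4f}")
```

Because n appears under a square root, quadrupling the sample size only
halves the standard error.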
Another way of giving this information is:
    P( |X/n - 0.52| <= b*sqrt(.2496/n) )
        = Phi(b) - Phi(-b) = 2*Phi(b) - 1   (approximately),
where Phi is the standard normal distribution function.
This tells us: if we want to be able to say that the probability is
1 - alpha or better that |X/n - 0.52| <= b*sqrt(.2496/n),
we choose b so that 2*Phi(b) - 1 = 1 - alpha, or: Phi(b) = 1 - alpha/2.
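The value of b for a given alpha can be found by inverting Phi. A
minimal sketch, assuming Python 3.8 or later so that
statistics.NormalDist is available; the function name critical_value
is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_value(alpha):
    """Return b such that Phi(b) = 1 - alpha/2."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

for alpha in (0.02, 0.05, 0.10):
    b = critical_value(alpha)
    # Sanity check: 2*Phi(b) - 1 should recover 1 - alpha.
    coverage = 2 * STD_NORMAL.cdf(b) - 1
    print(f"alpha = {alpha:.2f}   b = {b:.3f}   2*Phi(b)-1 = {coverage:.3f}")
```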
Here is a little table of values for the half-width b*sqrt(.2496/n)
of a "Confidence Interval" for the true but unknown population
proportion (0.52 here) around the estimated value X/n.
    alpha       b         n      Interval half-width
     .02      2.326      100          .116
                         400          .058
                        1000          .037
     .05      1.960      100          .098
                         400          .049
                        1000          .031
     .10      1.645      100          .082
                         400          .041
                        1000          .026
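The table entries can be recomputed from the formula
b*sqrt(.2496/n). The sketch below does this with the Python standard
library (3.8 or later for statistics.NormalDist); the names half_width
and table are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def half_width(alpha, n, p=0.52):
    """Half-width b*sqrt(p*(1-p)/n), where Phi(b) = 1 - alpha/2."""
    b = NormalDist().inv_cdf(1 - alpha / 2)
    return b * sqrt(p * (1 - p) / n)

# Keys are (alpha, n); values are half-widths rounded to 3 decimals.
table = {(alpha, n): round(half_width(alpha, n), 3)
         for alpha in (0.02, 0.05, 0.10)
         for n in (100, 400, 1000)}

print("alpha      n   half-width")
for (alpha, n), h in table.items():
    print(f"{alpha:5.2f}  {n:5d}     {h:.3f}")
```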
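NOTE that none of these opinion polls, with such values of n, is
accurate enough to declare the majority view of voters definitively:
every half-width in the table exceeds 0.02, the margin by which the
true proportion 0.52 exceeds 0.5 !!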
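The same formula can be turned around to ask how large n must be for a
given half-width: solving b*sqrt(.2496/n) <= h for n gives
n >= b^2 * .2496 / h^2. The sketch below is only an illustration; the
target half-width h = 0.02 (the gap between 0.52 and 0.5) is an
assumption chosen here, not a value taken from the handout, and the
names required_n and n_needed are illustrative.

```python
from math import ceil
from statistics import NormalDist

def required_n(h, alpha, p=0.52):
    """Smallest n with b*sqrt(p*(1-p)/n) <= h, where Phi(b) = 1 - alpha/2."""
    b = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(b * b * p * (1 - p) / h**2)

# Assumed target: half-width no larger than 0.02.
n_needed = {alpha: required_n(0.02, alpha) for alpha in (0.02, 0.05, 0.10)}

for alpha, n in n_needed.items():
    print(f"alpha = {alpha:.2f}   required n = {n}")
```

Under these assumptions the required sample sizes run into the
thousands, well beyond the n = 100, 400, 1000 considered in the table.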