Confidence Intervals for a Single Proportion
A motivating problem.
What is the proportion of red beads in the bucket?
A bucket contains several thousand colored beads, some of which are red.
We wish to estimate the proportion of red beads in the bucket.
If a random sample of 100 beads contains 25 red ones,
how can we use this information to estimate the proportion of red beads in the entire bucket,
and how confident can we be in our estimate?
This setup models many practical problems.
For example, in political polls, we could think of the red beads as representing
the individuals in favor of a specific candidate.
In the health sciences,
we could think of the red beads as representing individuals
with high blood pressure.
In a biology setting,
we could think of the red beads as representing individuals
that had been previously captured and tagged.
- Confidence Intervals.
A confidence interval is an estimate of an unknown parameter
along with a margin of error and a level of confidence.
The basic format of a confidence interval for a parameter is below.
(estimate) ± (margin of error)
or
(estimate) ± (multiplier)(standard error)
The standard error is the standard deviation of the sampling distribution of the estimate.
- Confidence Intervals for p.
For proportions, the sample proportion is an obvious estimate of the population proportion.
We will use the notation p for the population proportion,
n for the sample size,
X for the count of individuals in the sample,
and p-hat = X/n for the sample proportion.
Provided that the sample size is sufficiently large
(observing at least five individuals of each type in the sample is a common rule of thumb),
the Central Limit Theorem tells us that the sampling distribution of the sample proportion
is approximately normal with mean p and standard deviation sqrt( p(1-p)/n ).
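The claim about the sampling distribution can be checked by simulation. The sketch below assumes each bead is drawn independently with probability p of being red, which treats a bucket of several thousand beads as effectively infinite (sampling 100 without replacement behaves almost identically). The value p = 0.25 and the helper name `simulate_sample_proportions` are hypothetical choices for illustration.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Hypothetical helper: draw `reps` random samples of size n from a
    population whose proportion of red items is p, and return the list
    of sample proportions. Draws are assumed independent."""
    rng = random.Random(seed)
    # For each sample, count how many of the n draws come up red.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.25, 100  # assumed "true" proportion and sample size
p_hats = simulate_sample_proportions(p, n, reps=5000)

print("mean of p-hat:   ", round(statistics.mean(p_hats), 4))
print("sd of p-hat:     ", round(statistics.stdev(p_hats), 4))
print("sqrt(p(1-p)/n):  ", round(math.sqrt(p * (1 - p) / n), 4))
```

The simulated mean should land close to p and the simulated standard deviation close to sqrt(p(1-p)/n), which is the standard error used below.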
To calculate a 95% confidence interval, for example,
we use the fact that 95% of all sample proportions will be within 1.96 standard errors
of the true population proportion.
Therefore, we can be 95% confident that the true population proportion
is within 1.96 standard errors of the actual observed sample proportion.
More formally,
a confidence interval for a proportion takes the form
p-hat ± z* sqrt ( p-hat (1 - p-hat) / n )
Notice that we use the estimate p-hat instead of the true p in the standard error
because we do not know p!
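As a minimal sketch, the formula can be written as a small function. The name `proportion_ci` and the default z = 1.96 (95% confidence) are illustrative assumptions, not part of any particular library.

```python
import math

def proportion_ci(x, n, z=1.96):
    """Hypothetical helper: large-sample confidence interval
    p-hat ± z* sqrt(p-hat(1 - p-hat)/n). Returns (lower, upper)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p_hat = x / n
    # The standard error uses p-hat in place of the unknown p.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin

print(proportion_ci(25, 100))  # roughly (0.165, 0.335)
```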
The value of z* is chosen so that the area under the standard normal curve between -z* and z*
is the desired confidence level.
Some common choices are:
Confidence Level | z* |
90% | 1.645 |
95% | 1.960 |
99% | 2.576 |
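The table values can be reproduced from the standard normal distribution: leaving (1 - confidence)/2 in each tail makes z* the (1 + confidence)/2 quantile. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_star` is hypothetical.

```python
from statistics import NormalDist

def z_star(confidence):
    """Hypothetical helper: return z* such that the area under the
    standard normal curve between -z* and z* equals `confidence`."""
    # Half of the leftover area goes in each tail, so z* is the
    # (1 + confidence)/2 quantile of the standard normal.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z* = {z_star(level):.3f}")
```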
In the example data, p-hat = 25/100 = 0.25.
A 95% confidence interval for the proportion of red beads in the bucket is
0.25 ± 1.96 sqrt((0.25)(0.75)/100)
or
0.250 ± 0.085
We can be 95% confident that the proportion of red beads in the bucket
is between 16.5% and 33.5%.
It is good practice to round the margin of error to two significant digits
and to round the estimate to the same accuracy.
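The arithmetic for the bead example, including the rounding convention, can be recomputed directly from the formula. The way the number of decimal places is derived from the margin (via `math.log10`) is one possible implementation of "two significant digits", chosen here for illustration.

```python
import math

x, n, z = 25, 100, 1.96
p_hat = x / n                                   # 0.25
se = math.sqrt(p_hat * (1 - p_hat) / n)         # about 0.0433
margin = z * se                                 # about 0.0849

# Round the margin to two significant digits, then round the
# estimate to the same number of decimal places.
decimals = 1 - math.floor(math.log10(margin))   # 3 decimals here
margin_r = round(margin, decimals)
p_hat_r = round(p_hat, decimals)

print(f"{p_hat_r:.{decimals}f} ± {margin_r:.{decimals}f}")
print(f"interval: {p_hat_r - margin_r:.{decimals}f} to {p_hat_r + margin_r:.{decimals}f}")
```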
- The logic of confidence intervals.
Confidence intervals are based on the following logical sequence, stated here for 95% confidence;
similar logic holds for other confidence levels.
- The sampling distribution of an estimate is approximately normal
and is centered at the value of the parameter we wish to estimate.
- Therefore, 95% of all possible estimates from random samples
are within 1.96 standard errors of the unknown parameter value.
- Thus, we can be 95% confident that the particular estimate from our actual sample
is within 1.96 standard errors of the parameter value.
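The logic above can be illustrated by repeating the whole procedure many times and counting how often the interval captures the true proportion. The sketch assumes a hypothetical true p = 0.25, n = 100, and independent draws; `coverage` is an illustrative helper name. Because the normal approximation is not exact, the observed rate is typically close to, but not exactly, 0.95.

```python
import math
import random

def coverage(p, n, reps, z=1.96, seed=1):
    """Hypothetical helper: estimate the fraction of samples for which
    p-hat ± z * sqrt(p-hat(1 - p-hat)/n) contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one random sample
        p_hat = x / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps

rate = coverage(p=0.25, n=100, reps=4000)
print("fraction of intervals containing p:", rate)
```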
- Interpretations of confidence intervals.
A confidence interval is a statement about the location of an unknown parameter.
It is not a statement about the individuals in the population.
The width of a confidence interval is determined by the spread of the sampling distribution of the estimate (its standard error) and by the confidence level.
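Since the standard error sqrt(p-hat(1 - p-hat)/n) shrinks like 1/sqrt(n), quadrupling the sample size halves the margin of error, while a larger z* widens it. The sketch below holds p-hat fixed at a hypothetical 0.25 to isolate the effect of n; `margin_of_error` is an illustrative helper name.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Hypothetical helper: half-width of p-hat ± z* sqrt(p-hat(1 - p-hat)/n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same sample proportion, increasing sample sizes (illustrative values).
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.25, n), 4))
```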