Math 225

Introduction to Biostatistics

Notes from Lecture #13

Confidence Intervals for a Difference in Proportions
A motivating problem. Are urban students more likely to get flu shots?
A buckets-of-colored-balls model. We can imagine two buckets representing the populations of urban students and rural students. Individuals who have had a flu shot are represented as red balls and those who have not as white balls. The proportions of red balls in the urban and rural population buckets are p₁ and p₂, respectively. We can take random samples of balls from the two buckets, count the number of red balls in each sample, and make statistical inferences about p₁ - p₂.
Confidence Intervals. A confidence interval for the difference in population proportions has the basic form of an estimate plus or minus a margin of error and an associated confidence level. The basic format of a confidence interval is below.
(estimate) ± (margin of error)
or
(estimate) ± (multiplier)(standard error)
The standard error is the standard deviation of the sampling distribution of the estimate.
Confidence Intervals for p₁ - p₂. For a difference in population proportions, the difference in the sample proportions is the natural estimate. We denote the two sample sizes n₁ and n₂. The sample proportions are p-hat_i = X_i/n_i for i = 1 and 2, where X_i is the count of red balls in the ith sample. Provided that the sample sizes are sufficiently large (observing at least five individuals of each type in each sample is a common rule of thumb), the Central Limit Theorem tells us that the sampling distribution of the difference in sample proportions is approximately normal with mean p₁ - p₂ and standard deviation sqrt( p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂ ). For a 95% confidence interval, for example, 95% of all differences in sample proportions will be within 1.96 standard deviations of the true difference in population proportions. Therefore, we can be 95% confident that the true difference in population proportions is within 1.96 standard errors of the observed difference in sample proportions. More formally, a confidence interval for a difference in population proportions takes the form
p-hat₁ - p-hat₂ ± z^* sqrt( p-hat₁(1-p-hat₁)/n₁ + p-hat₂(1-p-hat₂)/n₂ ).
Notice that we use the sample proportions in place of the true population proportions in the standard error because the population proportions are unknown. The value of z^* is chosen so that the area under the standard normal curve between -z^* and z^* equals the desired confidence level. Some common choices are:

Confidence Level    z^*
90%                 1.645
95%                 1.960
99%                 2.576
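The multipliers in this table come from the standard normal curve: z^* is the value with area (1 + confidence level)/2 to its left, so the two tails beyond -z^* and z^* together hold the remaining area. A minimal Python sketch using the standard library's statistics.NormalDist reproduces them; the function name z_star is a hypothetical choice for illustration.

```python
from statistics import NormalDist

def z_star(level):
    """Return the multiplier z* with central area `level` under N(0, 1)."""
    # The two tails together hold 1 - level, so each tail holds (1 - level)/2.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_star(level):.3f}")
```

Other confidence levels work the same way; only the tail area changes.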

Example: Suppose that in a sample of 65 urban students, 52 have had a flu shot, and in a sample of 65 rural students, 30 have had a flu shot. The two sample proportions are 52/65 = 0.800 and 30/65 = 0.462. (In general, keep at least three significant digits for proportions during the calculation and round only the final answers.)
Notice that in the urban group, 13 students have not had a flu shot and in the rural group, 35 have not. The counts 52, 13, 30, and 35 are all at least 5, so confidence intervals based on the normal approximation to the sampling distribution are appropriate.
A 95% confidence interval for the difference in population proportions is
(0.800 - 0.462) ± 1.96 sqrt( (0.800)(0.200)/65 + (0.462)(0.538)/65 )
or
0.338 ± (1.96)(0.0793) = 0.338 ± 0.155
which rounds to
0.34 ± 0.16
(It is generally good to round a margin of error up to two significant digits and then round the estimate to the same accuracy.) We can be 95% confident that the proportion of urban students with flu shots is between 18 and 50 percentage points higher than the proportion of rural students with flu shots.
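The arithmetic in this example can be checked with a short Python sketch. The function name diff_prop_ci and its arguments are hypothetical; it computes the normal-approximation interval given above and refuses to run when the at-least-five rule of thumb fails.

```python
from math import sqrt
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation CI for p1 - p2; returns (estimate, margin of error)."""
    # Rule of thumb: at least five individuals of each type in each sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("counts too small for the normal approximation")
    p1, p2 = x1 / n1, x2 / n2
    # Standard error uses the sample proportions in place of p1 and p2.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p1 - p2, z * se

est, moe = diff_prop_ci(52, 65, 30, 65)
print(f"{est:.3f} ± {moe:.3f}")
print(f"({est - moe:.3f}, {est + moe:.3f})")
```

The printed values are the unrounded estimate and margin of error (about 0.338 and 0.155); the rounding convention in the notes is then applied by hand.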
Hypothesis Tests for a Difference in Proportions
A motivating problem. Are HIV patients who take AZT less likely to develop AIDS than patients who take a placebo?
In a study, 870 HIV patients were randomly assigned to two treatment regimes with 435 patients in each group. The first group received AZT while the second received a placebo. After a period of years, 17 individuals in the AZT group had developed AIDS, compared to 38 individuals in the placebo group.
Hypothesis Tests. This question can be tested formally with a statistical procedure called a hypothesis test. Begin by assuming that the probability of developing AIDS is the same in each group. This is our null hypothesis. The alternative hypothesis is that the probability of developing AIDS is smaller in the AZT group.
We observe a difference of 21 individuals (38 - 17). The basic question is: if the two population proportions were equal, would a difference this large be likely to occur by chance alone?
We measure how far the data depart from what we expect under the null hypothesis by calculating the probability, assuming the null hypothesis is true, of seeing an outcome at least as extreme as the one actually observed if we were to repeat the entire experiment. This probability is called a p-value. The smaller the p-value, the stronger the evidence against the null hypothesis.
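One way to see what a p-value measures is to repeat the experiment many times on a computer with the null hypothesis built in. The sketch below uses a permutation (randomization) simulation, a technique not used in these notes: assuming AZT has no effect, the same 55 patients would have developed AIDS whichever treatment they received, so we repeatedly pick which 55 of the 870 patients develop AIDS at random and count how often the AZT group (the first 435 patients) ends up with 17 or fewer cases. The function name simulated_p_value, the number of repetitions, and the seed are hypothetical choices.

```python
import random

def simulated_p_value(cases1=17, cases2=38, n1=435, n2=435, reps=20_000, seed=1):
    """Estimate P(group-1 cases <= observed) when treatment has no effect."""
    rng = random.Random(seed)
    total_cases = cases1 + cases2
    n = n1 + n2
    hits = 0
    for _ in range(reps):
        # Under H0, who develops AIDS does not depend on treatment, so choose
        # the cases at random and count how many fall in group 1 (indices < n1).
        cases = rng.sample(range(n), total_cases)
        in_group1 = sum(1 for i in cases if i < n1)
        if in_group1 <= cases1:
            hits += 1
    return hits / reps

print(simulated_p_value())
```

The simulated proportion is only a few in a thousand. It need not match a normal-curve p-value exactly, because it does not rely on the normal approximation and has its own simulation error.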
A hypothesis test then consists of these parts.
1. State null and alternative hypotheses.
2. Calculate a test statistic.
3. Calculate a p-value.
4. Summarize your findings in the context of the problem.
We can now apply these ideas to the example problem.
1. State hypotheses (p₁ and p₂ are the probabilities of developing AIDS in the AZT and placebo groups):
  H₀: p₁ = p₂
  H_a: p₁ < p₂
2. Calculate a test statistic:
  Under the null hypothesis, the two population proportions are equal. If this is true, our best estimate of the common proportion is the pooled proportion p-bar = (17 + 38) / (435 + 435) = 55/870, or p-bar = 0.0632. If the null hypothesis is true, the test statistic
  z = (p-hat₁ - p-hat₂) / sqrt( p-bar(1-p-bar)/n₁ + p-bar(1-p-bar)/n₂ )
  has approximately a standard normal distribution.
  Plugging in p-hat₁ = 17/435 = 0.0391 and p-hat₂ = 38/435 = 0.0874, we find the test statistic z = (0.0391 - 0.0874)/sqrt((0.0632)(0.9368)/435 + (0.0632)(0.9368)/435), or z = -2.93.
3. Calculate a p-value:
  The alternative hypothesis is p₁ < p₂, so this is a one-sided test: the more negative the difference p-hat₁ - p-hat₂, the more evidence there is against the null hypothesis. If the null hypothesis is true, the probability of observing a difference at least as small as the one we actually observed is the area to the left of -2.93 under a standard normal curve, or 0.0017.
4. Summarize the findings in the context of the problem:
  If AZT were no better than a placebo, we would see AZT do this well or better relative to a placebo in fewer than 2 out of every 1000 repetitions of the experiment. This is strong evidence that AZT reduces the chance of developing AIDS.
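The test statistic and p-value from these steps can be sketched in Python. The function name two_prop_z_test is hypothetical; it uses the pooled estimate p-bar under H₀ and a lower-tail p-value, matching the one-sided alternative H_a: p₁ < p₂.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """One-sided (H_a: p1 < p2) z test for a difference in proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0: p1 = p2, pool both samples to estimate the common proportion.
    p_bar = (x1 + x2) / (n1 + n2)
    se = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = (p1_hat - p2_hat) / se
    # Lower-tail area, because small (negative) differences favor H_a.
    p_value = NormalDist().cdf(z)
    return z, p_value

z, p = two_prop_z_test(17, 435, 38, 435)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

For a two-sided alternative (p₁ ≠ p₂) the tail area would instead be doubled, 2 * NormalDist().cdf(-abs(z)).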
Last modified: March 27, 2001

Bret Larget, larget@mathcs.duq.edu
