In one particular data set, we observe 99 offspring with the dominant trait and 45 with the recessive trait. Does this data fit the simple genetic model?
We can examine this question with a chi-square test. The chi-square test statistic is a measure of the discrepancy between what we observe and what we expect to see. If the discrepancy is larger than we would expect from chance alone, there is evidence that something else (such as more complicated genetics) is important.
The chi-square test statistic is as follows.
$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed count in the $i$th category and $E_i$ is the expected count in the $i$th category.
Notice that the chi-square test statistic will be large if some observed counts differ greatly from the expected counts.
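To make the formula concrete, here is a minimal Python sketch of the statistic. The function name `chi_square_statistic` is a hypothetical helper chosen for illustration, and the two small inputs are made-up counts rather than data from this example. Each term $(O_i - E_i)^2 / E_i$ is non-negative, so a single category with a large discrepancy is enough to make $X^2$ large.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over categories of (O_i - E_i)^2 / E_i.

    `observed` and `expected` are sequences of counts in the same category order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts equal to expected counts give a statistic of zero.
print(chi_square_statistic([10, 10], [10, 10]))  # 0.0
# A large discrepancy in each category makes the statistic large: 100/10 + 100/10.
print(chi_square_statistic([20, 0], [10, 10]))   # 20.0
```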
The chi-square test statistic will follow (approximately) a chi-square distribution if the null hypothesis (expected proportions in each category are correct) is true. The chi-square distribution with k degrees of freedom is what you get if you take k independent standard normal random variables, square them, and add them up. For this type of problem, the correct number of degrees of freedom is:
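The definition of the chi-square distribution can be checked by simulation. The sketch below, which assumes only Python's standard `random` module and an arbitrary fixed seed for reproducibility, draws many values of "$k$ squared independent standard normals, added up". Since each squared standard normal has mean 1, the simulated values should average close to $k$, which is the mean of a chi-square distribution with $k$ degrees of freedom. This is an illustration of the definition, not part of how the test itself is carried out.

```python
import random


def simulate_chi_square(k, n_draws, seed=0):
    """Return n_draws values, each the sum of k squared independent standard normals."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]


# One degree of freedom: the square of a single standard normal.
draws = simulate_chi_square(k=1, n_draws=100_000)
# The sample mean should be close to k = 1.
print(sum(draws) / len(draws))
```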
$$\text{df} = (\#\text{ of categories}) - 1 - (\#\text{ of estimated parameters})$$
For our problem, we have two categories and no estimated parameters (we are given expected proportions of 0.75 and 0.25), so there is only one degree of freedom. With a total of 144 offspring, the expected counts are $0.75 \times 144 = 108$ and $0.25 \times 144 = 36$. Notice that the expected counts and the observed counts sum to the same total, 144 in this case. The test statistic is
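The bookkeeping in this step (total count, expected counts, degrees of freedom) can be sketched in a few lines of Python. The helper names `expected_counts` and `degrees_of_freedom` are hypothetical and exist only for this illustration; the counts and proportions are the ones given in the text.

```python
def expected_counts(total, proportions):
    """Expected count in each category: total sample size times hypothesized proportion."""
    return [total * p for p in proportions]


def degrees_of_freedom(n_categories, n_estimated_params=0):
    """df = (# of categories) - 1 - (# of estimated parameters)."""
    return n_categories - 1 - n_estimated_params


observed = [99, 45]                              # dominant, recessive
total = sum(observed)                            # 144 offspring
expected = expected_counts(total, [0.75, 0.25])  # proportions are given, not estimated
print(expected)                                  # [108.0, 36.0]
print(sum(expected) == total)                    # True: same total as the observed counts
print(degrees_of_freedom(len(observed)))         # 1
```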
$$X^2 = \frac{(99-108)^2}{108} + \frac{(45-36)^2}{36} = 0.75 + 2.25 = 3.00$$
The p-value is the area to the right of 3.00 under a chi-square distribution with 1 degree of freedom. From a chi-square table, we see that this area is between 0.05 and 0.10. This p-value is marginal: there is at best weak evidence of more complicated genetics, and chance alone might explain the difference between what was observed and what was expected.
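Instead of bracketing the p-value with a table, it can be computed directly. With 1 degree of freedom, $X^2 = Z^2$ for a standard normal $Z$, so $P(X^2 > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$, which Python's standard `math.erfc` evaluates. The sketch below relies on that 1-df identity only (the helper name `chi_square_upper_tail_1df` is hypothetical) and reproduces the whole calculation from the observed counts.

```python
import math


def chi_square_upper_tail_1df(x):
    """Area to the right of x under a chi-square distribution with 1 degree of freedom.

    With 1 df the statistic behaves like Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


observed = [99, 45]
expected = [0.75 * 144, 0.25 * 144]  # [108.0, 36.0]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi_square_upper_tail_1df(x2)
print(x2)                  # 3.0
print(round(p_value, 3))   # about 0.083, between 0.05 and 0.10
```

For other numbers of degrees of freedom this shortcut does not apply; if SciPy is available, `scipy.stats.chi2.sf(x2, df)` gives the upper-tail area for any df.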
Bret Larget, larget@mathcs.duq.edu