Math 225
Introduction to Biostatistics
The Binomial Distribution
Prerequisites
This lab assumes that you already know how to:
- Log in, find the course Web page, and run S-PLUS
- Use the Commands Window to execute commands
Technical Objectives
This lab will teach you to:
- Use S-PLUS to make binomial distribution calculations.
- Load in and run a program.
- Use S-PLUS to visualize the binomial distribution.
Conceptual Objectives
In this lab you should learn to:
- Understand when the binomial distribution is an appropriate model.
- Begin to understand how probability underlies statistical inference.
The Binomial Distribution
The binomial distribution is the discrete probability distribution
which counts the number of ``successes'' in a fixed number
of independent ``trials''.
You may think of a trial as a random experiment that has two possible outcomes.
The classic example is counting the number of heads in a fixed number of coin tosses.
Many situations in the health sciences may be modeled with this distribution.
One example is random sampling from a very large population
where a variable of interest is categorical.
Example variables include smoking status, gender, survival status,
whether an individual is hypertensive, and so on.
Genetics is another area where the binomial distribution is often applicable.
The Binomial Setting
The binomial distribution is appropriate when these four conditions hold:
- There is a fixed number of trials.
That is, the number of trials is determined before the trials occur.
- Each trial has two possible outcomes.
The outcome which is being counted is often called a ``success''
and the other outcome is called a ``failure''.
- Each trial has the same probability of success.
- The trials are independent,
meaning that the outcome of one trial does not affect the outcome of any other trial.
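The four conditions can be made concrete with a small simulation. The sketch below is written in Python rather than S-PLUS and uses only the standard library; the function name `simulate_binomial`, the seed, and the values n = 6 and p = 0.4 are illustrative assumptions, not part of the lab.

```python
import random

def simulate_binomial(n, p, rng):
    """Count successes in n independent trials, each with success probability p."""
    successes = 0
    for _ in range(n):           # fixed number of trials, set in advance
        if rng.random() < p:     # two outcomes; the same p on every trial
            successes += 1       # each draw is made independently of the others
    return successes

rng = random.Random(225)         # arbitrary seed so the run is repeatable
counts = [simulate_binomial(6, 0.4, rng) for _ in range(10000)]
average = sum(counts) / len(counts)
print(average)                   # should land close to 6 * 0.4 = 2.4
```

If any condition fails, for example if p changed from trial to trial, the counts would no longer follow a binomial distribution.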
The binomial distribution is completely described by two parameters:
n, which is the number of trials,
and p, which is the success probability on any individual trial.
The probability that there are exactly x successes
in n trials with success probability p is
Prob(exactly x successes) = [n!/(x!(n-x)!)] p^x (1-p)^(n-x)
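As a worked instance of the formula, take n = 6 trials, success probability p = 0.4, and x = 2 successes. This is the same case that is computed with dbinom(2,6,0.4) later in the lab.

```latex
\begin{aligned}
\Pr(\text{exactly 2 successes})
  &= \frac{6!}{2!\,4!}\,(0.4)^{2}\,(0.6)^{4} \\
  &= 15 \times 0.16 \times 0.1296 \\
  &= 0.31104
\end{aligned}
```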
Just as you can compute the mean and standard deviation of data
to measure ``center'' and ``spread'',
you can do the same for the binomial distribution.
mean = n*p
sd = sqrt(n*p*(1-p))
By the way, sqrt is computer shorthand for "square root".
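The two formulas can be checked with a few lines of arithmetic. This is a minimal sketch in Python (not S-PLUS), assuming the illustrative values n = 6 and p = 0.4.

```python
import math

n, p = 6, 0.4                      # illustrative values, not prescribed by the lab
mean = n * p                       # center of the distribution
sd = math.sqrt(n * p * (1 - p))    # spread of the distribution
print(mean, sd)                    # approximately 2.4 and 1.2
```

So with six trials and p = 0.4, the count of successes is centered near 2.4 and typically varies by about 1.2 around that value.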
In S-PLUS, the two most important functions for calculating binomial probabilities are
dbinom, which calculates the probability that exactly x successes occur
in n trials with success probability p,
and pbinom, which calculates the probability that x or fewer successes occur
in n trials with success probability p.
The ``d'' in dbinom refers to ``density''
and the ``p'' in pbinom refers to ``probability''.
This nomenclature is better suited to continuous random variables,
but S-PLUS uses the same names for discrete distributions.
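To see what these two quantities are, it can help to compute them directly from the binomial formula. The sketch below is in Python (3.8 or later, for `math.comb`), not S-PLUS. The names `binom_pmf` and `binom_cdf` are hypothetical stand-ins that mirror the descriptions of dbinom and pbinom above; they are not the S-PLUS implementations.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(exactly x successes), the quantity dbinom(x, n, p) is described as computing."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(x or fewer successes), the quantity pbinom(x, n, p) is described as computing."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

exact_two = binom_pmf(2, 6, 0.4)          # about 0.31104
two_or_fewer = binom_cdf(2, 6, 0.4)       # about 0.54432
two_or_more = 1 - binom_cdf(1, 6, 0.4)    # complement rule, about 0.76672
print(exact_two, two_or_fewer, two_or_more)
```

The last line of arithmetic uses the fact that all the binomial probabilities sum to one, so an upper-tail probability is one minus a cumulative probability.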
You will also load in a local function, gbinom,
to graph binomial distributions
for different parameter values.
S-PLUS help is available in this on-line guide.
Note that you can use the mouse to highlight a command from Netscape,
switch over to S-PLUS, and paste the command into the Commands Window.
This can save on typing.
Also, you may use the arrow keys to retrieve and edit previous commands.
- Open a Commands Window.
- Calculate a single binomial probability using dbinom.
To find the probability that there are exactly 2 successes in 6 trials
when the success probability is 0.4, type
> dbinom(2,6,0.4)
- You could also calculate all such binomial probabilities in one command.
> dbinom(0:6,6,0.4)
S-PLUS interprets 0:6 as the array of integers from 0 to 6.
- Find the probability of two or fewer successes using pbinom.
> pbinom(2,6,0.4)
- Find the probability of two or more successes.
Note that the sum of all binomial probabilities is one,
so that the desired probability is one minus the probability of one or fewer successes.
> 1 - pbinom(1,6,0.4)
Alternatively, we could sum up the individual binomial probabilities.
> sum(dbinom(2:6,6,0.4))
- Load in the function gbinom by following these steps.
  - Click on the gbinom link above.
  - Save the file onto the Desktop.
  - Switch over to S-PLUS.
  - Under the File menu, select Open.
  - Open the file gbinom.ssc.
    You may need to change the box ``Look in'' to Desktop
    and the box ``File type'' to either all files or *.ssc files.
    This opens up a Script Window.
  - Under the Script menu, choose Run.
    This will load the function gbinom into S-PLUS.
  - Close the Script Window by clicking the x-button in the upper right corner.
- Graph the binomial distribution with n=6 and p=0.4.
> gbinom(6,0.4)
- Graph the binomial distribution with n=100 and p=0.4 over the entire range.
> gbinom(100,0.4)
You may also scale the graph
so that the x-axis contains only high probability values
> gbinom(100,0.4,scale=T)
or graph a specified range.
> gbinom(100,0.4,low=20,high=30)
- Graph the binomial distribution with n=30 for p ranging from 0.1 to 0.9 (by 0.1).
> for(p in seq(0.1,0.9,0.1)){gbinom(30,p)}
Click on the Page tabs to see each of the nine graphs.
- Graph the binomial distribution with p=0.5 for n ranging from 10 to 100 (by 10).
> for(n in seq(10,100,10)){gbinom(n,0.5)}
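The source of gbinom is not shown in this lab, so the following is not gbinom. It is only a rough text stand-in, sketched in Python (3.8 or later), for what a graph of a binomial distribution displays: one bar per possible count, with height proportional to its probability. The function name `text_binom_plot` and its `width` parameter are hypothetical.

```python
from math import comb

def text_binom_plot(n, p, width=40):
    """Return one text line per outcome 0..n, with bar length proportional to probability."""
    probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    tallest = max(probs)
    lines = []
    for x, prob in enumerate(probs):
        bar = "#" * round(width * prob / tallest)   # the tallest bar gets the full width
        lines.append(f"{x:3d} {prob:.4f} {bar}")
    return lines

plot_lines = text_binom_plot(6, 0.4)
print("\n".join(plot_lines))
```

For n=6 and p=0.4 the longest bar falls at 2 successes, which is where the graphed distribution peaks.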
Homework Assignment
Load the function gbinom into S-PLUS
(if it has not already been done)
and answer the questions below.
You should write your answers on this form
and turn it in to your lab instructor by the due date.
Further S-PLUS help is available in this on-line guide.
- A couple who are both carriers of a genetic disease
have a 0.25 probability of passing the disease on to any offspring.
If they have five children,
the number of children with the disease is random.
Use S-PLUS to find the binomial probability of each possible outcome.
Verify by hand calculation using the binomial probability formula
the probability that exactly two children inherit the disease.
- Ten percent of African-Americans are carriers for the genetic disease
sickle-cell anemia.
In a random sample of seventy-five African-Americans,
what is the probability that four or fewer
are carriers for the disease?
- Many athletes wear the Breathe-Right nasal strip
in the hope that it will improve their athletic performance
by allowing them to breathe more easily.
A scientist tests the claim that these strips improve the body's
ability to process oxygen
by conducting an experiment
which measures the oxygen processing of an athlete
while the athlete rides an exercycle.
Each athlete is measured both with and without the nasal strip
on separate days.
If there is no effect,
one would expect that the better performance
would be equally likely to occur
with the strip or without.
In an experiment with twenty athletes,
thirteen have a better performance
while wearing the nasal strip.
What is the probability
that thirteen or more athletes
would exhibit an improvement with the nasal strip,
assuming that there is no benefit?
- The mean of the binomial distribution is n*p
and the standard deviation is sqrt(n*p*(1-p)).
For a distribution with n=500 and p=0.5,
what are the mean and standard deviation?
What is the probability that a binomial random variable
with these parameters is within one standard deviation of the mean?
- Plot the binomial distribution with n = 8
successively for p ranging from 0.1 to 0.9 by 0.1.
> for(p in seq(0.1,0.9,0.1)){gbinom(8,p)}
Examine the skewness in each graph.
When p is less than __________,
the distribution is skewed to the __________.
When p is greater than __________,
the distribution is skewed to the __________.
When p equals __________,
the distribution is perfectly symmetric.
- Plot the binomial distribution with p = 0.12
successively for n ranging from 5 to 95 by 10
(with scale=T).
> for(n in seq(5,95,10)){gbinom(n,0.12,scale=T)}
Examine the shape in each graph.
As the sample size increases, the skewness (increases/decreases).
Say that a probability is nonnegligible
if it is visible in a plot of the distribution.
As the sample size increases,
the absolute range of values for which the probability
is nonnegligible (increases/decreases).
As the sample size increases,
the proportion of possible values
for which the probability is nonnegligible (increases/decreases).
As the sample size increases, the general shape resembles a __________ curve.
Last modified: February 1, 2001
Bret Larget,
larget@mathcs.duq.edu