- Entering data directly into S-PLUS.
The following data is from Exercise 2-4 on page 35
of the textbook and represents the time until death in hours
for thirteen sheep that were fed a toxic weed as part of an experiment.
44 27 24 24 36 36 44 44 120 29 36 36 36
Follow these steps to create a variable named deathTime with this data.
- Open a Commands Window by clicking on the button with the ">x" symbol
if you do not already have one open.
- In the Commands Window, create a variable called deathTime
using the
scan
function.
You can put spaces or single carriage returns between numbers.
A carriage return on a blank line ends the input.
See the example below. (The > prompt and the position counters such as 1: are printed by S-PLUS; you type everything else.)
> deathTime <- scan()
1: 44 27 24 24 36
6: 36 44 44 120 29
11: 36 36 36
>
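When the values are already known, an alternative to scan is the c function, which combines its arguments into a single vector. The short sketch below is an addition to the exercise, not part of the textbook; it uses the same thirteen values and is written so that it should run in both S-PLUS and R.

```r
# Combine the thirteen death times (hours) into one vector with c().
deathTime <- c(44, 27, 24, 24, 36, 36, 44, 44, 120, 29, 36, 36, 36)

# Confirm that all thirteen values were entered.
length(deathTime)
```

Unlike scan, c needs a comma between every pair of values, so a missing comma produces a syntax error instead of silently ending the input.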
- Calculating means and medians in the Commands Window.
Use the mean and median functions to calculate the mean and median of deathTime.
> mean(deathTime)
[1] 41.23077
> median(deathTime)
[1] 36
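To see what these two functions compute, the results can be checked by hand. The sketch below is an illustration added to the exercise (the variable n is introduced here only for convenience): the mean is the sum divided by the number of observations, and with an odd number of observations the median is the middle value of the sorted data.

```r
deathTime <- c(44, 27, 24, 24, 36, 36, 44, 44, 120, 29, 36, 36, 36)
n <- length(deathTime)

# Mean: the total divided by the number of observations.
sum(deathTime) / n

# Median: n is 13 (odd), so the middle value is the 7th of the sorted data.
sort(deathTime)[(n + 1) / 2]
```

Both results should agree with the values that mean and median reported above. When n is even there is no single middle value, and the median is the average of the two middle values.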
- Creating a file and reading the file into S-PLUS.
Exercise 2-5 on page 35 contains ten columns of ten cholinesterase indices.
Assume that the first column contains measurements from men and that the second column
contains measurements from women.
Ignore the final eight columns of data.
Follow these steps to enter this data as two variables in a file
and read the file into S-PLUS.
- Click on the Start button and select Programs:Accessories:Notepad.
- Enter the data into the file including a header row with the variable names.
index sex
2.29 male
2.67 male
3.09 male
.
.
.
1.82 male
1.95 female
1.75 female
.
.
.
1.06 female
- Save this file to the Desktop with the name chol
by selecting Save As... from the File menu.
- Import the data into S-PLUS by selecting Import Data from the File menu.
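A file in this layout can also be read from the Commands Window with the read.table function. The sketch below is an addition to the steps above and rests on some assumptions: it writes only the header and the seven rows shown in the listing to a temporary file (the rows elided with dots are left out), and the name cholFile is introduced here for illustration. It is written so that it should run in both S-PLUS and R.

```r
# Write the header and the seven rows shown above to a temporary file.
# (The rows elided with dots in the listing are not included here.)
cholFile <- tempfile()
cat("index sex",
    "2.29 male", "2.67 male", "3.09 male", "1.82 male",
    "1.95 female", "1.75 female", "1.06 female\n",
    file = cholFile, sep = "\n")

# header = T tells read.table that the first line holds the variable names.
chol <- read.table(cholFile, header = T)

# Check the variable names and the number of rows that were read.
names(chol)
nrow(chol)
```

To read your own file this way, replace cholFile with the full path to the file you saved; that path depends on your computer and is not shown here.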
- Calculating means and medians for variables in a data set.
To refer to variables in a data set by name, you need to attach the data set.
> attach(chol)
You can find the mean of all the index measurements.
> mean(index)
[1] 1.91
You can find the mean of the index measurements separately
for males and females.
> mean(index[sex=="male"])
[1] 2.239
> mean(index[sex=="female"])
[1] 1.581
The square brackets select the subset of the index variable
for which the logical statement inside is true.
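It can help to look at the logical statement by itself before using it inside the brackets. The sketch below is an added illustration that uses only the seven rows shown in the file listing (the elided rows are omitted), so its means will not match the full-data results above. The tapply call at the end is an alternative that computes the mean for every group in one step.

```r
# The seven rows shown in the file listing (elided rows omitted).
index <- c(2.29, 2.67, 3.09, 1.82, 1.95, 1.75, 1.06)
sex <- c("male", "male", "male", "male", "female", "female", "female")

# The comparison gives one TRUE or FALSE per observation.
sex == "male"

# Square brackets keep only the values where the comparison is TRUE.
index[sex == "male"]

# tapply applies mean to index separately within each level of sex.
tapply(index, sex, mean)
```

With many groups, tapply avoids writing one bracketed expression per group.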
- Reading in data from the Web page.
Find the HARVEST data set on the course Web page and save it to the Desktop.
Import the data into S-PLUS.
- Using S-PLUS to draw a histogram.
A histogram is a bar graph
for displaying the distribution
of a single quantitative variable.
Make a histogram in S-PLUS following these steps.
- Click on the ``2D Plots'' button,
which is on the ruler and has a small picture with a bar graph
and a jagged line.
This opens up the Plots2D palette.
- Click the histogram button, which has a picture of a little histogram.
A graphics window and a dialog box will open.
- Click the little arrow next to ``Data Set''
and then click on the name of the data frame where your variable is.
- Click on the little arrow next to ``x Column(s)''
and then click on the name of the variable.
- Finally, click on the OK button.
Often, the default number of bars does not display the distribution well.
You can follow these steps to make a better graph.
- Complete the first four steps above.
- Click on the ``Options'' tab.
- Change ``Number of bars'' from ``Auto'' to a number, such as 15.
- If the variable is integer valued, select ``Integer'' instead of ``Continuous''.
- Click on the OK button.
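A histogram can also be drawn from the Commands Window with the hist function. The sketch below is an addition to the menu steps above; it uses the sheep death times entered earlier so that it is self-contained. Treating the nclass argument as the counterpart of the "Number of bars" setting in the dialog box is an assumption made here, not something the dialog box documents.

```r
deathTime <- c(44, 27, 24, 24, 36, 36, 44, 44, 120, 29, 36, 36, 36)

# Histogram with the default number of bars.
hist(deathTime)

# Request about 15 bars; the function may adjust this to get round break points.
hist(deathTime, nclass = 15)
```

Comparing the two plots shows how much the apparent shape of a small data set depends on the number of bars.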
- Interpreting histograms.
The center of a histogram may be described in two ways.
The median is the location that divides the shaded area
of the histogram in half.
The mean is the location at which the histogram would
balance if the histogram were made from a uniform solid material.
If a histogram looks similar to its mirror image,
we say the histogram is symmetrical.
If the left half of the data is more spread out than the right half,
we say that the distribution is skewed to the left.
Similarly, if the right half of the data is more spread out than the left half,
we say that the distribution is skewed to the right.
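These definitions can be checked roughly with numbers as well as by eye. The sketch below is an added illustration using the sheep death times from earlier: it measures the spread of each half as the distance from the median to the smallest or largest value. The names m, leftSpread, and rightSpread are introduced here for illustration, and this range-based comparison is only a crude check because a single extreme value dominates it.

```r
deathTime <- c(44, 27, 24, 24, 36, 36, 44, 44, 120, 29, 36, 36, 36)
m <- median(deathTime)

# Spread of the lower half: distance from the smallest value up to the median.
leftSpread <- m - min(deathTime)

# Spread of the upper half: distance from the median up to the largest value.
rightSpread <- max(deathTime) - m

c(left = leftSpread, right = rightSpread)
```

Here the left spread is 12 hours and the right spread is 84 hours, so by the definition above the death times are skewed to the right.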
Make histograms of the variables SBPCB, DBPCB, and HRCB.
Which is most symmetrical?
Which is skewed to the right?
Which is skewed to the left?
- Calculating means and medians when there are missing values.
The HARVEST data set includes many missing values,
because not every individual was measured at every time point in the study,
and for some individuals, smoking or exercise information was not collected.
Missing data is represented by the code ``NA'' in S-PLUS.
If you ask S-PLUS to calculate the mean or median of a variable
that includes missing data, it gives ``NA'' as the result.
You can override this behavior with the option na.rm=T
which removes missing values before calculation.
> attach(harvest)
> mean(HRCB)
[1] NA
> mean(HRCB,na.rm=T)
[1] 74.97
> median(HRCB,na.rm=T)
[1] 74.33
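The effect of na.rm=T can be reproduced by hand with the is.na function, which marks each missing value. The sketch below is an added illustration that does not use the HARVEST data: it takes the thirteen sheep death times from earlier and appends one NA purely for demonstration. The name x is introduced here for illustration.

```r
# The thirteen death times plus one NA appended purely for illustration.
x <- c(44, 27, 24, 24, 36, 36, 44, 44, 120, 29, 36, 36, 36, NA)

# is.na marks the missing values; summing the TRUEs counts them.
sum(is.na(x))

# Without na.rm the result is NA.
mean(x)

# Dropping the missing value by hand gives the same answer as na.rm=T.
mean(x[!is.na(x)])
mean(x, na.rm = T)
```

Counting the missing values first is a useful habit, because a mean computed with na.rm=T may rest on far fewer observations than the length of the variable suggests.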