Attach the data set by typing the following in the Commands Window.
> attach(television)
> plot(tv,life)
Is the association positive or negative? Is this what you expected?
Does it look like a linear relationship is adequate, or is a nonlinear relationship better?
If a linear relationship is inadequate, try both a reciprocal and a log transformation to see which is better. The reciprocal would be televisions per person.
> plot(1/tv,life)
> plot(log(tv),life)
Do the same for the physician variable.
> plot(phys,life)
Is there a negative or positive association?
> plot(1/phys,life)
> plot(log(phys),life)
Which transformation makes the relationship with life expectancy most linear?
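As a rough numeric companion to the plots, the correlation of life expectancy with each version of a predictor can be compared. The following is a minimal sketch written for R, whose syntax closely follows S-PLUS; the helper name compare_transforms is hypothetical and not part of the lab, and the use of complete cases is an assumption about how missing values should be handled. A correlation nearer to -1 or 1 suggests a more nearly linear scatterplot, but it does not replace looking at the plots.

```r
# Hypothetical helper (not part of the original lab): correlation of y with
# x, 1/x, and log(x). Values nearer -1 or 1 suggest a more nearly linear plot.
compare_transforms <- function(x, y) {
  # Assumption: rows with missing values are simply dropped.
  c(raw        = cor(x, y, use = "complete.obs"),
    reciprocal = cor(1 / x, y, use = "complete.obs"),
    log        = cor(log(x), y, use = "complete.obs"))
}

# Usage, assuming the television data frame from the lab is available
# with columns tv, phys, and life.
if (exists("television")) {
  print(with(television, compare_transforms(tv, life)))
  print(with(television, compare_transforms(phys, life)))
}
```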
Fit a multiple regression model with the following formula.
life ~ log(tv) + log(phys)
This means "life expectancy in years is modeled as a linear function of log(tv) and log(phys)". An intercept is included by default.
Examine the residual plot. Do you see much of a pattern?
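For readers working from the command line rather than the menus, the same fit and residual plot can be sketched as follows. This is an R sketch under stated assumptions: the function names fit_life_model and plot_residuals are hypothetical, and the data frame is assumed to have columns named life, tv, and phys as in the commands above.

```r
# Fit the model from the lab. `d` is assumed to be a data frame with
# columns life, tv, and phys (as in the television data set).
fit_life_model <- function(d) {
  lm(life ~ log(tv) + log(phys), data = d)
}

# Plot residuals against fitted values. A patternless band around 0
# suggests the linear model is adequate.
plot_residuals <- function(fit) {
  plot(fitted(fit), resid(fit),
       xlab = "Fitted life expectancy", ylab = "Residual")
  abline(h = 0, lty = 2)  # dashed reference line at zero
}

# Usage, assuming the television data frame is available.
if (exists("television")) {
  fit <- fit_life_model(television)
  print(summary(fit))     # prints the coefficient table
  plot_residuals(fit)
}
```

In R, summary() labels the first column of the coefficient table "Estimate" where the S-PLUS report uses "Value".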
In the Report Window, there will be a table labeled "Coefficients" with the fitted parameter values.
Coefficients:
              Value Std. Error  t value Pr(>|t|)
(Intercept) 90.6222     4.3557  20.8056   0.0000
  log(phys) -2.2589     0.7474  -3.0221   0.0047
    log(tv) -2.9156     0.5907  -4.9358   0.0000
The column headed "Value" has the estimated intercept and slopes of the fitted regression model. These are statistics that can be used to describe the relationship between these variables.
The column headed "Std. Error" has the estimated standard errors of the estimated coefficients.
The column headed "t value" has the t statistic for testing the null hypothesis that the true parameter value is 0.
The column headed "Pr(>|t|)" is the two-sided p-value of the hypothesis test.
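The columns are linked by simple arithmetic: each t value is the estimate divided by its standard error. The R sketch below recomputes the t values from the numbers in the table; the results agree with the reported column up to rounding of the inputs. The helper two_sided_p is a hypothetical name, and its df argument (the residual degrees of freedom) is not given in this handout, so it is left as a parameter rather than guessed.

```r
# Values copied from the Coefficients table in the lab.
value   <- c("(Intercept)" = 90.6222, "log(phys)" = -2.2589, "log(tv)" = -2.9156)
std_err <- c("(Intercept)" =  4.3557, "log(phys)" =  0.7474, "log(tv)" =  0.5907)

# t statistic = estimate / standard error
t_value <- value / std_err
print(round(t_value, 3))

# Two-sided p-value for a t statistic. `df` is the residual degrees of
# freedom (number of observations minus number of coefficients), which
# must be supplied by the reader.
two_sided_p <- function(t, df) {
  2 * pt(-abs(t), df)
}
```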
Are both variables useful for predicting life expectancy?
Notice that television has a larger (absolute) t value and a smaller p-value.
Comment on the following conclusion.
Our model is |
Is this conclusion justified?
Bret Larget, larget@mathcs.duq.edu