mtcars
We load the dataset
data(mtcars)
head(mtcars)
Suppose we consider a linear regression of mpg
against
disp
. We first check the scatter plot.
plot(mtcars$disp, mtcars$mpg)
It shows a nonlinear trend. So we create two new columns by taking the log.
mtcars['logmpg'] = log(mtcars$mpg)
mtcars['logdisp'] = log(mtcars$disp)
plot(mtcars$logdisp, mtcars$logmpg)
Now the trend is linear. We run a linear regression model.
fit = lm(logmpg ~ logdisp, data=mtcars)
print(summary(fit))
##
## Call:
## lm(formula = logmpg ~ logdisp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.22758 -0.08874 -0.00791 0.07970 0.32143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.38097 0.20803 25.87 < 2e-16 ***
## logdisp -0.45857 0.03913 -11.72 1.01e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1282 on 30 degrees of freedom
## Multiple R-squared: 0.8207, Adjusted R-squared: 0.8148
## F-statistic: 137.3 on 1 and 30 DF, p-value: 1.006e-12
Both intercept and the slope are significant. \(R^2=0.8148\). The fitted model is \[logmpg = 5.38 - 0.459\times logdisp + \epsilon\quad\text{with}\ \epsilon\sim N(0, 0.1282^2).\] The residual plot is checked.
plot(fit$fitted.values, fit$residuals)
The residual plot is not too bad, though some unequal variances.
Then we check the QQ-plot.
qqnorm(fit$residuals)
qqline(fit$residuals)
The QQ-plot indicates the residuals are normal.
Or we can call plot
directly.
plot(fit)
Suppose there is a new car with disp
=300. The prediction
is
pred = predict(fit, newdata = data.frame(logdisp=log(300)))
print(pred)
## 1
## 2.765399
The predicted value for the logmpg
is 2.765.
The confidence interval for the mean response is
ci = predict(fit, newdata = data.frame(logdisp=log(300)), interval="confidence")
print(ci)
## fit lwr upr
## 1 2.765399 2.708287 2.822511
The CI for the mean response of logmpg
is (2.708,
2.823).
The prediction interval is
pi = predict(fit, newdata = data.frame(logdisp=log(300)), interval="prediction")
print(pi)
## fit lwr upr
## 1 2.765399 2.497496 3.033301
The PI for logmpg
is (2.497, 3.033).
We can print the CI band and PI band.
newx = seq(4, 6.5, 0.1)
ci.band = predict(fit, newdata = data.frame(logdisp=newx), interval="confidence")
pi.band = predict(fit, newdata = data.frame(logdisp=newx), interval="prediction")
plot(mtcars$logdisp, mtcars$logmpg)
lines(newx, ci.band[,'fit'], type='l')
lines(newx, ci.band[,'upr'], type='l', lty='dashed')
lines(newx, ci.band[,'lwr'], type='l', lty='dashed')
lines(newx, pi.band[,'upr'], type='l', lty='dotted')
lines(newx, pi.band[,'lwr'], type='l', lty='dotted')
legend(5.6, 3.5,lty=c("solid", "dashed", "dotted"), legend=c("fitted", "CI", "PI"))
We can also compare the pointwise CI v.s. simultaneous CI.
n = dim(mtcars)[1]
Sxx = var(mtcars$logdisp) * (n-1)
xbar = mean(mtcars$logdisp)
sci.lwr = ci.band[,'fit'] - sqrt(2*qf(0.95,2,fit$df.residual)) * summary(fit)$sigma * sqrt(1/n + (newx - xbar) ** 2 /Sxx)
sci.upr = ci.band[,'fit'] + sqrt(2*qf(0.95,2,fit$df.residual)) * summary(fit)$sigma * sqrt(1/n + (newx - xbar) ** 2 /Sxx)
plot(mtcars$logdisp, mtcars$logmpg)
lines(newx, ci.band[,'fit'], type='l')
lines(newx, ci.band[,'upr'], type='l', lty='dashed')
lines(newx, ci.band[,'lwr'], type='l', lty='dashed')
lines(newx, sci.upr, type='l', lty='dotted')
lines(newx, sci.lwr, type='l', lty='dotted')
legend(5.6, 3.5,lty=c("solid", "dashed", "dotted"), legend=c("fitted", "Pointwise CI", "Simutaneous CI"))