
Regression Diagnostics

An excellent review of regression diagnostics is provided in John Fox's aptly named Overview of Regression Diagnostics. Dr. Fox's car package provides advanced utilities for regression modeling.

# Assume that we are fitting a multiple linear regression
# on the mtcars data
library(car) # diagnostic functions used below
fit <- lm(mpg~disp+hp+wt+drat, data=mtcars)

This example is for exposition only. We will ignore the fact that this may not be a great way of modeling this particular set of data!


Outliers

# Assessing Outliers
outlierTest(fit) # Bonferroni p-value for most extreme obs
qqPlot(fit, main="QQ Plot") # qq plot for studentized resid
leveragePlots(fit) # leverage plots
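
To make the Bonferroni adjustment concrete, here is a minimal base-R sketch that computes an adjusted p-value for the most extreme studentized residual by hand. The variable names (rstud, worst, p_unadj, p_bonf) are illustrative only, and the degrees of freedom are my assumption about how the test is usually set up; treat outlierTest( ) as the authoritative result.

```r
# Refit the example model so this block runs on its own
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Externally studentized residuals (stats package, no extra libraries)
rstud <- rstudent(fit)

# Observation with the largest absolute studentized residual
worst <- which.max(abs(rstud))

# Two-sided t-test p-value; assumed df is the residual df minus 1
df_t <- df.residual(fit) - 1
p_unadj <- 2 * pt(abs(rstud[worst]), df = df_t, lower.tail = FALSE)

# Bonferroni adjustment: multiply by the number of observations, cap at 1
p_bonf <- min(1, p_unadj * length(rstud))

data.frame(obs = names(worst), rstudent = rstud[worst],
           p_unadjusted = p_unadj, p_bonferroni = p_bonf)
```

A small adjusted p-value suggests that the flagged observation is unusual even after allowing for the fact that the most extreme of n residuals was picked.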


Influential Observations

# Influential Observations
# added variable plots
avPlots(fit)
# Cook's D plot
# identify D values > 4/(n-k-1), where k is the number of predictors
cutoff <- 4/(nrow(mtcars)-length(fit$coefficients))
plot(fit, which=4, cook.levels=cutoff)
# Influence Plot (car 3.0 and later take id=list(...) in place of id.method)
influencePlot(fit, id=list(method="identify"), main="Influence Plot", sub="Circle size is proportional to Cook's Distance")
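
The plots show Cook's D graphically. As a complement, this short base-R sketch lists the observations whose Cook's distance exceeds the 4/(n-k-1) rule of thumb. The names n, k, d, and flagged are illustrative, and the cutoff is a convention rather than a formal test.

```r
# Refit the example model so this block runs on its own
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

n <- nrow(mtcars)               # number of observations
k <- length(coef(fit)) - 1      # number of predictors, excluding the intercept
cutoff <- 4 / (n - k - 1)

# Cook's distance for every observation (stats package)
d <- cooks.distance(fit)

# Observations above the cutoff, largest first
flagged <- sort(d[d > cutoff], decreasing = TRUE)
flagged
```

Flagged observations are candidates for a closer look, not automatic deletions.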



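Normality of Residuals

# Normality of Residuals
# qq plot for studentized resid
qqPlot(fit, main="QQ Plot")
# distribution of studentized residuals
library(MASS) # provides studres()
sresid <- studres(fit)
hist(sresid, freq=FALSE,
   main="Distribution of Studentized Residuals")
# overlay a standard normal density for comparison
xfit <- seq(min(sresid), max(sresid), length=40)
yfit <- dnorm(xfit)
lines(xfit, yfit)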


Non-constant Error Variance

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(fit)
# plot studentized residuals vs. fitted values
spreadLevelPlot(fit)
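
To show what a score test for non-constant variance looks like under the hood, here is a hedged base-R sketch of a Breusch-Pagan-type statistic that checks whether the error variance changes with the fitted values. This is my own illustration, not code from the car package, and the names (u, aux, chisq) are hypothetical; its result may differ in detail from what a packaged test reports.

```r
# Refit the example model so this block runs on its own
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

e <- residuals(fit)
n <- length(e)

# Squared residuals scaled by the maximum-likelihood variance estimate
u <- e^2 / (sum(e^2) / n)

# Auxiliary regression of the scaled squared residuals on the fitted values
aux <- lm(u ~ fitted(fit))

# Score statistic: half the regression sum of squares of the auxiliary fit
reg_ss <- sum((fitted(aux) - mean(u))^2)
chisq <- reg_ss / 2

# Compare against a chi-square distribution with 1 degree of freedom
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(chisq = chisq, p_value = p_value)
```

A small p-value would point toward variance that changes with the level of the response.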



Multi-collinearity

# Evaluate Collinearity
vif(fit) # variance inflation factors
sqrt(vif(fit)) > 2 # problem?
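
For readers who want to see where a variance inflation factor comes from, the sketch below computes it by hand in base R: each predictor is regressed on the remaining predictors, and the VIF is 1/(1 - R-squared) from that regression. The names X and vif_manual are illustrative; for a model with only numeric predictors, the values should agree with vif( ) up to rounding.

```r
# Refit the example model so this block runs on its own
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Predictor columns of the design matrix, without the intercept
X <- model.matrix(fit)[, -1, drop = FALSE]

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif_manual <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

vif_manual
sqrt(vif_manual) > 2   # same rule of thumb as above
```

A large VIF means that the predictor is well explained by the others, which inflates the standard error of its coefficient.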


Nonlinearity

# Evaluate Nonlinearity
# component + residual plot
crPlots(fit)
# Ceres plots
ceresPlots(fit)


Non-independence of Errors

# Test for Autocorrelated Errors
durbinWatsonTest(fit)
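
As an illustration of one common check for first-order autocorrelation, this base-R sketch computes the Durbin-Watson statistic directly from the residuals. The choice of statistic and the names e and dw are my assumptions for the example. Note also that the rows of mtcars have no natural time order, so the number is only meaningful here as a demonstration of the arithmetic.

```r
# Refit the example model so this block runs on its own
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Residuals in the row order of the data
e <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the residual sum of squares
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

The statistic lies between 0 and 4. Values near 2 suggest little first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.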

Additional Diagnostic Help

The gvlma( ) function in the gvlma package performs a global validation of linear model assumptions, as well as separate evaluations of skewness, kurtosis, and heteroscedasticity.

# Global test of model assumptions
library(gvlma)
gvmodel <- gvlma(fit)
summary(gvmodel)

Going Further

If you would like to delve deeper into regression diagnostics, two books written by John Fox can help: "Applied Regression Analysis and Generalized Linear Models" (2nd ed.) and "An R and S-Plus Companion to Applied Regression".

To Practice

Work with regressions in R with this interactive course and correlation and regressions with R.