Last active
March 29, 2018 15:33
Assessing Multi-collinearity
library(car)
library(Hmisc)
library(perturb)
## Simulate predictors with increasing amounts of independent noise, so
## that neighbors in the chain x1..x6 are correlated to varying degrees.
n <- 200
x0 <- rnorm(n)
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)/2
x3 <- x2 + rnorm(n)/1.25
x4 <- x3 + rnorm(n)
x5 <- x4 + rnorm(n)/.8
x6 <- x5 + rnorm(n)/.25
x7 <- x1 + rnorm(n, mean=0, sd=.2)
m <- cbind(x0,x1,x2,x3,x4,x5,x6,x7)
Hmisc::rcorr(m) # pairwise correlations and p-values
## Note that x0 is a throw-away variable in this example. As the DV, it
## does not enter into the calculation of the VIFs or the condition
## indices (DVs generally do not).
m.lm <- lm(x0~., data=data.frame(m))
car::vif(m.lm) # variance inflation factors
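## A minimal by-hand check of the VIFs above (a sketch, not how car::vif
## is implemented): regress each predictor on all the others and compute
## 1/(1 - R^2). The helper name vif_by_hand is hypothetical. For a model
## with an intercept and only numeric predictors, the values should agree
## with car::vif(m.lm).
vif_by_hand <- function(X) {
    X <- as.data.frame(X)
    sapply(names(X), function(v) {
        ## regress predictor v on every other predictor
        f <- reformulate(setdiff(names(X), v), response = v)
        1 / (1 - summary(lm(f, data = X))$r.squared)
    })
}
vif_by_hand(m[, -1]) # drop x0, the DV, before computing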
## The VIF is probably the most common means of testing for the presence
## of multicollinearity (MCL). But a key problem with VIFs is that they do
## not afford easy diagnosis of the sources of MCL for a given variable. An
## alternative is to use the condition index (kappa) together with variance
## decomposition proportions for that purpose (Belsley, 1991).
dgn <- perturb::colldiag(m.lm, add.intercept=FALSE) # condition indices
## The result of colldiag() is a list of 2 elements: the condition
## indices and a matrix of variance decomposition proportions.
dgn$condindx
dgn$pi
## Each column of pi shows the distribution of variance proportions for
## the named variable. Therefore, each column of pi will sum to 1.
colSums(dgn$pi)
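## For intuition, the quantities above can be sketched by hand from a
## singular value decomposition. This assumes the columns are scaled to
## equal length but not centered, with no intercept column, which is what
## colldiag is understood to do in the call above. The helper name
## cond_by_hand is hypothetical; compare its output against dgn rather
## than treating it as a drop-in replacement.
cond_by_hand <- function(X) {
    Xs <- scale(as.matrix(X), center = FALSE, scale = TRUE)
    s <- svd(Xs)
    ## phi[j, k] = v[k, j]^2 / d[j]^2 (rows: components, cols: variables)
    phi <- t(sweep(s$v, 2, s$d, "/")^2)
    pi <- prop.table(phi, 2) # normalize so each column sums to 1
    colnames(pi) <- colnames(Xs)
    list(condindx = s$d[1] / s$d, pi = pi)
}
byhand <- cond_by_hand(m[, -1]) # predictors only; x0 is the DV
byhand$condindx
colSums(byhand$pi)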
## Generally speaking, if MCL is a concern, you should
## standardize your variables, or at least center them.
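## A sketch of that advice using base R's scale(), which centers each
## column and, by default, also divides it by its standard deviation. The
## names m.ctr, m.std and cond_number are introduced here for illustration
## only. The condition number is taken as the ratio of the largest to the
## smallest singular value.
m.ctr <- scale(m[, -1], center = TRUE, scale = FALSE) # centered only
m.std <- scale(m[, -1], center = TRUE, scale = TRUE)  # centered and scaled
cond_number <- function(X) {
    d <- svd(X)$d
    max(d) / min(d)
}
cond_number(m.ctr)
cond_number(m.std)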
Should x0 be included in m? It doesn't affect the VIFs, but there will be one additional condition index if x0 is included (although the values of the rest of the condition indices won't differ).