@davebraze
Last active March 29, 2018 15:33
Assessing Multi-collinearity
library(car)
library(Hmisc)
library(perturb)
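set.seed(1234)  # for reproducibility (the seed value is arbitrary)
n <- 200
## x0 is independent of the others. x1 through x6 form a chain in which
## each variable adds fresh noise to the one before it, and x7 is a near
## copy of x1 (noise sd = .2).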
x0 <- rnorm(n)
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)/2
x3 <- x2 + rnorm(n)/1.25
x4 <- x3 + rnorm(n)
x5 <- x4 + rnorm(n)/.8
x6 <- x5 + rnorm(n)/.25
x7 <- x1 + rnorm(n, mean=0, sd=.2)
m <- cbind(x0,x1,x2,x3,x4,x5,x6,x7)
Hmisc::rcorr(m) # pairwise correlation matrix with p-values
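## Note that in this example x0 is a throw-away variable, used only as the
## dependent variable (DV). It does not enter into the calculation of the
## VIFs or the condition indices. DVs generally do not, because both
## diagnostics depend only on the predictors.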
m.lm <- lm(x0~., data=data.frame(m))
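## A minimal base-R sketch of why the DV drops out: two models with
## different responses but the same predictors share one model matrix, so
## any diagnostic computed from the predictors alone comes out the same.
## All names below are illustrative. Base R's kappa() is used only as a
## stand-in diagnostic; it works on the full model matrix (intercept
## included, columns unscaled), so its value is not directly comparable to
## the indices from colldiag().
set.seed(7)
n   <- 100
p1  <- rnorm(n)
p2  <- p1 + rnorm(n, sd = 0.5)
y_a <- rnorm(n)                 # a throw-away DV, like x0
y_b <- 2 * p1 - p2 + rnorm(n)   # a DV that really depends on p1 and p2
fit_a <- lm(y_a ~ p1 + p2)
fit_b <- lm(y_b ~ p1 + p2)
all.equal(model.matrix(fit_a), model.matrix(fit_b))   # TRUE
c(a = kappa(fit_a, exact = TRUE), b = kappa(fit_b, exact = TRUE))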
car::vif(m.lm) # variance inflation factors
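## A minimal base-R sketch of what a VIF is: regress predictor j on all the
## other predictors and take VIF_j = 1 / (1 - R^2_j). The helper
## vif_by_hand() is hypothetical, not part of car. For a plain lm() fit
## with only numeric, single-column terms it should agree with car::vif().
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    ## auxiliary regression of one predictor on all the others
    r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v),
                     data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(42)
n  <- 200
z1 <- rnorm(n)
z2 <- z1 + rnorm(n) / 2     # strongly related to z1
z3 <- rnorm(n)              # unrelated to both
Z  <- cbind(z1, z2, z3)
vif_by_hand(Z)
## The same quantities are the diagonal of the inverse correlation matrix.
diag(solve(cor(Z)))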
## The VIF is probably the most common way to test for the presence of
## multicollinearity (MCL). But a key problem with VIFs is that they do not
## make it easy to diagnose the sources of MCL for a given variable: a large
## VIF says that a predictor is well explained by the others, not which
## others. An alternative is to use condition indices (the largest of which
## is the condition number, kappa) together with variance decomposition
## proportions (Belsley, 1991).
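## A minimal base-R sketch of the diagnosis problem: below, q1/q2 and q3/q4
## form two separate near-dependencies, yet the VIFs only say that all four
## predictors are badly inflated, not which ones go together. VIFs are
## computed as the diagonal of the inverse correlation matrix, which should
## match car::vif() for a plain lm() fit. Names are illustrative.
set.seed(11)
n  <- 200
q1 <- rnorm(n)
q2 <- q1 + rnorm(n, sd = 0.1)   # near copy of q1
q3 <- rnorm(n)
q4 <- q3 + rnorm(n, sd = 0.1)   # near copy of q3
Q  <- cbind(q1, q2, q3, q4)
round(diag(solve(cor(Q))), 1)   # all four VIFs are large
## The correlation matrix happens to reveal the two pairs here, but with
## dependencies involving three or more variables it need not.
round(cor(Q), 2)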
dgn <- perturb::colldiag(m.lm, add.intercept=FALSE) # condition indices and variance decomposition proportions, no intercept column
## The result of colldiag() is a list of 2 elements: the condition indices
## (condindx) and a matrix of variance decomposition proportions (pi).
dgn$condindx
dgn$pi
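## A minimal base-R sketch of Belsley-style condition indices, assuming the
## columns are scaled to unit length but not centered (which we believe
## matches colldiag() as called above). Take the singular values d of the
## scaled predictor matrix; the condition indices are max(d) / d, and the
## largest of them is the condition number. cond_indices() is hypothetical.
cond_indices <- function(X) {
  X  <- as.matrix(X)
  Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # unit-length columns
  d  <- svd(Xs)$d                               # singular values, decreasing
  max(d) / d
}

set.seed(42)
n  <- 200
w1 <- rnorm(n)
w2 <- w1 + rnorm(n, sd = 0.05)   # nearly a copy of w1
w3 <- rnorm(n)
ci <- cond_indices(cbind(w1, w2, w3))
ci          # one large index signals one near-dependency
max(ci)     # the condition number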
## Each column of pi shows how the variance of the named variable's
## coefficient is distributed across the dimensions (rows). Therefore, each
## column of pi sums to 1.
apply(dgn$pi, 2, sum)
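## A minimal base-R sketch of the variance decomposition proportions, under
## the same assumptions as the condition indices (unit-length columns, no
## centering). With the SVD Xs = U D V', the variance of coefficient k is
## proportional to sum_j V[k, j]^2 / d_j^2, and pi[j, k] is the share that
## dimension j contributes. Rows are dimensions, ordered by increasing
## condition index; columns are variables. vdp() is a hypothetical helper.
vdp <- function(X) {
  X     <- as.matrix(X)
  Xs    <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  s     <- svd(Xs)
  phi   <- sweep(s$v^2, 2, s$d^2, "/")    # phi[k, j] = v_kj^2 / d_j^2
  props <- t(phi / rowSums(phi))          # rows: dimensions, cols: variables
  dimnames(props) <- list(paste0("dim", seq_along(s$d)), colnames(X))
  props
}

set.seed(42)
n  <- 200
a  <- rnorm(n)
b  <- a + rnorm(n, sd = 0.05)   # near-dependency between a and b
c0 <- rnorm(n)
P  <- vdp(cbind(a, b, c0))
round(P, 3)    # a and b load heavily on the last (highest-index) row
colSums(P)     # each column sums to 1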
## Generally speaking, if MCL is a concern, you should
## standardize your variables, or at least center them.
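## A minimal base-R sketch of the kind of collinearity that centering
## removes: a predictor whose mean is far from zero is highly correlated
## with its own square, but after centering the two are nearly
## uncorrelated. Centering leaves the correlations, and therefore the VIFs,
## among distinct predictors unchanged; its payoff is mainly for polynomial
## and interaction terms. Names are illustrative.
set.seed(3)
x  <- runif(200, min = 10, max = 20)
xc <- x - mean(x)
r_raw      <- cor(x,  x^2)
r_centered <- cor(xc, xc^2)
c(raw = r_raw, centered = r_centered)
## With two predictors, VIF = 1 / (1 - r^2) for each of them.
1 / (1 - c(raw = r_raw, centered = r_centered)^2)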
@monicaycli

Should x0 be included in m? It doesn't affect the VIFs, but there will be one additional condition index if x0 is included (although the values of the rest of the condition indices won't differ).
