set.seed(2)
x <- 1:100
y <- 20 + 3 * x
e <- rnorm(100, 0, 60)
y <- 20 + 3 * x + e
plot(x,y)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col="red")
xy.lm <- lm(x ~ y)
lines(predict(xy.lm), y, col="blue")
# so lm() depends on which variable is x and which is y
# lm minimizes the y distance (the error term is y - yhat)
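# (not part of the original gist) sanity check: lm()'s residuals are exactly
# those vertical distances, y minus the fitted values
all.equal(unname(residuals(yx.lm)), unname(y - predict(yx.lm)))  # TRUE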
# normalize means and cbind together
xyNorm <- cbind(x=x-mean(x), y=y-mean(y))
plot(xyNorm)
# covariance
xyCov <- cov(xyNorm)
eigenValues <- eigen(xyCov)$values
eigenVectors <- eigen(xyCov)$vectors
eigenValues
eigenVectors
plot(xyNorm, ylim=c(-200,200), xlim=c(-200,200))
# draw each eigenvector as a line through the origin;
# the slope is (y component) / (x component), and xyNorm[, "x"] is the centered x values
lines(xyNorm[, "x"], eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[, "x"])
lines(xyNorm[, "x"], eigenVectors[2,2]/eigenVectors[1,2] * xyNorm[, "x"])
# the largest eigenValue is the first one
# so that's our principal component.
# but the principal component is in normalized terms (mean=0)
# and we want it back in real terms like our starting data
# so let's denormalize it
plot(x,y)
lines(x, (eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[, "x"]) + mean(y))
# that looks right. line through the middle as expected
# what if we bring back our other two regressions?
lines(x, predict(yx.lm), col="red")
lines(predict(xy.lm), y, col="blue")
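Not in the original gist, but a quick cross-check I find handy: prcomp() centers the data for you and should return the same principal direction as the eigen decomposition above (possibly with the sign flipped), so the denormalized line comes out identical. This assumes the same x and y as above.
pca <- prcomp(cbind(x, y), center = TRUE, scale. = FALSE)
pca$rotation[, 1]  # compare with eigenVectors[, 1] (same direction, sign may differ)
slope <- pca$rotation["y", 1] / pca$rotation["x", 1]  # any sign flip cancels in the ratio
abline(a = mean(y) - slope * mean(x), b = slope, col = "green")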
It's in the first 13 lines. Plot the lines() one at a time to recreate.
I meant the code for generating:
OLS1.png
OLS2.png
pca.png
Many thanks in advance,
Ruben
Ahh. I just had a revelation about what you were asking. The yellow lines in all 3 of those were drawn by hand. I did the base plots (which I think you can see from the code) and then I drew 2 yellow lines on each one for illustration.
Does that answer your question? Can you pick out of the code where the base plots are?
plot(x,y)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col="red") # <- that's OLS1
plot(x,y)
xy.lm <- lm(x ~ y)
lines(predict(xy.lm), y, col="blue") # <- that's OLS2
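My guess for the third file (just my reading of the gist, not something stated in the original comment) is that pca.png comes from the last plot plus the denormalized principal-component line:
plot(x,y)
lines(x, eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[, "x"] + mean(y)) # <- that's pca (probably)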
Nice trick, ha ha. Very illustrative!
Now I'm curious. Would it be possible to do something like that with R?
Let's ask the hive mind:
http://stackoverflow.com/questions/3737165/drop-lines-from-actual-to-modeled-points-in-r
-JD
And Josh Ulrich provided an answer in under 20 minutes. Hive mind FTW!
-JD
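For reference, a minimal sketch of the segments()-based idea from that thread, assuming the same x, y, and yx.lm as in the gist above (the exact code in the accepted answer may differ):
plot(x, y)
abline(yx.lm, col = "red")
segments(x0 = x, y0 = y, x1 = x, y1 = fitted(yx.lm), col = "orange")  # drop lines from each point to its fitted value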
Thanks a lot!
I really enjoyed the blog post and thanks for sharing the code. I have been trying to figure out how to generate the orange (drop) lines on this plot (and the one below). I tried the segments() function as suggested in Josh Ulrich's posts... but I keep going in circles. Any help is appreciated.
Hi there,
this is a great post. I have always wondered about the intuition behind PCA...
I'm curious about two of your graphs posted in the blog illustrating the difference between errors in y ~ x and x ~ y. Would it be possible to have access to the R code used to generate them?
Many thanks in advance,
Ruben