### poly
How does poly work?
```{r}
a <- 1:10

# Let's start easy
p <- poly(a, 3, raw=TRUE)
p

# This is easy to reproduce
data.frame('1'=a, '2'=a^2, '3'=a^3, check.names=FALSE)

# So what about:
p <- poly(a, 3, raw=FALSE)  # raw=FALSE is the default

# Can I reproduce this?
# First let's define a couple of functions to make this easier

# Vector length, like Octave's norm() function
o.norm <- function(v) sqrt(sum(v * v))

# Center, then scale to unit length
# (the sd() step is redundant: dividing by the norm afterwards
# rescales to unit length regardless)
v.normalize <- function(v) {
  v <- v - mean(v)
  v <- v / sd(v)
  v / o.norm(v)
}

a1 <- v.normalize(a)
# If I got it right, the next line should print: [1] TRUE
all(round(p[, 1], 4) == round(a1, 4))

# What about the higher degrees?
a2 <- v.normalize(a^2)
all(round(p[, 2], 4) == round(a2, 4))

# That's not right. Let's see what they look like:
plot(p[, 2], pch=19)
points(a2, pch=19, col='blue')
lines(p[, 2])
lines(a2, col='blue')
# I don't know how to make sense of this
```
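Following up on that last question: if I'm reading the source of stats::poly correctly, the orthogonal columns come from a QR decomposition of the matrix of raw powers (including the constant column), not from normalizing each power on its own. Here is a minimal sketch of that idea; qr() fixes each column only up to sign, so I compare absolute values:

```{r}
# Sketch: reproduce poly(a, 3) from a QR decomposition of the raw powers
X <- outer(a, 0:3, "^")   # columns: 1, a, a^2, a^3
Q <- qr.Q(qr(X))[, -1]    # orthonormal basis; drop the constant column
# Column signs from qr() are arbitrary, so compare magnitudes
all(round(abs(Q), 4) == round(abs(poly(a, 3)), 4))
```

In other words, each column is made orthogonal to all the lower-degree columns (and to the constant), then scaled to unit length. That explains why v.normalize worked for degree 1 but not degree 2: centering already makes a orthogonal to the constant, but a^2 also needs its projection onto the degree-1 column removed, which v.normalize never does.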
I think what we're doing is creating a higher-order polynomial fit by adding features (columns) to the data set and then fitting a linear model to the expanded data set, so the raw=TRUE format makes sense to me.
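A quick check of that reading (with a made-up response y, purely for illustration): fitting against raw and orthogonal polynomials should give identical fitted values; only the coefficient parameterization differs.

```{r}
set.seed(1)
y <- a^3 - 2 * a + rnorm(10)   # made-up response, for illustration only
fit.raw  <- lm(y ~ poly(a, 3, raw=TRUE))
fit.orth <- lm(y ~ poly(a, 3))
all.equal(fitted(fit.raw), fitted(fit.orth))  # should be TRUE
```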
Reading through ?poly and other resources, the raw=FALSE option creates orthogonal polynomials. This is meant to reduce multicollinearity:
From Wikipedia (http://en.wikipedia.org/wiki/Multicollinearity):

> Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.
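That is easy to see here: the raw powers of a are nearly perfectly correlated with one another, while the orthogonal columns are uncorrelated by construction.

```{r}
round(cor(poly(a, 3, raw=TRUE)), 3)  # raw powers: correlations close to 1
round(cor(poly(a, 3)), 3)            # orthogonal columns: identity matrix
```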
But looking at the graph, it still feels like we're just making up data points. What do these transformed columns actually represent?