Last active
December 18, 2015 11:19
-
-
Save dylanjf/5774446 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
amazon_test$ACTION1 = 1 | |
#this is really just a placeholder variable so that the training examples wont be removed when I call match. | |
#also, make sure you put this in the same column as it is in the training set... | |
amazon_test = sparse.model.matrix(~. - 1, data = amazon_test) | |
#turn the test set into a sparse matrix. the training is in this format already. | |
amazon_train = amazon_train[, c(which(complete.cases(match(colnames(amazon_train), colnames(amazon_test))) == TRUE))] | |
#to break down what this does, match takes the second argument and returns the numerical position of where it | |
#exists in the first argument. If it doesn't find it, it returns NA. so I'm telling it to look through a | |
#character vector of column names in the test file, relate them to what columns they correspond to in | |
#training, and only keep those. The complete.cases wrapper removes all the NAs from the vector and gives | |
#me a nice numerical list. | |
amazon_test = amazon_test[, c(which(complete.cases(match(colnames(amazon_test), colnames(amazon_train))) == TRUE))] | |
#not only that, you need to run it again to take away the dummies that are in the test set but not in training | |
dim(amazon_train) ; dim(amazon_test) | |
#should have the same number of columns now |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment