library(caret)
library(tm)

# Training data.
data <- c('Cats like to chase mice.', 'Dogs like to eat big bones.')
corpus <- VCorpus(VectorSource(data))

# Create a document term matrix.
tdm <- DocumentTermMatrix(corpus, control = list(removePunctuation = TRUE, stopwords = TRUE, stemming = TRUE, removeNumbers = TRUE))

# Convert to a data.frame for training and assign a classification (factor) to each document.
train <- as.matrix(tdm)
train <- cbind(train, c(0, 1))
colnames(train)[ncol(train)] <- 'y'
train <- as.data.frame(train)
train$y <- as.factor(train$y)

# Train.
fit <- train(y ~ ., data = train, method = 'bayesglm')

# Check accuracy on training.
predict(fit, newdata = train)

# Test data.
data2 <- c('Bats eat bugs.')
corpus <- VCorpus(VectorSource(data2))
tdm <- DocumentTermMatrix(corpus, control = list(dictionary = Terms(tdm), removePunctuation = TRUE, stopwords = TRUE, stemming = TRUE, removeNumbers = TRUE))
test <- as.matrix(tdm)

# Check accuracy on test.
predict(fit, newdata = test)

> data
[1] "Cats like to chase mice."    "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1

Nice example - just enough to cover the concepts. I did find I had to install additional packages not listed, i.e.
library('SnowballC')
library('minqa')
library('e1071')
library('caret')
library('tm')
Thanks
The 1 indicates the y value. In this case, it represents "eating". A 0 would represent "not eating".
The example above has a training set of 2 records, with the y-value indicating whether the sentence is about eating or not. So, when we run the model on the test sentence, we get a 1.
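
Where the labels come from can be shown without any modeling packages. The sketch below rebuilds the training matrix by hand in base R (the term counts are copied from the printed train output; the name `labels` is introduced here for illustration only) and attaches one class label per document, which is the role `cbind(train, c(0, 1))` plays in the gist. The 0/1 values carry no built-in meaning; "eating" is just the interpretation given to them.

```r
# Term counts copied from the printed training matrix (one row per sentence).
terms <- c("big", "bone", "cat", "chase", "dog", "eat", "like", "mice")
train <- rbind(
  c(0, 0, 1, 1, 0, 0, 1, 1),  # "Cats like to chase mice."
  c(1, 1, 0, 0, 1, 1, 1, 0)   # "Dogs like to eat big bones."
)
colnames(train) <- terms

# One label per document, in document order (hypothetical name `labels`).
# The label is tied to the whole sentence, not to any single word.
labels <- c(0, 1)

# Attach the labels as column y and make it a factor for classification.
train <- as.data.frame(cbind(train, y = labels))
train$y <- as.factor(train$y)

print(train)
print(levels(train$y))
```

To label a different concept, change the values in `labels`; to train on more sentences, add one label per added sentence.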
My question: where in the code do you specify that 'eat' is the word that predicts 0 or 1? If I want to add another word, such as "like", where should I make the change? Please explain.
How did you decide that y (the dependent variable) stands for the eating and not-eating classes?
Why can't I treat y as sleeping or not-sleeping classes?
Does it depend on the terms used in the documents?
What if our test data has different keywords in the text? Can we still classify it?
Suppose the test data is "dogs are mostly of brown colour".
It shows this error:
dims of 'test' and 'train' differ
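
One way to see why the dimensions must match: the model was trained on a fixed set of term columns, so a test document has to be expressed in exactly those columns, with unseen words dropped and absent words counted as 0. The gist handles this through the `dictionary = Terms(tdm)` control option. The base R sketch below mimics that idea with a hypothetical helper, `align_to_vocab`, and a hand-tokenized sentence, so it illustrates the alignment step rather than tm's actual processing.

```r
# Vocabulary taken from the column names of the training matrix.
vocab <- c("big", "bone", "cat", "chase", "dog", "eat", "like", "mice")

# Hypothetical helper: count only the tokens that appear in `vocab`.
# Tokens outside the vocabulary become NA in the factor and are not counted.
align_to_vocab <- function(tokens, vocab) {
  counts <- table(factor(tokens, levels = vocab))
  matrix(as.integer(counts), nrow = 1, dimnames = list("1", vocab))
}

# Tokens for "dogs are mostly of brown colour", lower-cased and stemmed by
# hand (an assumption; tm would also drop stop words).
tokens <- c("dog", "are", "most", "of", "brown", "colour")

test <- align_to_vocab(tokens, vocab)
print(test)
```

The result has one row and the same eight columns as the training terms, so only "dog" contributes to the prediction; every other word in the sentence is unknown to the model.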
Hello, thank you very much for sharing this, but I believe
predict(fit, newdata = train) should be run on the test set rather than the training set, as this link suggests: https://cran.r-project.org/web/packages/caret/vignettes/caret.html
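
The distinction matters because predictions on the training rows only show that the model can reproduce labels it has already seen. A minimal base R sketch of the comparison follows; the prediction and label vectors are made-up placeholders, not output from the gist, and `accuracy` is a hypothetical helper.

```r
# Hypothetical helper: fraction of predictions that match the true labels.
accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}

# Placeholder values for illustration only (not results from the gist).
train_pred   <- factor(c(0, 1), levels = c(0, 1))
train_actual <- factor(c(0, 1), levels = c(0, 1))
test_pred    <- factor(c(1, 1), levels = c(0, 1))
test_actual  <- factor(c(1, 0), levels = c(0, 1))

cat("training accuracy:", accuracy(train_pred, train_actual), "\n")
cat("held-out accuracy:", accuracy(test_pred, test_actual), "\n")
```

With a fitted caret model, `predicted` would come from `predict(fit, newdata = ...)` applied to labeled rows that were not used for training.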
There's a problem in
# Train.
fit <- train(y ~ ., data = train, method = 'bayesglm')
With this output:
Error in model.frame.default(form = y ~ ., data = train, na.action = na.fail) :
invalid type (list) for variable 'y'
Yes, there is a problem there, because there is a function called 'train.' But, in line 12, you override that function with a data matrix. You essentially destroy the function, or replace it with a data matrix.
@josvaler Be sure to copy the code as shown above. Specifically, note that the type of the y column is a factor. I ran the above code successfully. I'm running R 3.3.3.
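
Two quick checks may help narrow down this kind of error. The base R sketch below first inspects the type of the y column before training, then shows that when a name is used in a call, R looks for a function object and skips a same-named data object, so a data frame named train does not by itself remove caret's train(). The data frame is a small stand-in with illustrative values, and the second half uses base R's sum purely as an analogy.

```r
# Stand-in for the training data (values are illustrative).
train <- data.frame(eat = c(0, 1), like = c(1, 1), y = c(0, 1))
train$y <- as.factor(train$y)

# Check 1: the data should be a data.frame and the outcome column a factor.
stopifnot(is.data.frame(train), is.factor(train$y))
print(sapply(train, class))

# Check 2: a data object does not hide a function of the same name
# when that name is used in a call.
sum <- c(1, 2, 3)   # data object named like the base function
total <- sum(sum)   # still calls base::sum on the vector
print(total)
rm(sum)             # remove the masking variable again
```

If the first check fails in a real session, printing `str(train)` just before the call to train() usually shows which conversion step was skipped.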