Last active
March 28, 2018 08:39
Wine Quality Dataset Prediction Analysis using R and caret
# Install any missing packages, then load caret.
# The 'plsRglm' model used below needs the plsRglm package as well.
packages <- c('caret', 'plsRglm')
toInstall <- setdiff(packages, rownames(installed.packages()))
if (length(toInstall) > 0) {
  install.packages(toInstall)
}
library(caret)

# Download both datasets, if they do not exist locally.
baseUrl <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/'
for (fileName in c('winequality-red.csv', 'winequality-white.csv')) {
  if (!file.exists(fileName)) {
    download.file(paste0(baseUrl, fileName), fileName, method = 'curl')
  }
}

# Combine white and red wines into one data frame (the files are semicolon-delimited).
data <- read.csv('winequality-white.csv', sep = ';')
data <- rbind(data, read.csv('winequality-red.csv', sep = ';'))

# Hold out 25% of the rows for testing. The split is random, so accuracy varies between runs.
partition <- createDataPartition(data$quality, p = 0.75)[[1]]
train <- data[partition, ]
test <- data[-partition, ]

# All predictors: 0.5172 accuracy
#fit <- train(quality ~ ., data = train, method = 'plsRglm')
# All predictors, centered and scaled: 0.5172
#fit <- train(quality ~ ., data = train, method = 'plsRglm', preProcess = c("center", "scale"))
# Alcohol only: 0.5209
fit <- train(quality ~ alcohol, data = train, method = 'plsRglm')

# Round the numeric predictions to the nearest quality score.
# Recent caret versions expect confusionMatrix() inputs to be factors with the same levels.
results <- round(predict(fit, newdata = test))
qualityLevels <- sort(unique(data$quality))
confusionMatrix(factor(results, levels = qualityLevels), factor(test$quality, levels = qualityLevels))
          Reference
Prediction   3   4   5   6   7   8   9
         3   0   0   0   0   0   0   0
         4   0   0   0   0   0   0   0
         5   1  14 272 169  22   7   0
         6   9  32 266 511 212  32   1
         7   0   0   2  29  32  12   1
         8   0   0   0   0   0   0   0
         9   0   0   0   0   0   0   0

Overall Statistics

               Accuracy : 0.5018
                 95% CI : (0.4772, 0.5265)
    No Information Rate : 0.4366
    P-Value [Acc > NIR] : 7.299e-08
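
The Accuracy and No Information Rate figures can be checked by hand from the confusion matrix above. The following is a minimal sketch in base R, not part of the original script. It assumes the matrix values as printed in the output, and the names `scores`, `cm`, `accuracy`, and `noInformationRate` are chosen here for illustration. Accuracy is the share of test rows on the diagonal (prediction equals reference), and the no information rate is the share of rows in the most common reference class.

```r
# Confusion matrix copied from the output above: rows are predictions,
# columns are reference (true) quality scores 3 through 9.
scores <- 3:9
cm <- matrix(
  c(0,  0,   0,   0,   0,  0, 0,
    0,  0,   0,   0,   0,  0, 0,
    1, 14, 272, 169,  22,  7, 0,
    9, 32, 266, 511, 212, 32, 1,
    0,  0,   2,  29,  32, 12, 1,
    0,  0,   0,   0,   0,  0, 0,
    0,  0,   0,   0,   0,  0, 0),
  nrow = 7, byrow = TRUE,
  dimnames = list(Prediction = scores, Reference = scores)
)

# Accuracy: correctly predicted rows (the diagonal) over all test rows.
accuracy <- sum(diag(cm)) / sum(cm)

# No information rate: accuracy of always guessing the most common true score.
noInformationRate <- max(colSums(cm)) / sum(cm)

print(round(accuracy, 4))           # 0.5018
print(round(noInformationRate, 4))  # 0.4366
```

Only scores 5, 6, and 7 are ever predicted, so the model beats the always-guess-6 baseline by roughly 6.5 percentage points.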