@primaryobjects
Last active March 28, 2018 08:39
Wine Quality Dataset Prediction Analysis using R and caret
# Install any required packages that are missing, then load caret.
packages <- c('caret')
missingPackages <- setdiff(packages, rownames(installed.packages()))
if (length(missingPackages) > 0) {
  install.packages(missingPackages)
}
library(caret)
# Download each dataset file, if it does not already exist locally.
baseUrl <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/'
for (fileName in c('winequality-red.csv', 'winequality-white.csv')) {
  if (!file.exists(fileName)) {
    download.file(paste0(baseUrl, fileName), fileName, method = "curl")
  }
}
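# Combine the white and red wines into a single data frame (both files are semicolon-separated).
data <- read.csv('winequality-white.csv', sep = ';')
data <- rbind(data, read.csv('winequality-red.csv', sep = ';'))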
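# Split 75% / 25% into training and test sets.
# The seed value is arbitrary; it only makes the random split repeatable.
set.seed(123)
partition <- createDataPartition(data$quality, p = 0.75)[[1]]
train <- data[partition, ]
test <- data[-partition, ]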
# Accuracy figures below were noted from earlier runs and depend on the random split.
# 0.5172 accuracy
#fit <- train(quality ~ ., data = train, method = 'plsRglm')
# 0.5172 accuracy
#fit <- train(quality ~ ., data = train, method = 'plsRglm', preProcess = c("center", "scale"))
# 0.5209 accuracy
fit <- train(quality ~ alcohol, data = train, method = 'plsRglm')
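# The model predicts a numeric score; round it to the nearest whole quality rating.
results <- round(predict(fit, newdata = test))
# Recent caret versions require both arguments to be factors with identical levels.
lvls <- sort(unique(c(results, test$quality)))
confusionMatrix(factor(results, levels = lvls), factor(test$quality, levels = lvls))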
          Reference
Prediction   3   4   5   6   7   8   9
         3   0   0   0   0   0   0   0
         4   0   0   0   0   0   0   0
         5   1  14 272 169  22   7   0
         6   9  32 266 511 212  32   1
         7   0   0   2  29  32  12   1
         8   0   0   0   0   0   0   0
         9   0   0   0   0   0   0   0

Overall Statistics

               Accuracy : 0.5018
                 95% CI : (0.4772, 0.5265)
    No Information Rate : 0.4366
    P-Value [Acc > NIR] : 7.299e-08
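
The reported accuracy and no-information rate can be re-derived from the confusion matrix above using base R alone. This is a minimal sketch: the counts are copied by hand from the output, and the variable names (`cm`, `classes`, `nir`) are illustrative rather than part of the original script.

```r
# Confusion matrix copied from the output above
# (rows = predicted quality, columns = reference quality).
classes <- as.character(3:9)
cm <- matrix(c(
  0,  0,   0,   0,   0,  0, 0,
  0,  0,   0,   0,   0,  0, 0,
  1, 14, 272, 169,  22,  7, 0,
  9, 32, 266, 511, 212, 32, 1,
  0,  0,   2,  29,  32, 12, 1,
  0,  0,   0,   0,   0,  0, 0,
  0,  0,   0,   0,   0,  0, 0
), nrow = 7, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

total <- sum(cm)                     # number of test rows
accuracy <- sum(diag(cm)) / total    # share of predictions that match exactly
nir <- max(colSums(cm)) / total      # share of the most common reference class

round(c(accuracy = accuracy, nir = nir), 4)
```

The diagonal sums to 815 of 1624 test rows, which gives 0.5018, and the most common class (quality 6) covers 709 rows, which gives 0.4366. The alcohol-only model therefore beats always guessing quality 6 by about 6.5 percentage points, and it never predicts any rating other than 5, 6, or 7.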