@isezen
Last active May 13, 2017 18:22
kNN Example
library(class)
library(gmodels)
set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
B = c(rnorm(30, 0), rnorm(30, 3)),
Group = factor(c(rep("G1", 30), rep("G2", 30))))
# use 33% of the data for training and 67% for testing
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3] # training features (drop the Group column)
train.cl <- df[i == 2, 3] # training class labels
test.df <- df[i == 1, -3] # test features
test.real.cluster <- df[i == 1, 3] # true classes for the test set
# classes predicted by knn
test.guess.cluster <- knn(train = train.df, test = test.df, cl = train.cl, k = 3)
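# NOTE (added remark, not in the original gist): class::knn uses Euclidean
# distance and breaks ties among the k nearest neighbours at random, so set
# a seed before knn() if you need reproducible predictions; an odd k also
# avoids voting ties in a two-class problem.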
# convert to numeric to colorize the points on the plot
test.guess.cluster.num <- as.numeric(test.guess.cluster)
plot(test.df, col = test.guess.cluster.num, pch = test.guess.cluster.num)
# examine the CrossTable result:
# the model labelled 2 G1 points as G2 and 1 G2 point as G1,
# so 3 test points are misclassified (you can spot them on the plot)
gm <- gmodels::CrossTable(test.guess.cluster, test.real.cluster, prop.chisq = FALSE)
sum(diag(gm$prop.tbl)) # overall accuracy of the model: (34 - 3)/34
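# ---- sanity checks (a sketch added here, not part of the original gist;
# ---- it reuses the objects defined above) ----
# accuracy can also be computed directly, without CrossTable:
mean(test.guess.cluster == test.real.cluster)
# a simple way to choose k: compare the test error over a range of values
# (illustrative only; with this little data the curve varies with the seed)
err <- sapply(1:15, function(k)
  mean(knn(train = train.df, test = test.df, cl = train.cl, k = k) !=
         test.real.cluster))
plot(1:15, err, type = "b", xlab = "k", ylab = "test error rate")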