@primaryobjects
Created October 7, 2015 20:32
Mining Massive Datasets Q4a - normalizing ratings, cosine distance, recommender systems, collaborative filtering
# Q1
# Here is a table of 1-5 star ratings for five movies (M, N, P, Q, R) by three raters (A, B, C).
# M N P Q R
# A 1 2 3 4 5
# B 2 3 2 5 3
# C 5 5 5 3 2
# Normalize the ratings by subtracting the average for each row and then subtracting the average for each column in the resulting table. Then, identify the true statement about the normalized table.
# First, set up the data.
ratings <- data.frame(M = c(1, 2, 5), N = c(2, 3, 5), P = c(3, 2, 5), Q = c(4, 5, 3), R = c(5, 3, 2))
row.names(ratings) <- c('A', 'B', 'C')
# Calculate row average.
rowmean <- apply(ratings, 1, mean)
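# Create a new table by subtracting each row's average from that row. The length-3 vector is recycled down each column, so element i lines up with row i.
ratings2 <- ratings - rowmean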
# Calculate column average.
colmean <- apply(ratings2, 2, mean)
# Create new table by subtracting column average from each value (we apply across each row and then transpose the table back to normal view).
ratings3 <- t(apply(ratings2, 1, function(x) { x - colmean }))
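# A quick sanity check on the double-centered table, offered as a minimal sketch rather than part of the original solution.
# It rebuilds the same ratings as a matrix and uses sweep() in place of apply(). The names ratingsCheck, rowCentered,
# and bothCentered are introduced here for illustration only.
ratingsCheck <- matrix(c(1, 2, 3, 4, 5,
                         2, 3, 2, 5, 3,
                         5, 5, 5, 3, 2),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c('A', 'B', 'C'), c('M', 'N', 'P', 'Q', 'R')))
# Subtract each row's mean, then each column's mean of the row-centered result (sweep() subtracts by default).
rowCentered <- sweep(ratingsCheck, 1, rowMeans(ratingsCheck))
bothCentered <- sweep(rowCentered, 2, colMeans(rowCentered))
# After both steps, every row and every column should sum to zero, up to floating-point error.
max(abs(rowSums(bothCentered)))
max(abs(colSums(bothCentered)))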
# Q2
# Below is a table giving the profile of three items.
# A 1 0 1 0 1 2
# B 1 1 0 0 1 6
# C 0 1 0 1 0 2
# The first five attributes are Boolean, and the last is an integer "rating." Assume that the scale factor for the rating is alpha. Compute, as a function of alpha, the cosine distances between each pair of profiles. For each of alpha = 0, 0.5, 1, and 2, determine the cosine of the angle between each pair of vectors. Which of the following is FALSE?
# First, set up the data.
data <- data.frame(a=c(1, 1, 0), b=c(0, 1, 1), c=c(1, 0, 0), d=c(0, 0, 1), e=c(1, 1, 0), f=c(2, 6, 2))
row.names(data) <- c('A', 'B', 'C')
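# Next, define a function that takes 2 inputs, x and y, and returns the cosine of the angle between them. Despite the name, a value of 1 means the vectors point in the same direction.
cosDist <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}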
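# Illustration only (cosineOf and angleDegrees are hypothetical names, not part of the original solution):
# the formula above gives the cosine of the angle, so vectors in the same direction score 1 and orthogonal vectors score 0.
# If the angle itself is wanted as the distance, acos() converts the cosine. The clamp guards against rounding just outside [-1, 1].
cosineOf <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}
angleDegrees <- function(x, y) {
  acos(min(1, max(-1, cosineOf(x, y)))) * 180 / pi
}
cosineOf(c(1, 0), c(0, 1))      # orthogonal vectors
cosineOf(c(1, 2), c(2, 4))      # same direction
angleDegrees(c(1, 0), c(1, 1))  # angle between the x-axis and the diagonal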
# Wrap this in a function that takes the data.frame, the index of the column to scale, and an alpha value. It assumes exactly 3 rows named A, B, and C.
cosDistAlpha <- function(data, colIndex, alpha) {
  # Scale the target column by alpha.
  data[,colIndex] <- data[,colIndex] * alpha
  # Calculate the cosine for each pair of rows: A-B, B-C, A-C.
  c(cosDist(data['A',], data['B',]),
    cosDist(data['B',], data['C',]),
    cosDist(data['A',], data['C',]))
}
# alpha 0
cosDistAlpha(data, 6, 0)
# alpha 0.5
cosDistAlpha(data, 6, 0.5)
# alpha 1
cosDistAlpha(data, 6, 1)
# alpha 2
cosDistAlpha(data, 6, 2)
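# A hedged cross-check: working the dot products out by hand for A = (1,0,1,0,1,2a), B = (1,1,0,0,1,6a), C = (0,1,0,1,0,2a)
# gives closed forms in alpha. cosClosedForm, alphas, and closedFormTable are names introduced for this sketch only.
cosClosedForm <- function(alpha) {
  a2 <- alpha^2
  c(AB = (2 + 12 * a2) / sqrt((3 + 4 * a2) * (3 + 36 * a2)),
    BC = (1 + 12 * a2) / sqrt((3 + 36 * a2) * (2 + 4 * a2)),
    AC = (4 * a2) / sqrt((3 + 4 * a2) * (2 + 4 * a2)))
}
alphas <- c(0, 0.5, 1, 2)
# One column per alpha, one row per pair. The values should agree with the cosDistAlpha calls above.
closedFormTable <- sapply(alphas, cosClosedForm)
colnames(closedFormTable) <- paste0('alpha=', alphas)
closedFormTable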