@primaryobjects
Last active March 16, 2020 22:11
Mining Massive Datasets Quiz 1
# Q1
#
# Suppose we compute PageRank with a β of 0.7, and we introduce the additional constraint that the sum of the PageRanks of the three pages must be 3, to handle the problem that otherwise any multiple of a solution will also be a solution. Compute the PageRanks a, b, and c of the three pages A, B, and C, respectively. Then, identify from the list below, the true statement.
#
# Matrix
#
#     A    B    C
# A   0    0    0
# B   0.5  0    0
# C   0.5  1    1
#
# Read the matrix by column: each column lists where a surfer on that page goes next.
# For node A (first column), the surfer has probability 0 of staying at A, 0.5 of moving to B, and 0.5 of moving to C.
#
b = 0.7
M = matrix(c(0, 0.5, 0.5, 0, 0, 1, 0, 0, 1), ncol=3)
e = matrix(c(1, 1, 1), ncol=1)
v1 = matrix(c(1, 1, 1), ncol=1)
v1 = v1 / 3
for (i in 1:5) {
  v1 = ((b * M ) %*% v1 ) + (((1 - b ) * e ) / 3)
}
v1 = v1 * 3
# Find values for a, b, c to match equations in possible answers.
a <- v1[1]
b <- v1[2]
c <- v1[3]
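# Cross-check (a sketch, not part of the quiz): with the sum fixed at 3, the
# iteration above converges to the solution of v = beta * M v + (1 - beta) * e,
# which is a linear system and can be solved directly. The names M_q1, beta_q1
# and exact_q1 are hypothetical, chosen so that a, b and c are left untouched.
M_q1 = matrix(c(0, 0.5, 0.5, 0, 0, 1, 0, 0, 1), ncol=3)
beta_q1 = 0.7
exact_q1 = solve(diag(3) - beta_q1 * M_q1, rep(1 - beta_q1, 3))
exact_q1  # a = 0.3, b = 0.405, c = 2.295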
# Q2
#
# Suppose we compute PageRank with β=0.85. Write the equations for the PageRanks a, b, and c of the three pages A, B, and C, respectively. Then, identify in the list below, one of the equations.
#
# Matrix
#
#     A    B    C
# A   0    0    1
# B   0.5  0    0
# C   0.5  1    0
#
b = 0.85
M = matrix(c(0, 0.5, 0.5, 0, 0, 1, 1, 0, 0), ncol=3)
e = matrix(c(1, 1, 1), ncol=1)
v1 = matrix(c(1, 1, 1), ncol=1)
v1 = v1 / 3
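# Four iterations do not reach the fixed point, so iterate long enough for the vector to stop changing.
for (i in 1:100) {
  v1 = ((b * M ) %*% v1 ) + (((1 - b ) * e ) / 3)
}
v1 = v1 * 3
# Find values for a, b, c to match equations in possible answers.
a <- v1[1]
b <- v1[2]
c <- v1[3]
# Determine which equation is true. Compare with a tolerance, because == on floating-point values is unreliable.
isTRUE(all.equal((85 * b), (.575 * a) + (.15 * c)))
isTRUE(all.equal((.95 * a), (.9 * c) + (.05 * b)))
isTRUE(all.equal(b, (.475 * a) + (.05 * c)))
isTRUE(all.equal(c, b + (.575 * a)))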
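# Where the candidate equations come from (a sketch): each page receives beta
# times the rank flowing in along its row of M, plus an equal share
# (1 - beta) / 3 of the total a + b + c. With beta = 0.85:
#   a = 0.85 * c             + 0.05 * (a + b + c)  =>  0.95 a = 0.9 c + 0.05 b
#   b = 0.85 * 0.5 * a       + 0.05 * (a + b + c)  =>  0.95 b = 0.475 a + 0.05 c
#   c = 0.85 * (0.5 * a + b) + 0.05 * (a + b + c)  =>  0.95 c = 0.475 a + 0.9 b
# The check below solves the fixed point directly. M_q2, beta_q2, r, lhs and
# rhs are hypothetical names used only for this illustration.
M_q2 = matrix(c(0, 0.5, 0.5, 0, 0, 1, 1, 0, 0), ncol=3)
beta_q2 = 0.85
r = solve(diag(3) - beta_q2 * M_q2, rep(1 - beta_q2, 3))
lhs = 0.95 * r
rhs = c(0.9 * r[3] + 0.05 * r[2],
        0.475 * r[1] + 0.05 * r[3],
        0.475 * r[1] + 0.9 * r[2])
isTRUE(all.equal(lhs, rhs))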
# Q3
#
# Assuming no "taxation," compute the PageRanks a, b, and c of the three pages A, B, and C, using iteration, starting with the "0th" iteration where all three pages have rank a = b = c = 1. Compute as far as the 5th iteration, and also determine what the PageRanks are in the limit. Then, identify the true statement from the list below.
#
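# "No taxation" means β = 1, so the β = 0.85 results from Q2 cannot be re-used; only the matrix M carries over.
v = matrix(c(1, 1, 1), ncol=1)
for (i in 1:100) {
  v = M %*% v
  if (i == 5) v5 = v
}
# v5 holds the 5th iteration and v approximates the limit; test each candidate value against both.
matches = function(x5, xlim, target) {
  c(iter5 = isTRUE(all.equal(x5, target)), limit = isTRUE(all.equal(xlim, target)))
}
matches(v5[2], v[2], 5/8)
matches(v5[1], v[1], 6/5)
matches(v5[3], v[3], 11/8)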
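# The limit can also be obtained without iterating (a sketch): without taxation
# the limiting ranks are the eigenvector of M for eigenvalue 1, rescaled so the
# three ranks sum to 3. M_q3, ev, k and limit_q3 are hypothetical names used
# only here.
M_q3 = matrix(c(0, 0.5, 0.5, 0, 0, 1, 1, 0, 0), ncol=3)
ev = eigen(M_q3)
k = which.max(Mod(ev$values))          # index of the eigenvalue with modulus 1
limit_q3 = Re(ev$vectors[, k])
limit_q3 = limit_q3 / sum(limit_q3) * 3
limit_q3  # a = 1.2, b = 0.6, c = 1.2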
#
# Q4
#
# Suppose our input data to a map-reduce operation consists of integer values (the keys are not important). The map function takes an integer i and produces the list of pairs (p,i) such that p is a prime divisor of i. For example, map(12) = [(2,12), (3,12)].
# The reduce function is addition. That is, reduce(p, [i1, i2, ...,ik]) is (p,i1+i2+...+ik).
# Compute the output, if the input is the set of integers 15, 21, 24, 30, 49. Then, identify, in the list below, one of the pairs in the output.
#
# See https://gist.github.com/primaryobjects/8755398629a4e2ef74dd
#
# map(15) = [(3, 15), (5, 15)]
# map(21) = [(3, 21), (7, 21)]
# map(24) = [(2, 24), (3, 24)]
# map(30) = [(2, 30), (3, 30), (5, 30)]
# map(49) = [(7, 49)]
#
# reduce(2, [24, 30]) = (2, 54)
# reduce(3, [15, 21, 24, 30]) = (3, 90)
# reduce(5, [15, 30]) = (5, 45)
# reduce(7, [21, 49]) = (7, 70)
#
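# A small R sketch of the computation above (an illustration, not the code in
# the linked gist). prime_divisors, inputs, mapped and totals are hypothetical
# names.
prime_divisors <- function(n) {
  ps <- c()
  d <- 2
  while (n > 1) {
    if (n %% d == 0) {
      ps <- c(ps, d)                    # d is a prime divisor of the original n
      while (n %% d == 0) n <- n / d    # strip every factor of d
    } else {
      d <- d + 1
    }
  }
  ps
}
inputs <- c(15, 21, 24, 30, 49)
# Map: emit one (p, i) row per prime divisor p of i.
mapped <- do.call(rbind, lapply(inputs, function(i) data.frame(p = prime_divisors(i), i = i)))
# Reduce: add up the values that share a key.
totals <- tapply(mapped$i, mapped$p, sum)
totals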