The k-nearest neighbors (k-NN) algorithm is among the simplest algorithms in data mining. A distance or similarity is calculated between each pair of elements in the data set, using a distance or similarity metric ^[1]^ that the researcher chooses (there are many to choose from) and that is based on the two elements' attributes. A data element’s k nearest neighbors are the k closest data elements according to this distance or similarity.
In this Graph Gist, I’m using k-NN with cosine similarity ^[2]^ as the similarity metric to calculate movie recommendations. I wanted a fun data set, so I asked people on Twitter and bothered a few people via email to fill out this form. Using their movie ratings, I’ll calculate the cosine similarity between each pair of people. I’ll then calculate recommendations for a person’s unrated movies by averaging the ratings given by that person’s k-nearest neighbors. I explain my methodology in detail throughout.
See my blog post if you would like to see how I implemented this methodology in R.
For now, the database consists only of Person nodes and Movie nodes, where (Person)-[:RATED {rating:00}]->(Movie). There are 15 people and 30 movies in the data set.
See Alistair Jones's arrow tool.
CREATE
(Person1:Person {name:'Michael Sherman'}),
(Person2:Person {name:'Zoltan Varju'}),
(Person3:Person {name:'Peter Neubauer'}),
(Person4:Person {name:'Grace Andrews'}),
(Person5:Person {name:'Michael Hunger'}),
(Person6:Person {name:'Toby Craig'}),
(Person7:Person {name:'Huston Hedinger'}),
(Person8:Person {name:'Nigel Small'}),
(Person9:Person {name:'Wes Freeman'}),
(Person10:Person {name:'Luanne Misquitta'}),
(Person11:Person {name:'shiv swami'}),
(Person12:Person {name:'Pernilla Lindh'}),
(Person13:Person {name:'Max De Marzi'}),
(Person14:Person {name:'Chris Leishman'}),
(Person15:Person {name:'Kenny Bastani'}),
(Movie1:Movie {name:'Titanic'}),
(Movie2:Movie {name:'Forrest Gump'}),
(Movie3:Movie {name:'Mean Girls'}),
(Movie4:Movie {name:'The Bourne Trilogy'}),
(Movie5:Movie {name:'Jurassic Park'}),
(Movie6:Movie {name:'The 40 Year Old Virgin'}),
(Movie7:Movie {name:'Thank You for Smoking'}),
(Movie8:Movie {name:'Happy Gilmore'}),
(Movie9:Movie {name:'Knocked Up'}),
(Movie10:Movie {name:'A Beautiful Mind'}),
(Movie11:Movie {name:'Bridesmaids'}),
(Movie12:Movie {name:'The Dark Knight Trilogy'}),
(Movie13:Movie {name:'Charlie\'s Angels'}),
(Movie14:Movie {name:'Avatar'}),
(Movie15:Movie {name:'Children of Men'}),
(Movie16:Movie {name:'Gladiator'}),
(Movie17:Movie {name:'Shutter Island'}),
(Movie18:Movie {name:'Forgetting Sarah Marshall'}),
(Movie19:Movie {name:'Inception'}),
(Movie20:Movie {name:'The Social Network'}),
(Movie21:Movie {name:'Marley and Me'}),
(Movie22:Movie {name:'Taken'}),
(Movie23:Movie {name:'Pan\'s Labyrinth'}),
(Movie24:Movie {name:'Inglourious Basterds'}),
(Movie25:Movie {name:'The Ocean\'s Trilogy'}),
(Movie26:Movie {name:'The Notebook'}),
(Movie27:Movie {name:'The Devil Wears Prada'}),
(Movie28:Movie {name:'The Truman Show'}),
(Movie29:Movie {name:'WALL-E'}),
(Movie30:Movie {name:'Paranormal Activity'}),
(Person1)-[:RATED {rating:4}]->(Movie1),
(Person1)-[:RATED {rating:3}]->(Movie2),
(Person1)-[:RATED {rating:8}]->(Movie3),
(Person1)-[:RATED {rating:8}]->(Movie5),
(Person1)-[:RATED {rating:6}]->(Movie6),
(Person1)-[:RATED {rating:6}]->(Movie8),
(Person1)-[:RATED {rating:4}]->(Movie9),
(Person1)-[:RATED {rating:7}]->(Movie12),
(Person1)-[:RATED {rating:5}]->(Movie14),
(Person1)-[:RATED {rating:2}]->(Movie16),
(Person1)-[:RATED {rating:9}]->(Movie20),
(Person1)-[:RATED {rating:4}]->(Movie27),
(Person1)-[:RATED {rating:9}]->(Movie28),
(Person1)-[:RATED {rating:10}]->(Movie29),
(Person2)-[:RATED {rating:3}]->(Movie1),
(Person2)-[:RATED {rating:8}]->(Movie2),
(Person2)-[:RATED {rating:7}]->(Movie5),
(Person2)-[:RATED {rating:5}]->(Movie6),
(Person2)-[:RATED {rating:8}]->(Movie7),
(Person2)-[:RATED {rating:6}]->(Movie9),
(Person2)-[:RATED {rating:8}]->(Movie10),
(Person2)-[:RATED {rating:7}]->(Movie13),
(Person2)-[:RATED {rating:5}]->(Movie16),
(Person2)-[:RATED {rating:9}]->(Movie24),
(Person2)-[:RATED {rating:9}]->(Movie25),
(Person2)-[:RATED {rating:7}]->(Movie27),
(Person2)-[:RATED {rating:2}]->(Movie28),
(Person3)-[:RATED {rating:8}]->(Movie1),
(Person3)-[:RATED {rating:10}]->(Movie2),
(Person3)-[:RATED {rating:7}]->(Movie3),
(Person3)-[:RATED {rating:9}]->(Movie4),
(Person3)-[:RATED {rating:8}]->(Movie5),
(Person3)-[:RATED {rating:5}]->(Movie6),
(Person3)-[:RATED {rating:8}]->(Movie7),
(Person3)-[:RATED {rating:7}]->(Movie8),
(Person3)-[:RATED {rating:7}]->(Movie9),
(Person3)-[:RATED {rating:9}]->(Movie10),
(Person3)-[:RATED {rating:5}]->(Movie11),
(Person3)-[:RATED {rating:7}]->(Movie12),
(Person3)-[:RATED {rating:3}]->(Movie13),
(Person3)-[:RATED {rating:10}]->(Movie14),
(Person3)-[:RATED {rating:7}]->(Movie15),
(Person3)-[:RATED {rating:9}]->(Movie16),
(Person3)-[:RATED {rating:7}]->(Movie17),
(Person3)-[:RATED {rating:5}]->(Movie18),
(Person3)-[:RATED {rating:8}]->(Movie19),
(Person3)-[:RATED {rating:3}]->(Movie20),
(Person3)-[:RATED {rating:4}]->(Movie21),
(Person3)-[:RATED {rating:6}]->(Movie22),
(Person3)-[:RATED {rating:9}]->(Movie23),
(Person3)-[:RATED {rating:8}]->(Movie24),
(Person3)-[:RATED {rating:7}]->(Movie25),
(Person3)-[:RATED {rating:5}]->(Movie26),
(Person3)-[:RATED {rating:2}]->(Movie27),
(Person3)-[:RATED {rating:4}]->(Movie28),
(Person3)-[:RATED {rating:8}]->(Movie29),
(Person3)-[:RATED {rating:5}]->(Movie30),
(Person4)-[:RATED {rating:8}]->(Movie1),
(Person4)-[:RATED {rating:9}]->(Movie2),
(Person4)-[:RATED {rating:7}]->(Movie3),
(Person4)-[:RATED {rating:9}]->(Movie4),
(Person4)-[:RATED {rating:8}]->(Movie5),
(Person4)-[:RATED {rating:6}]->(Movie6),
(Person4)-[:RATED {rating:7}]->(Movie8),
(Person4)-[:RATED {rating:6}]->(Movie9),
(Person4)-[:RATED {rating:8}]->(Movie10),
(Person4)-[:RATED {rating:8}]->(Movie11),
(Person4)-[:RATED {rating:9}]->(Movie12),
(Person4)-[:RATED {rating:7}]->(Movie13),
(Person4)-[:RATED {rating:7}]->(Movie14),
(Person4)-[:RATED {rating:10}]->(Movie15),
(Person4)-[:RATED {rating:9}]->(Movie16),
(Person4)-[:RATED {rating:9}]->(Movie17),
(Person4)-[:RATED {rating:5}]->(Movie18),
(Person4)-[:RATED {rating:10}]->(Movie19),
(Person4)-[:RATED {rating:7}]->(Movie20),
(Person4)-[:RATED {rating:7}]->(Movie21),
(Person4)-[:RATED {rating:6}]->(Movie26),
(Person4)-[:RATED {rating:9}]->(Movie27),
(Person4)-[:RATED {rating:7}]->(Movie28),
(Person5)-[:RATED {rating:10}]->(Movie2),
(Person5)-[:RATED {rating:8}]->(Movie5),
(Person5)-[:RATED {rating:6}]->(Movie12),
(Person5)-[:RATED {rating:10}]->(Movie13),
(Person5)-[:RATED {rating:6}]->(Movie14),
(Person5)-[:RATED {rating:4}]->(Movie16),
(Person5)-[:RATED {rating:8}]->(Movie19),
(Person5)-[:RATED {rating:5}]->(Movie20),
(Person5)-[:RATED {rating:7}]->(Movie25),
(Person6)-[:RATED {rating:7}]->(Movie1),
(Person6)-[:RATED {rating:8}]->(Movie2),
(Person6)-[:RATED {rating:7}]->(Movie4),
(Person6)-[:RATED {rating:8}]->(Movie5),
(Person6)-[:RATED {rating:9}]->(Movie16),
(Person6)-[:RATED {rating:10}]->(Movie23),
(Person6)-[:RATED {rating:8}]->(Movie24),
(Person6)-[:RATED {rating:6}]->(Movie25),
(Person6)-[:RATED {rating:8}]->(Movie28),
(Person6)-[:RATED {rating:8}]->(Movie29),
(Person6)-[:RATED {rating:9}]->(Movie30),
(Person7)-[:RATED {rating:8}]->(Movie1),
(Person7)-[:RATED {rating:9}]->(Movie2),
(Person7)-[:RATED {rating:4}]->(Movie3),
(Person7)-[:RATED {rating:9}]->(Movie4),
(Person7)-[:RATED {rating:9}]->(Movie5),
(Person7)-[:RATED {rating:9}]->(Movie6),
(Person7)-[:RATED {rating:9}]->(Movie7),
(Person7)-[:RATED {rating:7}]->(Movie8),
(Person7)-[:RATED {rating:7}]->(Movie9),
(Person7)-[:RATED {rating:8}]->(Movie10),
(Person7)-[:RATED {rating:9}]->(Movie11),
(Person7)-[:RATED {rating:9}]->(Movie12),
(Person7)-[:RATED {rating:5}]->(Movie13),
(Person7)-[:RATED {rating:9}]->(Movie14),
(Person7)-[:RATED {rating:6}]->(Movie15),
(Person7)-[:RATED {rating:8}]->(Movie16),
(Person7)-[:RATED {rating:8}]->(Movie18),
(Person7)-[:RATED {rating:7}]->(Movie19),
(Person7)-[:RATED {rating:7}]->(Movie20),
(Person7)-[:RATED {rating:9}]->(Movie22),
(Person7)-[:RATED {rating:8}]->(Movie25),
(Person7)-[:RATED {rating:9}]->(Movie26),
(Person7)-[:RATED {rating:6}]->(Movie28),
(Person7)-[:RATED {rating:9}]->(Movie29),
(Person8)-[:RATED {rating:5}]->(Movie1),
(Person8)-[:RATED {rating:9}]->(Movie2),
(Person8)-[:RATED {rating:10}]->(Movie4),
(Person8)-[:RATED {rating:8}]->(Movie5),
(Person8)-[:RATED {rating:10}]->(Movie12),
(Person8)-[:RATED {rating:9}]->(Movie13),
(Person8)-[:RATED {rating:7}]->(Movie14),
(Person8)-[:RATED {rating:6}]->(Movie19),
(Person8)-[:RATED {rating:10}]->(Movie22),
(Person8)-[:RATED {rating:10}]->(Movie24),
(Person8)-[:RATED {rating:9}]->(Movie25),
(Person9)-[:RATED {rating:8}]->(Movie1),
(Person9)-[:RATED {rating:8}]->(Movie2),
(Person9)-[:RATED {rating:9}]->(Movie4),
(Person9)-[:RATED {rating:9}]->(Movie5),
(Person9)-[:RATED {rating:9}]->(Movie6),
(Person9)-[:RATED {rating:8}]->(Movie9),
(Person9)-[:RATED {rating:8}]->(Movie10),
(Person9)-[:RATED {rating:7}]->(Movie13),
(Person9)-[:RATED {rating:9}]->(Movie14),
(Person9)-[:RATED {rating:7}]->(Movie19),
(Person9)-[:RATED {rating:8}]->(Movie20),
(Person9)-[:RATED {rating:8}]->(Movie21),
(Person9)-[:RATED {rating:8}]->(Movie22),
(Person9)-[:RATED {rating:7}]->(Movie23),
(Person9)-[:RATED {rating:7}]->(Movie25),
(Person9)-[:RATED {rating:7}]->(Movie27),
(Person9)-[:RATED {rating:9}]->(Movie29),
(Person10)-[:RATED {rating:3}]->(Movie1),
(Person10)-[:RATED {rating:5}]->(Movie2),
(Person10)-[:RATED {rating:5}]->(Movie3),
(Person10)-[:RATED {rating:8}]->(Movie4),
(Person10)-[:RATED {rating:7}]->(Movie5),
(Person10)-[:RATED {rating:3}]->(Movie9),
(Person10)-[:RATED {rating:9}]->(Movie10),
(Person10)-[:RATED {rating:5}]->(Movie11),
(Person10)-[:RATED {rating:7}]->(Movie12),
(Person10)-[:RATED {rating:9}]->(Movie13),
(Person10)-[:RATED {rating:10}]->(Movie14),
(Person10)-[:RATED {rating:8}]->(Movie16),
(Person10)-[:RATED {rating:8}]->(Movie20),
(Person10)-[:RATED {rating:9}]->(Movie24),
(Person10)-[:RATED {rating:9}]->(Movie25),
(Person10)-[:RATED {rating:5}]->(Movie26),
(Person10)-[:RATED {rating:9}]->(Movie27),
(Person10)-[:RATED {rating:9}]->(Movie29),
(Person11)-[:RATED {rating:10}]->(Movie1),
(Person11)-[:RATED {rating:10}]->(Movie2),
(Person11)-[:RATED {rating:5}]->(Movie3),
(Person11)-[:RATED {rating:7}]->(Movie4),
(Person11)-[:RATED {rating:9}]->(Movie5),
(Person11)-[:RATED {rating:5}]->(Movie6),
(Person11)-[:RATED {rating:5}]->(Movie7),
(Person11)-[:RATED {rating:6}]->(Movie8),
(Person11)-[:RATED {rating:7}]->(Movie9),
(Person11)-[:RATED {rating:10}]->(Movie10),
(Person11)-[:RATED {rating:7}]->(Movie11),
(Person11)-[:RATED {rating:9}]->(Movie12),
(Person11)-[:RATED {rating:8}]->(Movie13),
(Person11)-[:RATED {rating:10}]->(Movie14),
(Person11)-[:RATED {rating:7}]->(Movie15),
(Person11)-[:RATED {rating:7}]->(Movie16),
(Person11)-[:RATED {rating:7}]->(Movie17),
(Person11)-[:RATED {rating:7}]->(Movie18),
(Person11)-[:RATED {rating:6}]->(Movie19),
(Person11)-[:RATED {rating:9}]->(Movie20),
(Person11)-[:RATED {rating:7}]->(Movie21),
(Person11)-[:RATED {rating:8}]->(Movie22),
(Person11)-[:RATED {rating:7}]->(Movie23),
(Person11)-[:RATED {rating:8}]->(Movie24),
(Person11)-[:RATED {rating:7}]->(Movie25),
(Person11)-[:RATED {rating:8}]->(Movie26),
(Person11)-[:RATED {rating:7}]->(Movie27),
(Person11)-[:RATED {rating:9}]->(Movie28),
(Person11)-[:RATED {rating:9}]->(Movie29),
(Person11)-[:RATED {rating:7}]->(Movie30),
(Person12)-[:RATED {rating:5}]->(Movie1),
(Person12)-[:RATED {rating:10}]->(Movie2),
(Person12)-[:RATED {rating:8}]->(Movie3),
(Person12)-[:RATED {rating:5}]->(Movie4),
(Person12)-[:RATED {rating:10}]->(Movie5),
(Person12)-[:RATED {rating:4}]->(Movie6),
(Person12)-[:RATED {rating:10}]->(Movie7),
(Person12)-[:RATED {rating:5}]->(Movie8),
(Person12)-[:RATED {rating:5}]->(Movie9),
(Person12)-[:RATED {rating:10}]->(Movie10),
(Person12)-[:RATED {rating:5}]->(Movie11),
(Person12)-[:RATED {rating:10}]->(Movie12),
(Person12)-[:RATED {rating:5}]->(Movie13),
(Person12)-[:RATED {rating:10}]->(Movie14),
(Person12)-[:RATED {rating:9}]->(Movie15),
(Person12)-[:RATED {rating:10}]->(Movie16),
(Person12)-[:RATED {rating:7}]->(Movie17),
(Person12)-[:RATED {rating:5}]->(Movie18),
(Person12)-[:RATED {rating:5}]->(Movie19),
(Person12)-[:RATED {rating:7}]->(Movie20),
(Person12)-[:RATED {rating:10}]->(Movie21),
(Person12)-[:RATED {rating:7}]->(Movie22),
(Person12)-[:RATED {rating:10}]->(Movie23),
(Person12)-[:RATED {rating:8}]->(Movie24),
(Person12)-[:RATED {rating:8}]->(Movie25),
(Person12)-[:RATED {rating:10}]->(Movie26),
(Person12)-[:RATED {rating:5}]->(Movie27),
(Person12)-[:RATED {rating:9}]->(Movie28),
(Person12)-[:RATED {rating:7}]->(Movie29),
(Person12)-[:RATED {rating:3}]->(Movie30),
(Person13)-[:RATED {rating:7}]->(Movie1),
(Person13)-[:RATED {rating:10}]->(Movie2),
(Person13)-[:RATED {rating:7}]->(Movie3),
(Person13)-[:RATED {rating:8}]->(Movie4),
(Person13)-[:RATED {rating:9}]->(Movie5),
(Person13)-[:RATED {rating:4}]->(Movie6),
(Person13)-[:RATED {rating:6}]->(Movie7),
(Person13)-[:RATED {rating:3}]->(Movie8),
(Person13)-[:RATED {rating:7}]->(Movie9),
(Person13)-[:RATED {rating:9}]->(Movie10),
(Person13)-[:RATED {rating:4}]->(Movie11),
(Person13)-[:RATED {rating:7}]->(Movie12),
(Person13)-[:RATED {rating:6}]->(Movie13),
(Person13)-[:RATED {rating:6}]->(Movie14),
(Person13)-[:RATED {rating:9}]->(Movie15),
(Person13)-[:RATED {rating:9}]->(Movie16),
(Person13)-[:RATED {rating:8}]->(Movie17),
(Person13)-[:RATED {rating:7}]->(Movie18),
(Person13)-[:RATED {rating:8}]->(Movie19),
(Person13)-[:RATED {rating:5}]->(Movie20),
(Person13)-[:RATED {rating:4}]->(Movie21),
(Person13)-[:RATED {rating:4}]->(Movie22),
(Person13)-[:RATED {rating:10}]->(Movie23),
(Person13)-[:RATED {rating:7}]->(Movie24),
(Person13)-[:RATED {rating:10}]->(Movie25),
(Person13)-[:RATED {rating:8}]->(Movie26),
(Person13)-[:RATED {rating:8}]->(Movie27),
(Person13)-[:RATED {rating:10}]->(Movie28),
(Person13)-[:RATED {rating:10}]->(Movie29),
(Person13)-[:RATED {rating:9}]->(Movie30),
(Person14)-[:RATED {rating:5}]->(Movie1),
(Person14)-[:RATED {rating:8}]->(Movie2),
(Person14)-[:RATED {rating:8}]->(Movie4),
(Person14)-[:RATED {rating:2}]->(Movie5),
(Person14)-[:RATED {rating:10}]->(Movie7),
(Person14)-[:RATED {rating:9}]->(Movie9),
(Person14)-[:RATED {rating:9}]->(Movie10),
(Person14)-[:RATED {rating:8}]->(Movie13),
(Person14)-[:RATED {rating:7}]->(Movie14),
(Person14)-[:RATED {rating:9}]->(Movie15),
(Person14)-[:RATED {rating:8}]->(Movie16),
(Person14)-[:RATED {rating:9}]->(Movie19),
(Person14)-[:RATED {rating:6}]->(Movie20),
(Person14)-[:RATED {rating:7}]->(Movie22),
(Person14)-[:RATED {rating:9}]->(Movie24),
(Person14)-[:RATED {rating:7}]->(Movie25),
(Person14)-[:RATED {rating:5}]->(Movie27),
(Person14)-[:RATED {rating:6}]->(Movie28),
(Person14)-[:RATED {rating:7}]->(Movie29),
(Person15)-[:RATED {rating:8}]->(Movie1),
(Person15)-[:RATED {rating:10}]->(Movie2),
(Person15)-[:RATED {rating:4}]->(Movie3),
(Person15)-[:RATED {rating:5}]->(Movie4),
(Person15)-[:RATED {rating:6}]->(Movie5),
(Person15)-[:RATED {rating:7}]->(Movie6),
(Person15)-[:RATED {rating:8}]->(Movie7),
(Person15)-[:RATED {rating:8}]->(Movie8),
(Person15)-[:RATED {rating:6}]->(Movie9),
(Person15)-[:RATED {rating:10}]->(Movie10),
(Person15)-[:RATED {rating:10}]->(Movie11),
(Person15)-[:RATED {rating:9}]->(Movie12),
(Person15)-[:RATED {rating:4}]->(Movie13),
(Person15)-[:RATED {rating:10}]->(Movie14),
(Person15)-[:RATED {rating:5}]->(Movie15),
(Person15)-[:RATED {rating:9}]->(Movie16),
(Person15)-[:RATED {rating:5}]->(Movie17),
(Person15)-[:RATED {rating:10}]->(Movie18),
(Person15)-[:RATED {rating:8}]->(Movie19),
(Person15)-[:RATED {rating:8}]->(Movie20),
(Person15)-[:RATED {rating:5}]->(Movie21),
(Person15)-[:RATED {rating:6}]->(Movie22),
(Person15)-[:RATED {rating:5}]->(Movie23),
(Person15)-[:RATED {rating:8}]->(Movie24),
(Person15)-[:RATED {rating:7}]->(Movie25),
(Person15)-[:RATED {rating:9}]->(Movie26),
(Person15)-[:RATED {rating:10}]->(Movie27),
(Person15)-[:RATED {rating:10}]->(Movie28),
(Person15)-[:RATED {rating:9}]->(Movie29),
(Person15)-[:RATED {rating:8}]->(Movie30)
Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space. It is the dot product of the two vectors divided by the product of the two vectors' lengths (or magnitudes). For two vectors A and B in an n-dimensional space:
\( \LARGE similarity(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|} = \frac{\sum\limits_{i=1}^n A_{i} \times B_{i}}{\sqrt{\sum\limits_{i=1}^n A_{i}^2} \times \sqrt{\sum\limits_{i=1}^n B_{i}^2}} \)
Cosine similarity ranges between -1 and 1, where -1 is perfectly dissimilar and 1 is perfectly similar. ^[3]^
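As a minimal sketch of the formula (plain Python, separate from the graph; the function name `cosine_similarity` and the example vectors are illustrative assumptions), the code below shows the two ends of the range and the midpoint. Because every rating in this data set is positive, the similarities computed later can only fall between 0 and 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (length_a * length_b)

# Illustrative vectors only; none of these come from the movie data.
same_direction = cosine_similarity([1, 2], [2, 4])    # parallel vectors
opposite = cosine_similarity([1, 2], [-1, -2])        # opposite directions
orthogonal = cosine_similarity([1, 0], [0, 1])        # perpendicular vectors
```

Note that only the direction of the vectors matters, not their length: `[1, 2]` and `[2, 4]` count as perfectly similar.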
To be as clear as possible, I’ll pull two people from the data set and show how to manually calculate their cosine similarity.
Consider my UT Austin classmate Michael Sherman and Neo4j’s Michael Hunger. We are only interested in the movies that both of them rated, as cosine similarity is only calculated over non-NULL dimensions:
MATCH (p1:Person {name:'Michael Sherman'})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(p2:Person {name:'Michael Hunger'})
RETURN m.name AS Movie, r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating`
Each person should be thought of as a vector where their coordinates are defined by their movie ratings. Thus:
\( \overrightarrow{M. Sherman} = \langle 3, 8, 7, 5, 2, 9 \rangle \)
\( \overrightarrow{M. Hunger} = \langle 10, 8, 6, 6, 4, 5 \rangle \)
\( \large similarity(M. Sherman, M. Hunger) = \frac{3 \cdot 10 + 8 \cdot 8 + 7 \cdot 6 + 5 \cdot 6 + 2 \cdot 4 + 9 \cdot 5}{\sqrt{3^2 + 8^2 + 7^2 + 5^2 + 2^2 + 9^2} \times \sqrt{10^2 + 8^2 + 6^2 + 6^2 + 4^2 + 5^2}} = \frac{219}{15.2315 \times 16.6433} = 0.8639 \)
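The manual calculation can be double-checked with a short Python sketch. The two dictionaries copy the ratings from the CREATE statement; the helper name `cosine_on_shared` is an assumption made for illustration, and it restricts both vectors to the co-rated movies in the same way the MATCH pattern does.

```python
import math

# Ratings copied from the CREATE statement, keyed by movie name.
sherman = {
    'Titanic': 4, 'Forrest Gump': 3, 'Mean Girls': 8, 'Jurassic Park': 8,
    'The 40 Year Old Virgin': 6, 'Happy Gilmore': 6, 'Knocked Up': 4,
    'The Dark Knight Trilogy': 7, 'Avatar': 5, 'Gladiator': 2,
    'The Social Network': 9, 'The Devil Wears Prada': 4,
    'The Truman Show': 9, 'WALL-E': 10,
}
hunger = {
    'Forrest Gump': 10, 'Jurassic Park': 8, 'The Dark Knight Trilogy': 6,
    "Charlie's Angels": 10, 'Avatar': 6, 'Gladiator': 4, 'Inception': 8,
    'The Social Network': 5, "The Ocean's Trilogy": 7,
}

def cosine_on_shared(r1, r2):
    """Cosine similarity over the movies rated by both people (None if no overlap)."""
    shared = set(r1) & set(r2)
    if not shared:
        return None
    dot = sum(r1[m] * r2[m] for m in shared)
    length1 = math.sqrt(sum(r1[m] ** 2 for m in shared))
    length2 = math.sqrt(sum(r2[m] ** 2 for m in shared))
    return dot / (length1 * length2)

shared_movies = sorted(set(sherman) & set(hunger))
similarity = cosine_on_shared(sherman, hunger)
print(len(shared_movies))      # 6 co-rated movies
print(round(similarity, 4))    # 0.8639
```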
I want to create a [:SIMILARITY] relationship between each pair of people in the graph, with their cosine similarity stored as a property of the relationship. The query that accomplishes this is:
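// Pair up every two people through the movies that both of them rated
MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
// Per pair (p1, p2): dot product and vector lengths over the co-rated movies
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
// Keep a single undirected relationship per pair and store the similarity on it
CREATE UNIQUE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)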
However, this exceeds the maximum operations allowed in a Graph Gist, so Wes Freeman advised that I split it up into two steps. Be sure to execute (press the green arrows underneath) queries 3 & 4 to update the console with the similarity data.
MATCH (p1:Person)
WITH p1
LIMIT 10
MATCH (p1)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
CREATE UNIQUE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
MATCH (p1:Person)
WITH p1
SKIP 10
LIMIT 5
MATCH (p1)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
CREATE UNIQUE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
There is only one [:SIMILARITY] relationship between each pair of people.
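To illustrate the "one relationship per pair" idea outside the database, here is a rough Python sketch. The people, movies, and ratings are made up for the example (they are not from the data set), and the helper `cosine_on_shared` is a hypothetical name. Where the Cypher pattern visits every pair in both directions and leaves it to CREATE UNIQUE to avoid a duplicate, this sketch enumerates each unordered pair once with `itertools.combinations`.

```python
import math
from itertools import combinations

# Hypothetical toy ratings, keyed by person and then by movie.
ratings = {
    'Ann': {'A': 8, 'B': 6, 'C': 9},
    'Bob': {'A': 4, 'B': 3, 'D': 7},
    'Cat': {'B': 5, 'C': 10, 'D': 2},
}

def cosine_on_shared(r1, r2):
    """Cosine similarity over the movies rated by both people (None if no overlap)."""
    shared = set(r1) & set(r2)
    if not shared:
        return None
    dot = sum(r1[m] * r2[m] for m in shared)
    length1 = math.sqrt(sum(r1[m] ** 2 for m in shared))
    length2 = math.sqrt(sum(r2[m] ** 2 for m in shared))
    return dot / (length1 * length2)

# One entry per unordered pair; a frozenset key ignores direction,
# much like the undirected (p1)-[:SIMILARITY]-(p2) pattern.
similarities = {}
for p1, p2 in combinations(sorted(ratings), 2):
    sim = cosine_on_shared(ratings[p1], ratings[p2])
    if sim is not None:
        similarities[frozenset((p1, p2))] = sim

print(len(similarities))  # 3 pairs for 3 people
```

With n people there are at most n(n-1)/2 such pairs, so the 15 people in this graph yield at most 105 similarities.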
Let’s confirm the cosine similarities generated with Cypher are consistent with the cosine similarity calculated manually for M. Sherman and M. Hunger:
MATCH (p1:Person {name:'Michael Sherman'})-[s:SIMILARITY]-(p2:Person {name:'Michael Hunger'})
RETURN s.similarity AS `Cosine Similarity`
Looks good!
The updated graph model now looks like this:
See Alistair Jones's arrow tool.
MATCH (n:Movie {name:'Marley and Me'}) RETURN n
With the similarities added to the graph, it is easy to view your k-nearest neighbors. Let’s view Graph Alchemist Grace's 5-nearest neighbors:
MATCH (p1:Person {name:'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity
These people, in descending order, rated movies most similarly to Grace.
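The ORDER BY ... DESC followed by LIMIT 5 is all that k-NN needs once the similarities exist. A minimal Python sketch of the same selection, using made-up similarity scores and a hypothetical helper named `k_nearest`:

```python
# Hypothetical similarity scores between one person and six others
# (made-up values, not taken from the graph).
similarities = {
    'Ann': 0.97, 'Bob': 0.91, 'Cat': 0.99,
    'Dan': 0.88, 'Eve': 0.95, 'Fay': 0.93,
}

def k_nearest(similarities, k):
    """Return the k (name, similarity) pairs with the highest similarity."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

neighbors = k_nearest(similarities, 5)
names = [name for name, _ in neighbors]
print(names)  # ['Cat', 'Ann', 'Eve', 'Fay', 'Bob']
```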
Edit query 7 in the console if you filled out the form and want to see your nearest neighbors! ^[4]^ Your name in the graph is exactly how it was entered in the form.
Ultimately, I want to provide recommendations for movies that a person hasn’t rated (which I am naively assuming to mean that they haven’t seen the movie). As mentioned earlier, I decided to accomplish this by averaging the movie ratings from that person’s k-nearest neighbors (out of the neighbors who rated the relevant movie). ^[5]^ I decided to use k = 3 for the movie recommendations; these recommendations should be thought of as estimates of how much the person would like (or how the person would rate) the movies they haven’t seen.
Let’s get Zoltan's recommendations for the movies he hasn’t seen:
MATCH (b:Person)-[r:RATED]->(m:Movie), (b)-[s:SIMILARITY]-(a:Person {name:'Zoltan Varju'})
WHERE NOT((a)-[:RATED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
It looks like Zoltan’s nearest neighbors most enjoyed WALL-E, Pan’s Labyrinth, The Bourne Trilogy, and Taken, and Zoltan should check out these movies!
Edit query 8 in the console to get your movie recommendations! ^[6]^ If you rated all movies, you don’t get any recommendations. Sorry!
The recommendation query breaks down as follows. First, get all of the people who rated the movies that Zoltan didn’t rate, their ratings for those movies, and their similarities with Zoltan:
MATCH (b:Person)-[r:RATED]->(m:Movie), (b)-[s:SIMILARITY]-(a:Person {name:'Zoltan Varju'})
WHERE NOT((a)-[:RATED]->(m))
With the movies, similarities, and ratings, sort first by movie name and then by similarity descending ^[7]^ so that the ratings are in the correct order for collection in the next step:
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
Group by movie and grab the first three ratings into a collection called ratings:
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
Average the ratings in the ratings collection. Return the movie name and the average rating as the recommendation:
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS reco
ORDER BY reco DESC
RETURN movie AS Movie, reco AS Recommendation
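The steps above can also be sketched in Python. This is an illustration under stated assumptions rather than a translation of the query: the function name `recommend`, the people, the ratings, and the similarity scores are all made up. The logic follows the same order: keep only movies the target hasn't rated, sort each movie's raters by similarity, take the first k ratings, and average them.

```python
def recommend(target, ratings, similarities, k=3):
    """Estimate ratings for the target's unrated movies.

    ratings:      {person: {movie: rating}}
    similarities: {person: similarity to target}, excluding the target
    """
    seen = ratings[target]
    candidates = {}
    # Collect (similarity, rating) pairs for every movie the target hasn't rated.
    for person, sim in similarities.items():
        for movie, rating in ratings[person].items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))
    recos = {}
    for movie, pairs in candidates.items():
        # Most similar raters first, then average the first k ratings.
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        top = [rating for _, rating in pairs[:k]]
        recos[movie] = sum(top) / len(top)
    return sorted(recos.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data (not from the graph).
ratings = {
    'Zed': {'A': 7},
    'Ann': {'A': 8, 'B': 9, 'C': 4},
    'Bob': {'A': 6, 'B': 7},
    'Cat': {'B': 8, 'C': 6},
    'Dan': {'B': 2, 'C': 5},
}
similarities_to_zed = {'Ann': 0.99, 'Bob': 0.95, 'Cat': 0.90, 'Dan': 0.80}

recommendations = recommend('Zed', ratings, similarities_to_zed, k=3)
print(recommendations)  # [('B', 8.0), ('C', 5.0)]
```

Movie B has four raters, so Dan, the least similar of them, is left out of the average; movie C has exactly three raters, so all of them count. As in the query, a movie with fewer than k raters is simply averaged over the raters it has.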