@sscdotopen
Last active May 27, 2018 08:26

Recommender Example

Input table

Table capturing ratings that users gave to items, with schema: user_id, item_id, rating, date

Preprocessing

  • group data by user_id
  • remove groups with fewer than 10 items
  • if a user has rated more than 500 items, keep only the 500 most recent items
  • create a binary interaction matrix from the data (users correspond to rows, items correspond to columns; cell (i,j) is 1 if user i interacted with item j, 0 otherwise)
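The preprocessing steps above could be sketched in plain Scala collections roughly as follows. This is only an illustration under assumptions not stated in the notes: the `Rating` case class, the `Preprocessing` object, and the constant names are hypothetical, `date` is assumed to be an epoch timestamp, and each user is assumed to have rated an item at most once (so counting ratings equals counting items). The binary matrix is represented sparsely, as the set of item ids per user.

```scala
// Hypothetical record type mirroring the input schema (user_id, item_id, rating, date).
// Assumption: date is an epoch timestamp; the notes do not specify its type.
final case class Rating(userId: Int, itemId: Int, rating: Double, date: Long)

object Preprocessing {
  val MinItems = 10   // users with fewer rated items are dropped
  val MaxItems = 500  // cap on the number of most recent items kept per user

  // Returns, per remaining user, the set of items they interacted with.
  // Each set is one sparse row of the binary interaction matrix:
  // item j is in the set of user i exactly when cell (i, j) is 1.
  def interactions(ratings: Seq[Rating]): Map[Int, Set[Int]] =
    ratings
      .groupBy(_.userId)                               // group data by user_id
      .filter { case (_, rs) => rs.size >= MinItems }  // drop small groups
      .map { case (user, rs) =>
        // newest first, then keep at most MaxItems ratings
        val latest = rs.sortBy(r => -r.date).take(MaxItems)
        user -> latest.map(_.itemId).toSet
      }
}
```

The rating value itself is discarded here, since only the fact that an interaction happened enters the binary matrix.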

Algorithm

Item-Based Collaborative Filtering: compute item similarity matrix using Jaccard similarity.
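For reference, the Jaccard similarity between two items can be written in terms of the cooccurrence counts and the per-item interaction counts. The symbols U_a (the set of users who interacted with item a) and s_a (the number of such users) are notation introduced here for illustration; C denotes the cooccurrence matrix A'A.

```latex
J(a, b) = \frac{|U_a \cap U_b|}{|U_a \cup U_b|}
        = \frac{|U_a \cap U_b|}{|U_a| + |U_b| - |U_a \cap U_b|}
        = \frac{C_{ab}}{s_a + s_b - C_{ab}}
```

The second equality is inclusion-exclusion on the union, which is why only column sums and cooccurrence counts are needed.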

Input: binary interaction matrix A

Steps:

 // vector containing number of interactions per item
 val sums = colsums(A) 
 // cooccurrence matrix A'A between items
 val C = A.transposeTimes(A) 
 
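 // make the per-item counts available wherever S is computed
 broadcast(sums)
 // compute matrix S containing jaccard similarity between items
 val S = C.mapNonZeroEntries { case (itemA, itemB, numCooccurrences) =>
   // toDouble guards against integer division in case the counts are integers
   numCooccurrences.toDouble / (sums(itemA) + sums(itemB) - numCooccurrences)
 }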
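The operators in the steps above (`colsums`, `transposeTimes`, `broadcast`, `mapNonZeroEntries`) are not tied to a specific library in these notes. As a minimal single-machine sketch of the same computation, assuming the binary matrix A is stored sparsely as a map from user id to the set of item ids, one could write the following. The object and method names are hypothetical, and the quadratic pair expansion per user is only meant for small inputs.

```scala
object JaccardSketch {
  // Rows of the binary interaction matrix A, stored sparsely:
  // each user maps to the set of item ids whose cell is 1.
  type Interactions = Map[Int, Set[Int]]

  // Column sums of A: number of users that interacted with each item.
  def colSums(a: Interactions): Map[Int, Int] =
    a.values.toSeq.flatten
      .groupBy(identity)
      .map { case (item, occ) => item -> occ.size }

  // Non-zero entries of the cooccurrence matrix A'A, keyed by (itemA, itemB).
  // Every user contributes 1 to each ordered pair of items they interacted with.
  def cooccurrences(a: Interactions): Map[(Int, Int), Int] =
    a.values.toSeq
      .flatMap(items => for (i <- items.toSeq; j <- items.toSeq) yield (i, j))
      .groupBy(identity)
      .map { case (pair, occ) => pair -> occ.size }

  // Jaccard similarity for every pair of items that cooccur at least once.
  def similarities(a: Interactions): Map[(Int, Int), Double] = {
    val sums = colSums(a)
    cooccurrences(a).map { case ((itemA, itemB), n) =>
      (itemA, itemB) -> n.toDouble / (sums(itemA) + sums(itemB) - n)
    }
  }
}

object JaccardExample extends App {
  // Three users and three items; user 2 interacted with all items.
  val a = Map(1 -> Set(1, 2), 2 -> Set(1, 2, 3), 3 -> Set(3))
  val s = JaccardSketch.similarities(a)
  println(s((1, 2)))  // items 1 and 2 share exactly the same users
  println(s((1, 3)))  // one shared user out of three distinct users
}
```

Pairs of items that never cooccur are simply absent from the result, which matches mapping over only the non-zero entries of C. Diagonal entries come out as 1, since an item cooccurs with itself once per interaction.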