Explore Surprise.py
Gist xiaohk/e8fbdd059f16c5869ed814dcd1c79dbd, last active January 9, 2019 20:11
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Surprise Exploration\n", | |
"\n", | |
"In this notebook, we can explore how a simple and general recommender system works using Surprise." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 1. Collaborative Filtering Algorithm\n", | |
"\n", | |
"There are two types of methods in Collaborative Filtering (CF): user-user and item-item. User-user CF recommends items to a user based on what other users with similar interests/tastes liked. Item-item CF recommends items that are similar to the ones the user has already liked/purchased." | |
] | |
}, | |
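{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A minimal sketch of the two CF types on a hypothetical toy rating matrix `R` (rows are users, columns are items, 0 means not rated): user-user CF compares rows, and item-item CF compares columns. For simplicity, missing ratings are treated as 0. Plain cosine similarity is used here only for illustration. It is not the measure used by the Surprise models below." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"\n", | |
"# Hypothetical toy ratings: rows = users, columns = items, 0 = not rated\n", | |
"R = np.array([[5, 4, 0, 1],\n", | |
"              [4, 5, 1, 0],\n", | |
"              [1, 0, 5, 4],\n", | |
"              [0, 1, 4, 5]], dtype=float)\n", | |
"\n", | |
"def cosine_sim(M):\n", | |
"    # Pairwise cosine similarity between the rows of M\n", | |
"    unit = M / np.linalg.norm(M, axis=1, keepdims=True)\n", | |
"    return unit @ unit.T\n", | |
"\n", | |
"user_sim = cosine_sim(R)    # user-user: compare rows (users)\n", | |
"item_sim = cosine_sim(R.T)  # item-item: compare columns (items)\n", | |
"\n", | |
"# Users 0 and 1 rate alike (similarity about 0.95); users 0 and 2 do not (about 0.21)\n", | |
"print(np.round(user_sim, 2))\n", | |
"print(np.round(item_sim, 2))" | |
] | |
}, | |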
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from surprise import SVD, KNNBaseline, Dataset, NMF\n", | |
"import pandas as pd\n", | |
"import matplotlib.pyplot as plt\n", | |
"import numpy as np\n", | |
"%matplotlib inline" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Train a model (here `SVD`, a matrix-factorization algorithm) to predict ratings from (user, item, rating) tuples." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<surprise.prediction_algorithms.matrix_factorization.SVD at 0x118912828>" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data = Dataset.load_builtin('ml-100k')\n", | |
"trainset = data.build_full_trainset()\n", | |
"\n", | |
"model = SVD()\n", | |
"model.fit(trainset)" | |
] | |
}, | |
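{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As a rough check of what `SVD` learned, the sketch below recomputes one estimate by hand from the fitted biases and latent factors (`bu`, `bi`, `pu`, `qi`). It assumes the default `biased=True` prediction rule $\\hat{r}_{ui} = \\mu + b_u + b_i + q_i^\\top p_u$. The cell calls `predict(..., clip=False)` so that clipping to the 1 to 5 rating scale does not affect the comparison. User `'196'` and movie `'302'` are arbitrary example IDs." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"\n", | |
"uid, iid = '196', '302'  # arbitrary example raw IDs (strings in ml-100k)\n", | |
"u = model.trainset.to_inner_uid(uid)\n", | |
"i = model.trainset.to_inner_iid(iid)\n", | |
"\n", | |
"# Biased MF rule: global mean + user bias + item bias + dot(item factors, user factors)\n", | |
"manual = (model.trainset.global_mean + model.bu[u] + model.bi[i]\n", | |
"          + np.dot(model.qi[i], model.pu[u]))\n", | |
"surprise_est = model.predict(uid, iid, clip=False).est\n", | |
"print(manual, surprise_est)\n", | |
"assert np.isclose(manual, surprise_est)" | |
] | |
}, | |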
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Recommend unwatched movies to a user using the SVD model trained above." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Get the (user, unwatched movie, unknown rating) tuples; fill=0 is only a placeholder for the unknown rating\n", | |
"testset = trainset.build_anti_testset(fill=0)\n", | |
"predictions = model.test(testset)" | |
] | |
}, | |
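{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`build_anti_testset()` pairs every user with every movie they have *not* rated in the training set. `fill=0` only sets the placeholder `r_ui` value, which the model does not use when estimating. The sketch below rebuilds that list for one user from `trainset.ur` (each user's `(inner item ID, rating)` pairs) to make the construction explicit. It illustrates the idea and is not Surprise's own code. `anti_u196` is a name introduced here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"uid = '196'  # example user\n", | |
"inner_u = trainset.to_inner_uid(uid)\n", | |
"\n", | |
"# Inner IDs of the movies this user has rated\n", | |
"rated = {inner_i for (inner_i, _) in trainset.ur[inner_u]}\n", | |
"\n", | |
"# Every other movie becomes a (user, movie, placeholder rating) tuple\n", | |
"anti_u196 = [(uid, trainset.to_raw_iid(inner_i), 0.0)\n", | |
"             for inner_i in trainset.all_items() if inner_i not in rated]\n", | |
"\n", | |
"# Should agree with the number of entries of `testset` that belong to this user\n", | |
"assert len(anti_u196) == sum(1 for (u, _, _) in testset if u == uid)\n", | |
"print(len(rated), len(anti_u196))" | |
] | |
}, | |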
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('196', '302', 0.0),\n", | |
" ('196', '377', 0.0),\n", | |
" ('196', '51', 0.0),\n", | |
" ('196', '346', 0.0),\n", | |
" ('196', '474', 0.0),\n", | |
" ('196', '265', 0.0),\n", | |
" ('196', '465', 0.0),\n", | |
" ('196', '451', 0.0),\n", | |
" ('196', '86', 0.0),\n", | |
" ('196', '1014', 0.0)]" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"testset[:10]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[Prediction(uid='196', iid='302', r_ui=0.0, est=4.207584627777337, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='377', r_ui=0.0, est=2.638978793859669, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='51', r_ui=0.0, est=3.331218337187397, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='346', r_ui=0.0, est=3.8273963198197007, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='474', r_ui=0.0, est=4.161019254432978, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='265', r_ui=0.0, est=4.009726137469156, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='465', r_ui=0.0, est=3.6400440236616673, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='451', r_ui=0.0, est=3.3328102289555726, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='86', r_ui=0.0, est=3.8069009649047696, details={'was_impossible': False}),\n", | |
" Prediction(uid='196', iid='1014', r_ui=0.0, est=3.2353053587607623, details={'was_impossible': False})]" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"predictions[:10]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[('302', 4.207584627777337), ('377', 2.638978793859669), ('51', 3.331218337187397), ('346', 3.8273963198197007), ('474', 4.161019254432978), ('265', 4.009726137469156), ('465', 3.6400440236616673), ('451', 3.3328102289555726), ('86', 3.8069009649047696), ('1014', 3.2353053587607623)]\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"1643" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Estimate ratings for all movies that user 196 has not rated\n", | |
"prediction_u196 = [(p.iid, p.est) for p in predictions if p.uid == '196']\n", | |
"print(prediction_u196[:10])\n", | |
"len(prediction_u196)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('483', 4.5772300515540145),\n", | |
" ('197', 4.565017976782498),\n", | |
" ('603', 4.556527569574576),\n", | |
" ('56', 4.528959173948327),\n", | |
" ('511', 4.504870863481298),\n", | |
" ('480', 4.4791866908164115),\n", | |
" ('408', 4.473526579614248),\n", | |
" ('272', 4.459467523992605),\n", | |
" ('178', 4.459254102576342),\n", | |
" ('223', 4.45621533788619)]" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Sort all 1643 estimated ratings and recommend the top 10 to the user\n", | |
"prediction_top10_u196 = sorted(prediction_u196, key=lambda t: t[1], reverse=True)[:10]\n", | |
"prediction_top10_u196" | |
] | |
}, | |
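{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The same filter-then-sort steps can be applied to every user at once. This sketch is modelled on the `get_top_n` helper from the Surprise FAQ. `heapq.nlargest` is a swapped-in choice that avoids fully sorting each user's list, and it returns the same result as `sorted(..., reverse=True)[:n]`. `get_top_n` is defined here and is not part of the Surprise API." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import heapq\n", | |
"from collections import defaultdict\n", | |
"\n", | |
"def get_top_n(predictions, n=10):\n", | |
"    # Group (movie, estimated rating) pairs by user\n", | |
"    by_user = defaultdict(list)\n", | |
"    for p in predictions:\n", | |
"        by_user[p.uid].append((p.iid, p.est))\n", | |
"    # Keep the n highest estimates for each user\n", | |
"    return {uid: heapq.nlargest(n, pairs, key=lambda t: t[1])\n", | |
"            for uid, pairs in by_user.items()}\n", | |
"\n", | |
"top_n = get_top_n(predictions, n=10)\n", | |
"top_n['196']  # should match prediction_top10_u196" | |
] | |
}, | |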
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Get the names of these 10 movies: read u.item from Surprise's default data folder (~/.surprise_data)\n", | |
"import os\n", | |
"item_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.item')\n", | |
"with open(item_path, 'r', encoding='ISO-8859-1') as fp:\n", | |
"    mapping_info = fp.readlines()" | |
] | |
}, | |
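{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`mapping_info` holds the raw lines of `u.item`. The sketch below turns them into a raw-ID to title lookup. It assumes the `|`-separated layout of `u.item`, with the movie ID in field 0 and the title in field 1. `id_to_title` is a name introduced here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Fields are separated by '|'; field 0 is the movie ID, field 1 the title\n", | |
"id_to_title = {}\n", | |
"for line in mapping_info:\n", | |
"    fields = line.split('|')\n", | |
"    if len(fields) > 1:  # skip blank lines, if any\n", | |
"        id_to_title[fields[0]] = fields[1]\n", | |
"\n", | |
"# Titles of the top-10 recommendations for user 196 (raw IDs are strings)\n", | |
"[id_to_title[iid] for iid, _ in prediction_top10_u196]" | |
] | |
}, | |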
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"```\n", | |
"u.item -- Information about the items (movies); this is a tab separated\n", | |
" list of\n", | |
" movie id | movie title | release date | video release date |\n", | |
" IMDb URL | unknown | Action | Adventure | Animation |\n", | |
" Children's | Comedy | Crime | Documentary | Drama | Fantasy |\n", | |
" Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |\n", | |
" Thriller | War | Western |\n", | |
" The last 19 fields are the genres, a 1 indicates the movie\n", | |
" is of that genre, a 0 indicates it is not; movies can be in\n", | |
" several genres at once.\n", | |
" The movie ids are the ones used in the u.data data set.\n", | |
"```" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>1</td>\n", | |
" <td>Toy Story (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Toy%20Story%2...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>2</td>\n", | |
" <td>GoldenEye (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?GoldenEye%20(...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>3</td>\n", | |
" <td>Four Rooms (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Four%20Rooms%...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>4</td>\n", | |
" <td>Get Shorty (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Get%20Shorty%...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>5</td>\n", | |
" <td>Copycat (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Copycat%20(1995)</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"0 1 Toy Story (1995) 01-Jan-1995 NaN \n", | |
"1 2 GoldenEye (1995) 01-Jan-1995 NaN \n", | |
"2 3 Four Rooms (1995) 01-Jan-1995 NaN \n", | |
"3 4 Get Shorty (1995) 01-Jan-1995 NaN \n", | |
"4 5 Copycat (1995) 01-Jan-1995 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 ... \\\n", | |
"0 http://us.imdb.com/M/title-exact?Toy%20Story%2... 0 0 0 1 1 ... \n", | |
"1 http://us.imdb.com/M/title-exact?GoldenEye%20(... 0 1 1 0 0 ... \n", | |
"2 http://us.imdb.com/M/title-exact?Four%20Rooms%... 0 0 0 0 0 ... \n", | |
"3 http://us.imdb.com/M/title-exact?Get%20Shorty%... 0 1 0 0 0 ... \n", | |
"4 http://us.imdb.com/M/title-exact?Copycat%20(1995) 0 0 0 0 0 ... \n", | |
"\n", | |
" 14 15 16 17 18 19 20 21 22 23 \n", | |
"0 0 0 0 0 0 0 0 0 0 0 \n", | |
"1 0 0 0 0 0 0 0 1 0 0 \n", | |
"2 0 0 0 0 0 0 0 1 0 0 \n", | |
"3 0 0 0 0 0 0 0 0 0 0 \n", | |
"4 0 0 0 0 0 0 0 1 0 0 \n", | |
"\n", | |
"[5 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
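], | |
"source": [ | |
"# u.item is '|'-separated, even though its README calls it tab separated\n", | |
"import os\n", | |
"df = pd.read_csv(os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.item'), sep='|',\n", | |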
" encoding='ISO-8859-1', header=None)\n", | |
"df.head()" | |
] | |
}, | |
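{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Integer column labels like `1` and `21` are hard to read. This sketch re-reads `u.item` and passes the field names from the README above as `names=`. The snake_case spellings of the first five fields are our own choice, and the path assumes Surprise's default download folder `~/.surprise_data`. `movies` is a new variable, so `df` is left untouched." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Column names from the u.item README: 5 info fields + 19 genre flags = 24 columns\n", | |
"info_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']\n", | |
"genre_cols = ['unknown', 'Action', 'Adventure', 'Animation', \"Children's\",\n", | |
"              'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',\n", | |
"              'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance',\n", | |
"              'Sci-Fi', 'Thriller', 'War', 'Western']\n", | |
"\n", | |
"# Assumed location: Surprise's default data folder in the home directory\n", | |
"item_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.item')\n", | |
"movies = pd.read_csv(item_path, sep='|', encoding='ISO-8859-1',\n", | |
"                     header=None, names=info_cols + genre_cols)\n", | |
"movies[['movie_id', 'title'] + genre_cols[:3]].head()" | |
] | |
}, | |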
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, we get the top 10 movies that we can recommend to user 196." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>482</th>\n", | |
" <td>483</td>\n", | |
" <td>Casablanca (1942)</td>\n", | |
" <td>01-Jan-1942</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Casablanca%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>196</th>\n", | |
" <td>197</td>\n", | |
" <td>Graduate, The (1967)</td>\n", | |
" <td>01-Jan-1967</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Graduate,%20T...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>602</th>\n", | |
" <td>603</td>\n", | |
" <td>Rear Window (1954)</td>\n", | |
" <td>01-Jan-1954</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Rear%20Window...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>55</th>\n", | |
" <td>56</td>\n", | |
" <td>Pulp Fiction (1994)</td>\n", | |
" <td>01-Jan-1994</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Pulp%20Fictio...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>510</th>\n", | |
" <td>511</td>\n", | |
" <td>Lawrence of Arabia (1962)</td>\n", | |
" <td>01-Jan-1962</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Lawrence%20of...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>479</th>\n", | |
" <td>480</td>\n", | |
" <td>North by Northwest (1959)</td>\n", | |
" <td>01-Jan-1959</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?North%20by%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>407</th>\n", | |
" <td>408</td>\n", | |
" <td>Close Shave, A (1995)</td>\n", | |
" <td>28-Apr-1996</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Close%20Shave...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>271</th>\n", | |
" <td>272</td>\n", | |
" <td>Good Will Hunting (1997)</td>\n", | |
" <td>01-Jan-1997</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?imdb-title-11...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>177</th>\n", | |
" <td>178</td>\n", | |
" <td>12 Angry Men (1957)</td>\n", | |
" <td>01-Jan-1957</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?12%20Angry%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>222</th>\n", | |
" <td>223</td>\n", | |
" <td>Sling Blade (1996)</td>\n", | |
" <td>22-Nov-1996</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Sling%20Blade...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>10 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"482 483 Casablanca (1942) 01-Jan-1942 NaN \n", | |
"196 197 Graduate, The (1967) 01-Jan-1967 NaN \n", | |
"602 603 Rear Window (1954) 01-Jan-1954 NaN \n", | |
"55 56 Pulp Fiction (1994) 01-Jan-1994 NaN \n", | |
"510 511 Lawrence of Arabia (1962) 01-Jan-1962 NaN \n", | |
"479 480 North by Northwest (1959) 01-Jan-1959 NaN \n", | |
"407 408 Close Shave, A (1995) 28-Apr-1996 NaN \n", | |
"271 272 Good Will Hunting (1997) 01-Jan-1997 NaN \n", | |
"177 178 12 Angry Men (1957) 01-Jan-1957 NaN \n", | |
"222 223 Sling Blade (1996) 22-Nov-1996 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 \\\n", | |
"482 http://us.imdb.com/M/title-exact?Casablanca%20... 0 0 0 0 0 \n", | |
"196 http://us.imdb.com/M/title-exact?Graduate,%20T... 0 0 0 0 0 \n", | |
"602 http://us.imdb.com/M/title-exact?Rear%20Window... 0 0 0 0 0 \n", | |
"55 http://us.imdb.com/M/title-exact?Pulp%20Fictio... 0 0 0 0 0 \n", | |
"510 http://us.imdb.com/M/title-exact?Lawrence%20of... 0 0 1 0 0 \n", | |
"479 http://us.imdb.com/M/title-exact?North%20by%20... 0 0 0 0 0 \n", | |
"407 http://us.imdb.com/M/title-exact?Close%20Shave... 0 0 0 1 0 \n", | |
"271 http://us.imdb.com/M/title-exact?imdb-title-11... 0 0 0 0 0 \n", | |
"177 http://us.imdb.com/M/title-exact?12%20Angry%20... 0 0 0 0 0 \n", | |
"222 http://us.imdb.com/M/title-exact?Sling%20Blade... 0 0 0 0 0 \n", | |
"\n", | |
" ... 14 15 16 17 18 19 20 21 22 23 \n", | |
"482 ... 0 0 0 0 0 1 0 0 1 0 \n", | |
"196 ... 0 0 0 0 0 1 0 0 0 0 \n", | |
"602 ... 0 0 0 0 1 0 0 1 0 0 \n", | |
"55 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"510 ... 0 0 0 0 0 0 0 0 1 0 \n", | |
"479 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"407 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"271 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"177 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"222 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"\n", | |
"[10 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Movie IDs in u.item start at 1, so the row position is movie ID - 1\n", | |
"df.iloc[[int(iid) - 1 for iid, _ in prediction_top10_u196]]" | |
] | |
}, | |
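{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`df.iloc` above relies on row *i* of `u.item` holding movie ID *i* + 1. The sketch below looks movies up by the movie-ID column (column `0`) instead and attaches each estimated rating. `by_id`, `top_ids`, `top_est` and `recs` are names introduced here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Index the item table by its movie-ID column instead of by row position\n", | |
"by_id = df.set_index(0)\n", | |
"\n", | |
"top_ids = [int(iid) for iid, _ in prediction_top10_u196]\n", | |
"top_est = [est for _, est in prediction_top10_u196]\n", | |
"\n", | |
"# Column 1 holds the title; keep it and add the estimated rating\n", | |
"recs = (by_id.loc[top_ids, [1]]\n", | |
"        .rename(columns={1: 'title'})\n", | |
"        .assign(estimated_rating=top_est))\n", | |
"recs" | |
] | |
}, | |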
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 2. Finding Similar Movies/Users\n", | |
"\n", | |
"Some algorithms in Surprise (the k-NN family, such as `KNNBaseline`) rely on a similarity measure, so after training one of them we can find the k nearest neighbors (KNN) of an item or a user." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Estimating biases using als...\n", | |
"Computing the pearson_baseline similarity matrix...\n", | |
"Done computing similarity matrix.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<surprise.prediction_algorithms.knns.KNNBaseline at 0x1469ff358>" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sim_options = {'name': 'pearson_baseline', 'user_based': False}\n", | |
"model = KNNBaseline(sim_options=sim_options)\n", | |
"model.fit(trainset)" | |
] | |
}, | |
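{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"After `fit`, the k-NN algorithms keep the full similarity matrix in `model.sim`, indexed by *inner* IDs. Because `user_based` is `False`, those IDs are items here. The sketch below reads the pearson_baseline similarity between two movies given their raw IDs. `item_similarity` is a helper introduced here, and `'313'` (Titanic) and `'300'` (Air Force One) are just example IDs." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def item_similarity(raw_a, raw_b):\n", | |
"    # Raw IDs (strings) -> inner IDs -> entry of the item-item similarity matrix\n", | |
"    a = model.trainset.to_inner_iid(raw_a)\n", | |
"    b = model.trainset.to_inner_iid(raw_b)\n", | |
"    return model.sim[a, b]\n", | |
"\n", | |
"print(model.sim.shape)  # (n_items, n_items) for an item-based model\n", | |
"print(item_similarity('313', '300'))" | |
] | |
}, | |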
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's find the 10 nearest neighbors of the film \"Titanic\"." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def get_n_neighbor_index(base_index, n=10):\n", | |
"    # DataFrame row index -> raw movie ID (the string of row + 1) -> inner ID\n", | |
"    base_inner_id = model.trainset.to_inner_iid(str(base_index + 1))\n", | |
"    # Inner IDs of the n most similar movies\n", | |
"    base_neighbors_inner_id = model.get_neighbors(base_inner_id, k=n)\n", | |
"    # Inner IDs -> raw movie IDs -> DataFrame row indices\n", | |
"    base_neighbors_index = [int(model.trainset.to_raw_iid(i)) - 1 for i in base_neighbors_inner_id]\n", | |
"    return base_neighbors_index" | |
] | |
}, | |
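{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`get_neighbors` returns the *k* inner IDs with the highest similarity to the given one, excluding that ID itself. The same ranking can be sketched directly on `model.sim` with NumPy. Ties may be broken in a different order, so the comparison below is on sets. `neighbors_from_sim` and `titanic_inner` are names introduced here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"\n", | |
"def neighbors_from_sim(inner_id, k=10):\n", | |
"    # Copy the similarity row and exclude the item itself\n", | |
"    row = model.sim[inner_id].astype(float)\n", | |
"    row[inner_id] = -np.inf\n", | |
"    # Positions of the k largest similarities, most similar first\n", | |
"    return list(np.argsort(-row, kind='stable')[:k])\n", | |
"\n", | |
"titanic_inner = model.trainset.to_inner_iid('313')  # raw ID of Titanic (1997)\n", | |
"print(set(neighbors_from_sim(titanic_inner)) == set(model.get_neighbors(titanic_inner, k=10)))" | |
] | |
}, | |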
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[312]\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>312</th>\n", | |
" <td>313</td>\n", | |
" <td>Titanic (1997)</td>\n", | |
" <td>01-Jan-1997</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?imdb-title-12...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>1 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"312 313 Titanic (1997) 01-Jan-1997 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 \\\n", | |
"312 http://us.imdb.com/M/title-exact?imdb-title-12... 0 1 0 0 0 \n", | |
"\n", | |
" ... 14 15 16 17 18 19 20 21 22 23 \n", | |
"312 ... 0 0 0 0 0 1 0 0 0 0 \n", | |
"\n", | |
"[1 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# To get item neighbors, we need to map movie name -> raw ID -> inner ID.\n", | |
"possible_index = [i for i in range(len(df[1])) if \"titanic\" in df[1][i].lower()]\n", | |
"print(possible_index)\n", | |
"df.iloc[possible_index]" | |
] | |
}, | |
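{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The list comprehension above scans titles one by one. pandas can do the same case-insensitive substring search in one call with `Series.str.contains`. `find_movie` is a helper name introduced here for illustration." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def find_movie(df, keyword):\n", | |
"    # Titles are in column 1; regex=False treats keyword literally,\n", | |
"    # na=False treats any missing title as a non-match\n", | |
"    mask = df[1].str.contains(keyword, case=False, regex=False, na=False)\n", | |
"    return df.index[mask].tolist()\n", | |
"\n", | |
"find_movie(df, 'titanic')  # same row positions as possible_index" | |
] | |
}, | |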
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>299</th>\n", | |
" <td>300</td>\n", | |
" <td>Air Force One (1997)</td>\n", | |
" <td>01-Jan-1997</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Air+Force+One...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>22</td>\n", | |
" <td>Braveheart (1995)</td>\n", | |
" <td>16-Feb-1996</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Braveheart%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>422</th>\n", | |
" <td>423</td>\n", | |
" <td>E.T. the Extra-Terrestrial (1982)</td>\n", | |
" <td>01-Jan-1982</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?E%2ET%2E%20th...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>...</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>209</th>\n", | |
" <td>210</td>\n", | |
" <td>Indiana Jones and the Last Crusade (1989)</td>\n", | |
" <td>01-Jan-1989</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Indiana%20Jon...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>271</th>\n", | |
" <td>272</td>\n", | |
" <td>Good Will Hunting (1997)</td>\n", | |
" <td>01-Jan-1997</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?imdb-title-11...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>173</th>\n", | |
" <td>174</td>\n", | |
" <td>Raiders of the Lost Ark (1981)</td>\n", | |
" <td>01-Jan-1981</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Raiders%20of%...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>309</th>\n", | |
" <td>310</td>\n", | |
" <td>Rainmaker, The (1997)</td>\n", | |
" <td>01-Jan-1997</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Rainmaker,+Th...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>81</th>\n", | |
" <td>82</td>\n", | |
" <td>Jurassic Park (1993)</td>\n", | |
" <td>01-Jan-1993</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Jurassic%20Pa...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>143</th>\n", | |
" <td>144</td>\n", | |
" <td>Die Hard (1988)</td>\n", | |
" <td>01-Jan-1988</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Die%20Hard%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>214</th>\n", | |
" <td>215</td>\n", | |
" <td>Field of Dreams (1989)</td>\n", | |
" <td>01-Jan-1989</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Field%20of%20...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>10 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"299 300 Air Force One (1997) 01-Jan-1997 NaN \n", | |
"21 22 Braveheart (1995) 16-Feb-1996 NaN \n", | |
"422 423 E.T. the Extra-Terrestrial (1982) 01-Jan-1982 NaN \n", | |
"209 210 Indiana Jones and the Last Crusade (1989) 01-Jan-1989 NaN \n", | |
"271 272 Good Will Hunting (1997) 01-Jan-1997 NaN \n", | |
"173 174 Raiders of the Lost Ark (1981) 01-Jan-1981 NaN \n", | |
"309 310 Rainmaker, The (1997) 01-Jan-1997 NaN \n", | |
"81 82 Jurassic Park (1993) 01-Jan-1993 NaN \n", | |
"143 144 Die Hard (1988) 01-Jan-1988 NaN \n", | |
"214 215 Field of Dreams (1989) 01-Jan-1989 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 \\\n", | |
"299 http://us.imdb.com/M/title-exact?Air+Force+One... 0 1 0 0 0 \n", | |
"21 http://us.imdb.com/M/title-exact?Braveheart%20... 0 1 0 0 0 \n", | |
"422 http://us.imdb.com/M/title-exact?E%2ET%2E%20th... 0 0 0 0 1 \n", | |
"209 http://us.imdb.com/M/title-exact?Indiana%20Jon... 0 1 1 0 0 \n", | |
"271 http://us.imdb.com/M/title-exact?imdb-title-11... 0 0 0 0 0 \n", | |
"173 http://us.imdb.com/M/title-exact?Raiders%20of%... 0 1 1 0 0 \n", | |
"309 http://us.imdb.com/M/title-exact?Rainmaker,+Th... 0 0 0 0 0 \n", | |
"81 http://us.imdb.com/M/title-exact?Jurassic%20Pa... 0 1 1 0 0 \n", | |
"143 http://us.imdb.com/M/title-exact?Die%20Hard%20... 0 1 0 0 0 \n", | |
"214 http://us.imdb.com/M/title-exact?Field%20of%20... 0 0 0 0 0 \n", | |
"\n", | |
" ... 14 15 16 17 18 19 20 21 22 23 \n", | |
"299 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"21 ... 0 0 0 0 0 0 0 0 1 0 \n", | |
"422 ... 1 0 0 0 0 0 1 0 0 0 \n", | |
"209 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"271 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"173 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"309 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"81 ... 0 0 0 0 0 0 1 0 0 0 \n", | |
"143 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"214 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"\n", | |
"[10 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.iloc[get_n_neighbor_index(possible_index[0])]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It finds many big-name, classic films that are \"similar\" to \"Titanic\".\n",
"\n",
"We can also find the 10 nearest neighbors of \"Akira\", the Japanese animated film."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[205]\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>205</th>\n", | |
" <td>206</td>\n", | |
" <td>Akira (1988)</td>\n", | |
" <td>01-Jan-1988</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Akira%20(1988)</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>1 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"205 206 Akira (1988) 01-Jan-1988 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 ... \\\n", | |
"205 http://us.imdb.com/M/title-exact?Akira%20(1988) 0 0 1 1 0 ... \n", | |
"\n", | |
" 14 15 16 17 18 19 20 21 22 23 \n", | |
"205 0 0 0 0 0 0 1 1 0 0 \n", | |
"\n", | |
"[1 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"possible_index = [i for i in range(len(df[1])) if \"akira\" in df[1][i].lower()]\n", | |
"print(possible_index)\n", | |
"df.iloc[possible_index]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" <th>1</th>\n", | |
" <th>2</th>\n", | |
" <th>3</th>\n", | |
" <th>4</th>\n", | |
" <th>5</th>\n", | |
" <th>6</th>\n", | |
" <th>7</th>\n", | |
" <th>8</th>\n", | |
" <th>9</th>\n", | |
" <th>...</th>\n", | |
" <th>14</th>\n", | |
" <th>15</th>\n", | |
" <th>16</th>\n", | |
" <th>17</th>\n", | |
" <th>18</th>\n", | |
" <th>19</th>\n", | |
" <th>20</th>\n", | |
" <th>21</th>\n", | |
" <th>22</th>\n", | |
" <th>23</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>197</th>\n", | |
" <td>198</td>\n", | |
" <td>Nikita (La Femme Nikita) (1990)</td>\n", | |
" <td>01-Jan-1990</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Nikita%20(1990)</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>67</th>\n", | |
" <td>68</td>\n", | |
" <td>Crow, The (1994)</td>\n", | |
" <td>01-Jan-1994</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Crow,%20The%2...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>167</th>\n", | |
" <td>168</td>\n", | |
" <td>Monty Python and the Holy Grail (1974)</td>\n", | |
" <td>01-Jan-1974</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Monty%20Pytho...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>86</th>\n", | |
" <td>87</td>\n", | |
" <td>Searching for Bobby Fischer (1993)</td>\n", | |
" <td>01-Jan-1993</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Searching%20f...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>44</th>\n", | |
" <td>45</td>\n", | |
" <td>Eat Drink Man Woman (1994)</td>\n", | |
" <td>01-Jan-1994</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Yinshi%20Nan%...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>483</th>\n", | |
" <td>484</td>\n", | |
" <td>Maltese Falcon, The (1941)</td>\n", | |
" <td>01-Jan-1941</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Maltese%20Fal...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>635</th>\n", | |
" <td>636</td>\n", | |
" <td>Escape from New York (1981)</td>\n", | |
" <td>01-Jan-1981</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Escape%20from...</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>31</td>\n", | |
" <td>Crimson Tide (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Crimson%20Tid...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>272</th>\n", | |
" <td>273</td>\n", | |
" <td>Heat (1995)</td>\n", | |
" <td>01-Jan-1995</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Heat%20(1995)</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>1</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>922</th>\n", | |
" <td>923</td>\n", | |
" <td>Raise the Red Lantern (1991)</td>\n", | |
" <td>01-Jan-1991</td>\n", | |
" <td>NaN</td>\n", | |
" <td>http://us.imdb.com/M/title-exact?Da%20Hong%20D...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>...</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" <td>0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>10 rows × 24 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0 1 2 3 \\\n", | |
"197 198 Nikita (La Femme Nikita) (1990) 01-Jan-1990 NaN \n", | |
"67 68 Crow, The (1994) 01-Jan-1994 NaN \n", | |
"167 168 Monty Python and the Holy Grail (1974) 01-Jan-1974 NaN \n", | |
"86 87 Searching for Bobby Fischer (1993) 01-Jan-1993 NaN \n", | |
"44 45 Eat Drink Man Woman (1994) 01-Jan-1994 NaN \n", | |
"483 484 Maltese Falcon, The (1941) 01-Jan-1941 NaN \n", | |
"635 636 Escape from New York (1981) 01-Jan-1981 NaN \n", | |
"30 31 Crimson Tide (1995) 01-Jan-1995 NaN \n", | |
"272 273 Heat (1995) 01-Jan-1995 NaN \n", | |
"922 923 Raise the Red Lantern (1991) 01-Jan-1991 NaN \n", | |
"\n", | |
" 4 5 6 7 8 9 \\\n", | |
"197 http://us.imdb.com/M/title-exact?Nikita%20(1990) 0 0 0 0 0 \n", | |
"67 http://us.imdb.com/M/title-exact?Crow,%20The%2... 0 1 0 0 0 \n", | |
"167 http://us.imdb.com/M/title-exact?Monty%20Pytho... 0 0 0 0 0 \n", | |
"86 http://us.imdb.com/M/title-exact?Searching%20f... 0 0 0 0 0 \n", | |
"44 http://us.imdb.com/M/title-exact?Yinshi%20Nan%... 0 0 0 0 0 \n", | |
"483 http://us.imdb.com/M/title-exact?Maltese%20Fal... 0 0 0 0 0 \n", | |
"635 http://us.imdb.com/M/title-exact?Escape%20from... 0 1 1 0 0 \n", | |
"30 http://us.imdb.com/M/title-exact?Crimson%20Tid... 0 0 0 0 0 \n", | |
"272 http://us.imdb.com/M/title-exact?Heat%20(1995) 0 1 0 0 0 \n", | |
"922 http://us.imdb.com/M/title-exact?Da%20Hong%20D... 0 0 0 0 0 \n", | |
"\n", | |
" ... 14 15 16 17 18 19 20 21 22 23 \n", | |
"197 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"67 ... 0 0 0 0 0 1 0 1 0 0 \n", | |
"167 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"86 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"44 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"483 ... 0 1 0 0 1 0 0 0 0 0 \n", | |
"635 ... 0 0 0 0 0 0 1 1 0 0 \n", | |
"30 ... 0 0 0 0 0 0 0 1 1 0 \n", | |
"272 ... 0 0 0 0 0 0 0 1 0 0 \n", | |
"922 ... 0 0 0 0 0 0 0 0 0 0 \n", | |
"\n", | |
"[10 rows x 24 columns]" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.iloc[get_n_neighbor_index(possible_index[0])]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
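"The neighbors of \"Akira\" include a few foreign-language films (\"Nikita\", \"Eat Drink Man Woman\", \"Raise the Red Lantern\") alongside action and thriller titles such as \"Escape from New York\" and \"Heat\"."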
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 3. CF Factorization\n", | |
"\n", | |
"<a href=\"https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_systems)\">Matrix factorization</a> is a class of Collaborative Filtering algorithms. It decomposes the rating matrix $R \\in \\mathbb{R}^{u\\times i}$ into a user factor matrix $P \\in \\mathbb{R}^{u\\times f}$ and an item factor matrix $Q \\in \\mathbb{R}^{i\\times f}$, with $f$ latent factors, so that $R \\approx PQ^\\top$ on the observed ratings. I think we might be interested in visualizing these latent factor vectors, since different matrix factorization methods should yield different decompositions."
] | |
}, | |
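{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming Surprise's documented prediction rule for `SVD` applies here, the two factor matrices are combined with a global mean and per-user/per-item biases. Note that `SVD` is a Funk-style biased matrix factorization fitted by SGD, not an exact singular value decomposition:\n",
"\n",
"$$\\hat{r}_{ui} = \\mu + b_u + b_i + q_i^\\top p_u$$\n",
"\n",
"In this formula:\n",
"\n",
"- $\\mu$ is the global mean rating (`trainset.global_mean`).\n",
"- $p_u$ is row $u$ of `svd_model.pu` and $q_i$ is row $i$ of `svd_model.qi`.\n",
"- $b_u$ and $b_i$ correspond to `svd_model.bu` and `svd_model.bi`.\n",
"\n",
"Rows are indexed by Surprise's inner ids (see `trainset.to_inner_uid` and `trainset.to_inner_iid`), not by the raw MovieLens ids. Dropping the bias terms gives the plain factorization $\\hat{R} = PQ^\\top$ with $P \\in \\mathbb{R}^{u\\times f}$ and $Q \\in \\mathbb{R}^{i\\times f}$."
]
},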
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 3.1. SVD Decomposition" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<surprise.prediction_algorithms.matrix_factorization.SVD at 0x10caa1828>" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data = Dataset.load_builtin('ml-100k')\n", | |
"trainset = data.build_full_trainset()\n", | |
"\n", | |
"svd_model = SVD()\n", | |
"svd_model.fit(trainset)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"User factor matrix `svd_model.pu` $\\in \\mathbb{R}^{u\\times f}$ (one row per user, one column per latent factor)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(943, 100)\n", | |
"[[ 0.09827454 -0.03976238 0.01818848 ... 0.06603582 -0.13693396\n", | |
" 0.09220304]\n", | |
" [ 0.2430969 -0.21293087 -0.05987946 ... -0.04940762 0.12689275\n", | |
" -0.09852063]\n", | |
" [ 0.00588489 0.2621444 0.00371794 ... 0.06920038 0.2109561\n", | |
" 0.01523976]\n", | |
" ...\n", | |
" [-0.27229863 -0.14024293 -0.01151298 ... -0.27645588 0.22114294\n", | |
" 0.19151027]\n", | |
" [-0.04344653 0.13792765 0.07480039 ... 0.22425652 0.05482876\n", | |
" -0.13347536]\n", | |
" [ 0.20231929 0.06558981 0.03920855 ... 0.12901358 -0.05891041\n", | |
" 0.04709826]]\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"print(svd_model.pu.shape)\n", | |
"print(svd_model.pu)\n", | |
"\n", | |
"plt.imshow(svd_model.pu, cmap='hot', interpolation='nearest')\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Item factor matrix `svd_model.qi` $\\in \\mathbb{R}^{i\\times f}$ (one row per item, one column per latent factor)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(1682, 100)\n", | |
"[[-0.17629553 0.11007775 0.0442902 ... -0.13306982 0.05124126\n", | |
" -0.0556131 ]\n", | |
" [ 0.10553786 0.39702341 -0.23784865 ... -0.0640464 -0.05775928\n", | |
" -0.05337625]\n", | |
" [ 0.01858199 -0.09989119 0.02385218 ... 0.00385601 0.02945917\n", | |
" 0.02013456]\n", | |
" ...\n", | |
" [ 0.02073325 -0.04456793 0.10833881 ... -0.12161109 -0.04055527\n", | |
" -0.12866432]\n", | |
" [-0.05337481 0.11682032 0.00655933 ... -0.04635817 -0.0631682\n", | |
" 0.07114037]\n", | |
" [-0.05437292 -0.04211752 -0.02952294 ... -0.04904702 -0.00770515\n", | |
" 0.13037699]]\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"print(svd_model.qi.shape)\n", | |
"print(svd_model.qi)\n", | |
"\n", | |
"plt.imshow(svd_model.qi, cmap='hot', interpolation='nearest')\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 3.2. NMF Decomposition" | |
] | |
}, | |
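{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, Surprise's `NMF` (with its default `biased=False`) predicts from the factors alone. It keeps every factor entry nonnegative, which is why the matrices printed below contain only values $\\ge 0$:\n",
"\n",
"$$\\hat{r}_{ui} = q_i^\\top p_u, \\qquad p_u \\ge 0,\\; q_i \\ge 0 \\text{ (elementwise)}$$\n",
"\n",
"Its default number of factors is 15, compared with 100 for `SVD`. So `nmf_model.pu` should have shape $u\\times 15$ and `nmf_model.qi` should have shape $i\\times 15$."
]
},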
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<surprise.prediction_algorithms.matrix_factorization.NMF at 0x11e4e6668>" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"data = Dataset.load_builtin('ml-100k')\n", | |
"trainset = data.build_full_trainset()\n", | |
"\n", | |
"nmf_model = NMF()\n", | |
"nmf_model.fit(trainset)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(943, 15)\n", | |
"[[0.74925821 0.33733162 0.25183698 ... 0.76754821 0.46347268 0.19515171]\n", | |
" [0.49229828 1.11158307 1.10293635 ... 0.2992807 0.23762653 0.51594267]\n", | |
" [0.15197183 0.96215718 0.21495458 ... 1.02602312 0.56869154 0.62312887]\n", | |
" ...\n", | |
" [0.28880653 0.18415247 0.12133848 ... 0.73764731 0.33596941 0.31132305]\n", | |
" [0.46113588 0.97649045 0.28678778 ... 0.59274594 0.62109594 0.84487729]\n", | |
" [0.61008222 0.13789916 0.84593444 ... 0.80971322 0.04439998 0.24758137]]\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAADEAAAD8CAYAAADe6kx2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAADuhJREFUeJztnXmUl2UVxz8XZpgRGBgGxpEtmFFCTY6pU0JSmbses7Tc0lzS7Bhl2UqdTmW5RIcs991c0gqXisNxX0g9EomWhiw5gAvYsM8wQAwDc/vjPkM/cJjf73mYX3PfDt9z3vN7t/u+7/f37Pfe5z6iqmQdvXr6A7oDu0l4wW4SXrCbxM4gIseJyEIRaRCRycV4x3ZQ1W7dgN7AIqAO6AO8Cuzf3e/J3YqREh8GGlR1sapuBn4HfKoI79mGYpAYDryTc7w0nNsOInKRiMwRkTn9e4nWi2h9b9ExIioiK2Ne2GMFW1VvVdV6Va0vU3gSOK8dLrTLb8U8q6QI37cMGJlzPCKc2yl6A1cAvwaeSHhhMVLiJWCMiNSKSB/gDGB6VwItQAXQAMxLeGG3p4SqbhGRrwCPY3/ynar6elcyA4BXgLlAe8I7xUNXfKyIHgKMAr4HDISXVbW+UHkXLXYTMAa4qgqiqqWAYhTsaJQD5wBXrrHWMRYuUqI/cBbw/eNhbIK8CxIrsI9f9Cg8miDvgkQ7sBhoBK5LkHdRJoZhZeIkrHzEwkVKlAD7YkSWDUyT73G8BXx0H6hvgMea4+VdkHhf+D2P7bu/hcJFdtoMUA2nAx9KkHdBohE4ZRa8AVyT8EUuslNdNTy8GQY3w4cSeoAuUqJlJSxqhu8AR4yIl3dBoqIC9j7LGrw9lyY8oJhaiEK3WtCNoLoPeh8oMCdG3kWZWAlMA5Y12OguFi6y00jg3DK4B2u1Y+GCxL+AFa2woJdp2mLhgsRgYDLAKHghQd4FCQEGApcugQMT5F2Q2Ih1OfbCVDexcEGiFFPZNAEFqzhy4IKEYB2/75Cm7XBBAqwTeBFp2g4Xjd0moAq4BMtasXCREnsBrUA18OcEeRck3gBuB24CvlsbL+8iO/XDVOkPAy8tiZd3kRLlmD3sEUx9EwsXJPphncAmYEOCvAsSq7HaCeDnCfIuSAwAZgO/Ac5OkHdBYmPYXiJN7+SidhqEFe4fYOUjFi5I9Abux0ystyTIu8hO7wJXY9mpOkHeBYlyzPT7baxsxCIvCREZKSLPisg8EXldRL4WzleJyJMi8kb4HRTOi4hcGzxsXhORg/O9oxXrelyMdT+iUYDXzFDg4LBfAfwT2B+r0ieH85OBKWH/BMxqJcB4YHa+d4wE/RXodNCnE/ROKa5AfwKOBhYCQ3OILgz7twBn5ty/7b6dbWNA9Tz0HNB7EkhElQkRGQ0chLVNNar6r3CpEagJ+wV52bDDDTfeZWrMWTEfFFAwCRHpDzwEfF1V1+VeU/vLo1wTcl2FegHHAkcBNyYolAvNQqWYr8Y3Ossm7GJ2KgGdBdoOWleM7CQiAtwBzFfVq3MuTQfODfvnYmWl4/w5oZYaDzTnZLtOcWB/GA1IKZyc74M6QwGpMBH7d14D/h62EzDF3dNY7fgUUBXuF+AGzA/wH0B9vndUgV4EqmXoyQkp4cLLpkpEhwKTwvGkSC8bF32nauA+TOdUlefezuCCRCvWctZh+TIWLvpOihWeb4YtFi5ItAMHA4N6wYsJ8i5IDMRS4ux262zFwkWZaAG+BlTy38YmBi5SogXzxNyKOTLGwgWJUZh/Rx/SxhMuSLQC44AfYZ3AWLggsRnr25+I9elj4YJEM9ZBu74GvphgoHBBohz4xknwpeVwdFu8vAsSm4DbppvR8cn94uVdkKjAph38GOBL8fIuSDRi9onpwIVfj5d3QaIS+CxwLXBZgrwLEn2xRm408IUEeRd9p16Yc++5tbBsic3fiZXvceyBqVP+uA
QSalgfJEow9X4zpoWIhQsSHf4c84DlCfIuSPTG7HV1mBEyFi5IvI212tcC309ws3FB4iBMQTACGJlgKnJBYgmWCg3YbNtYuCDRglWxi0bBswnyLkiUEGqoRvjBkHh5FySGA48Bn2vF+h6RcEFCgMMxU9Nlc+LlXZBoA/bGbNlNCfIuOoCD+8MB6+G0RHkXKbFlPRwB3EoaERck1mBeZycDp/aNl3dBohnz7/ghcH2CX4SLMrEXVrgPIsPuc6swM9cVWE82Fi5IVGKhLt4CHkyQj/Eo6C0ifxORGeG4VkRmB2+a34fIEIhIWThuCNdH53t2GzYgGsb2sTK6nQRmB5mfczwF+KWq7gOsBS4I5y8A1obzvwz3dYkNWIyCOuCqiA/ahgLdIkZghvcjgBlYT2EVUBKuTwAeD/uPAxPCfkm4T7p6fimoTkaPAdXS4nnZ/Aqb3tAxqXIw0KSqW8JxrifNNi+bcL053L9TDAAe/pkplvcthkJZRE4EVqjqy/GP7/K527xs1mP+fxdjbUUsCmknDgNOEpETsD9rAHANUCkiJeHfzo1X0xHLZqmIlGDG0feM/1X1VqynQY2I9sOmp6VEi4j1OjscmBH2HwDOCPs3A18O+5OAm8P+GcC0fM/dD3RNcFJpKLb73A4k6oC/YkPjB4CycL48HDeE63X5nlsDegnoXNBpxZ57qqozgZlhfzEWVWvHezYBp8Y8txqzUZSTpwbYCVy02IuAy8eaoSWzHsqVwNSFtr9/gryLXuxW4Ftl8GYrfDlB3gWJFmBMq6VISt/JBYn9sIbkIaxL8IdIeRckFmIju5+Q5vDugkQpVsVOHQGDEwItuKid6gA5AL6wFFYn9DtckHgHmDkXfg/wZry8CxJbsLkL3wMaW+PlXZSJEcCVmC42Qe3kg0QzsA6znEYFxQxwkZ2GAJ8us/2UbocLEu8CR7daLZUSfMRNdqrCYnYsTHlAzOCjWFtdGNENCxNBKOacomJh0F6mwjyTDAczXN4Iv8A8C55PkHdBoqYXDDoLjidtKnOPlwdVm44zC3QEqI7IaJnoB4w/xiYqPZnVXqwClz9hit7MjrHXYlqONcCGWszZIwIuUqIKeGcl/AUYntW4HXtWmovQG8BPE+RdkFjeZPaAsaR5FLgg0Yj5iX+gNMk/xUfBHoipaUa3pQ2KXKRENWZgqSbJ391HSqwGDgWmYhPLY+GCRH+sUCf1m3CSndqxuc/9gM8nTMdxkxKTMBLjsjodpxfmpHIk21YCiZbvcczD4p2dE35j4YLEnpiS4Agy7NzbjnmdtQEfvCLhAT09qlNVKgh+HcPQO4ql7RCRShF5UEQWiMh8EZnQnQF5+mPeBG++W9wycQ3wmKrui4V0nY8psp9W1TGYB07HGl3HY54/Y7Bwl3m/axgWavUwbCJtNPIlFdY/W8IO7j50YwSVsaCTQpa6rkjZqRbzR/918Dy7XUT6sYsBeXK9bDZgSrPZpNU0hciUYHEQblLVDofJ7ZZ3U/vLo6KYaM6yV1swZfLpmFdBLAohsRRYqqqzw/GDGKnlIjIUIPyuCNejl71ag81kOZMixQFU1UbgHRHpMAkeiTWy3RaQZwg2oqsAPh7LgMI7gF8F7gsel4uB87E/YJqIXIAZeDrcvB/B9GANmFn6/EI+oi3cnKKL7fGGTtVint0FOjH4PJFFNaZghaYNWyEnFi5IDMIK1QYgYSKLDxJbgeEDzcvmgnw3dwIXJNoAHrLfygR5F8PT3sClR8EnSCPhIiX2GGZOjHNJs566ILHpXfgW1lpnNoJKA9bLbCLDutgKLEjVQKw7EDswcpESwzBV5j5keNWDDViM/WOBwz8WL+8iO7Vh6pp2YMFz8fIuUmIT8BksK6VMx3FBog9wF6bez6sa6QQuSHS4Cn2cNPuECxKtWLXaRIZXjBpXY6GRvkiG50+wysa/F2JBSGLhgsT6reYH+CBpkXtdkGjG4hu3kGHPMwU+OdH6UF0qqHYCFyRKgX
1fsGo2s869TcCCUmsjHk74IhckBgN3t1mL/XxWlwxtxuJjvkWGYyhvxIzx+2Pqm1i4IDFusGWlRmyEFwsXJN5ebSSmkOHoc4Mx17k7Ma1HLFyM7NZhkbZKgdsS5F2QaMc8Me8mzXrqIjsNH2AFuokMK89WrbNx9olkOJjhOszgl6I4AyckajAjy2wyvO6pAqccYJrAiQnyLkgIwDizZ89IeUAh1kngUuB1zITwW8ycUIvlgAZs2mifcG9ZOG4I10fne/4hfW2F8lWgc4q0Os5wbF3VelU9ADPsnEE3BuRhMzyDBcG5Ie/NnaCAVOhwOKnCGscZmO632wLyjAStAl0SApDQ3Smhqssw5+G3sUXFm4GX2cWAPDt62RwKnBJ+Y1FIdhqE6bZqsQqkH3Bcwru2Q66XTQ2WX1852bJVLAqpnY4ClqjqSlVtwxaKPYwQkCfc01lAHroKyLMjngJ43gpcLAoh8TYwXkT6hiWwOrxsnsVimsN7vWw6vG8+CzwT/KF2in8DU4fA+FVpzr2FVrGXAQuwKvZerBrttoA8o0EPCmvZ3fe/WJSvGNtQ0CeCl82ErHrZdEQUuhcze8XCxaBoa9i+TYan46zAUuOB/TI8xn5/OdywCVbOt65ALFyQYJOlxKGY+3MsXJCYh9ntXiHDRpZKrEGZSIaHp3tiQ9QGMlwmFJs4ezwZXpSvGbOalpNWsF2Q2IotVHkkpsqMhQsSFdjAfSZmQY2FCxJ9xT5+KjbqioULEqvVph3MxtqKWLgg0Q5MA16cADcmyLuoYldgI6z7Z1n5iIWLlOiP+QGmLI0ITkiUYgP2F4BPZnU9uwGYimQD8PL8PDd3AhckVmJGx7mYy1AsXBTs9ZiZa8qB8MFX4+VdpEQ1cFwZ3P+qzfWMhQsSQ4ArW21a2CUJ8i5ItGDjiGoy7qAyDTgai/QeCxckFOs3zSAtwrsLEhuBmXdYdkrw2TILTk9DRFqw6URDMMvSKFUteFE4FymBTUivB1YFw0vUqnZeSOwSdpPoRty6w28UXBTsXYWXlNgl/F+Q6LGuuIjciflpbcSU4r0xHdpHsImPbwKnqeravA/rKWMj8DHMvakV0xP0wXQGV4frk4Eprg2PqvocZrTfrKqLVXUz5jW0KdxyN/DpQp7V02WiYxW4DvTFhtywfVSWLtHTJHaK4IVQUP3f0yQa2V4RvhHza9wxKkuX6GkSrwFlYVm5Ptg/Xx6u5fqLdIkea7FF5LfY4k3V2MevxbodE4D3EaKyqOqavM/a3e1wgt0kvGA3CS/YTcIL/gPlVDJrTvbjkwAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<Figure size 432x288 with 1 Axes>" | |
] | |
}, | |
"metadata": { | |
"needs_background": "light" | |
}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"print(nmf_model.pu.shape)\n", | |
"print(nmf_model.pu)\n", | |
"\n", | |
"plt.imshow(nmf_model.pu, cmap='hot', interpolation='nearest')\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(1682, 15)\n",
"[[0.32517699 0.51912292 0.68280434 ... 0.10826895 0.66647537 0.59907808]\n",
" [0.23358898 0.72173147 0.3044164 ... 0.60960966 0.66931133 0.89965064]\n",
" [0.42947581 0.29734101 0.1327856 ... 0.00710044 0.10511228 0.49251732]\n",
" ...\n",
" [0.52350583 0.52568898 0.24850974 ... 0.62218505 0.16235292 0.33266798]\n",
" [0.51743654 0.46797779 0.70373928 ... 0.46614849 0.14673009 0.77151883]\n",
" [0.26756476 0.47281063 0.7335038 ... 0.71571561 0.15375436 0.76875341]]\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# NMF item factor matrix: one row per item, one column per latent factor\n",
"print(nmf_model.qi.shape)\n",
"print(nmf_model.qi)\n",
"\n",
"plt.imshow(nmf_model.qi, cmap='hot', interpolation='nearest')\n",
"plt.show()"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 4. Comments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"1. These latent spaces naturally differ from each other (for example, `SVD` here uses 100 signed factors while `NMF` uses 15 nonnegative ones), so I think it does not make much sense to compare them visually.\n",
"2. We should read some literature to see whether the latent dimensions of a matrix decomposition are meaningful to users or researchers.\n",
"3. Besides Collaborative Filtering, another approach is the content-based recommender system. It concatenates the metadata of each item into a document, then uses document embeddings to find similar items. That is probably closer to Florian's work. We can refer to other packages to explore content-based methods: [List of Recommender Systems](https://github.com/grahamjenson/list_of_recommender_systems)"
] | |
} | |
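,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration of the content-based idea in comment 3, suppose $\\mathbf{e}_i$ is the embedding of item $i$'s metadata document. The notation is hypothetical, and cosine similarity is only one common choice, not something tested in this notebook. Similar items can then be ranked by\n",
"\n",
"$$\\operatorname{sim}(i, j) = \\frac{\\mathbf{e}_i \\cdot \\mathbf{e}_j}{\\lVert \\mathbf{e}_i \\rVert \\, \\lVert \\mathbf{e}_j \\rVert}$$\n",
"\n",
"The items $j$ with the largest $\\operatorname{sim}(i, j)$ would be returned as the neighbors of item $i$."
]
}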
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |