{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 5-2: PCA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
"array([[1, 0, 1, 0],\n", | |
" [0, 0, 0, 0],\n", | |
" [3, 3, 1, 1]])" | |
] | |
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "x = np.array( [ (1,0,3), (0,0,3), (1,0,1), (0,0,1) ] ).T\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"*Did you find any problem about this dataset? How to solve it?*" | |
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"Calculate covariance matrix, eigenvalues and eigenvector." | |
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "y = [ 0. 0. 0. 0.]\n"
     ]
    }
   ],
   "source": [
"eigenvalues, normalized_eigenvectors = np.linalg.eigh(np.cov(x, rowvar=True))\n", | |
"W = normalized_eigenvectors[np.argmin(eigenvalues)]\n", | |
"y = W.T.dot(x)\n", | |
"print(\"y = \", y)" | |
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"**Problem**: Transforming back into the original space yields useless result of 0s, because dimension #2 has no variance at all. Therefore using this dimension as principal component yields a useless subspace.\n", | |
"\n", | |
"**Solution**: Prune dimensions not containing any variance before applying PCA." | |
   ]
  },
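  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*A quick sanity check (sketch):* the next cell prints the per-dimension (row-wise) variances of `x`; dimension #2 should come out as zero, matching the problem described above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: inspect the variance of each dimension (row) of x.\n",
    "# Dimension #2 (row index 1) is expected to have zero variance.\n",
    "print(\"per-dimension variances:\", np.var(x, axis=1))"
   ]
  },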
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
"x = x[np.var(x, axis=1) > 0]" | |
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apply PCA again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "y = [ 1. 0. 1. 0.]\n"
     ]
    }
   ],
   "source": [
"eigenvalues, normalized_eigenvectors = np.linalg.eigh(np.cov(x, rowvar=True))\n", | |
"W = normalized_eigenvectors[np.argmin(eigenvalues)]\n", | |
"y = W.T.dot(x)\n", | |
"print(\"y = \", y)" | |
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
"Please note that this problem wouldn't have occured if we took the largest eigenvector (as is the correct procedure to my understanding),so I'm not quite sure about what this actually tells about the behaviour of eigenvector selection/calculation in the context of PCA." | |
] | |
} | |
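  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*A sketch of that variant:* the next cell repeats the projection on the original, unpruned data, but uses the eigenvector belonging to the **largest** eigenvalue. `x_orig` is just a rebuilt copy of the data from the first cell, introduced here because `x` was pruned above; the remaining names mirror the ones used so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: project the unpruned data onto the component with the largest eigenvalue.\n",
    "# x_orig rebuilds the original data, because x was pruned in an earlier cell.\n",
    "x_orig = np.array([(1, 0, 3), (0, 0, 3), (1, 0, 1), (0, 0, 1)]).T\n",
    "eigenvalues, normalized_eigenvectors = np.linalg.eigh(np.cov(x_orig, rowvar=True))\n",
    "# eigh sorts eigenvalues in ascending order, so the last column is the leading eigenvector.\n",
    "W = normalized_eigenvectors[:, np.argmax(eigenvalues)]\n",
    "y = W.T.dot(x_orig)\n",
    "print(\"y = \", y)"
   ]
  }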
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 @ /development/datamining",
   "language": "python",
   "name": "datamining"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}