@brendano
Created May 27, 2015 04:35
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Low-rank, low-dimensional approximations\n",
"\n",
"[brendan o'connor](http://brenocon.com) 2015-05-27"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import pylab as plt\n",
"import numpy as np\n",
"def myscatter(tallmat, lim=(0,1)):\n",
" plt.scatter(tallmat[:,0], tallmat[:,1])\n",
" plt.xlim(*lim)\n",
" plt.ylim(*lim)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Low-rank data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider a bunch of points points in a 2-d space."
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.50731229, 0.98940349],\n",
" [ 0.35645383, 0.48865998],\n",
" [ 0.44328461, 0.13292925],\n",
" [ 0.66188965, 0.7843585 ],\n",
" [ 0.68236366, 0.51902601],\n",
" [ 0.82715837, 0.66963861],\n",
" [ 0.04385451, 0.15970087],\n",
" [ 0.16693213, 0.72208524],\n",
" [ 0.37695706, 0.7386631 ],\n",
" [ 0.5822557 , 0.74324943],\n",
" [ 0.96641731, 0.76116449],\n",
" [ 0.05236542, 0.26385543],\n",
" [ 0.65394879, 0.63273369],\n",
" [ 0.13717814, 0.01411876],\n",
" [ 0.73537329, 0.10750858],\n",
" [ 0.99365108, 0.52159687],\n",
" [ 0.59678456, 0.56492249],\n",
" [ 0.63235431, 0.30108029],\n",
" [ 0.0237222 , 0.39889514],\n",
" [ 0.95385212, 0.38675379]])"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = np.random.random( (20,2))\n",
"X"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEJlJREFUeJzt3V2onVedx/Hvb9IqESZT00Av0kgdLY4OWFRsfengcdqa\nWBhLHZgSX6a+gGWgdiAB63hhc6ODF8lIEUrp1NIrc+EL0xlq04IGRWq1MG11TEqjFpJUim3UivQi\nwf9c7J1k9yTn7Jezz35Z+/uBDefZe+XJPyvn/PJkPWs9K1WFJKktfzHtAiRJ42e4S1KDDHdJapDh\nLkkNMtwlqUGGuyQ1qG+4J/l6kueT/GyVNncmeSbJk0neNt4SJUnDGuTK/T5gx0ofJrkeeGNVXQ58\nBrhrTLVJkkbUN9yr6ofA71Zp8iHg/m7bx4CLklwynvIkSaMYx5j7VuBoz/Ex4NIxnFeSNKJx3VDN\nsmOfaSBJU3TBGM5xHNjWc3xp971XSGLgS9IIqmr5BXRf4wj3B4Bbgf1J3gX8vqqeP1/DUQpsUZI9\nVbVn2nXMglnoi+Tih2HfdXBz9537gV2PVL34gcnWMf2+mBX2xVmjXhj3Dfck3wDeB2xJchS4A7gQ\noKrurqoHk1yf5AjwJ+CToxQiSRqfvuFeVTsHaHPreMqRpuHEXrjtamBj5/i2l+GlvVMtSVqjcQzL\naHgHp13ADDk47QKq6kCSG2HX7s47L+2tqgNTKOXgFH7PWXVw2gXMu0xqs44k5Zi7JA1n1Oz02TKS\n1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkN\nMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDD\nXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGtQ33JPsSHI4yTNJbj/P51uS\nPJTkiSQ/T/KJdalUkjSwVNXKHyYbgKeBa4HjwE+BnVV1qKfNHuDVVfVvSbZ0219SVaeWnauqKuP/\nI0hSu0bNzn5X7lcCR6rq2ao6CewHbljW5jfApu7Xm4AXlwe7JGmyLujz+VbgaM/xMeCqZW3uAb6X\n5DngL4F/Gl95kqRR9Av3lcdszvoC8ERVLSV5A/BIkiuq6o/LG3aHcE47WFUHB65UkhZAkiVgaa3n\n6Rfux4FtPcfb6Fy993oP8CWAqvplkl8DbwIeX36yqtozcqWSBpZkO2ze3Tk6sbeqDky3Ig2qe9F7\n8PRxkjtGOU+/MffHgcuTXJbkVcBNwAPL2hymc8OVJJfQCfZfjVKMpLXrBPum78C+6zqvTd/pvKdF\nsuqVe1WdSnIrcADYANxbVYeS3NL9/G7gy8B9SZ6k84/F56rqxDrXLWlFm3fDvo1w8+k3NsKu3XR+\njrUg+g3LUFXfBb677L27e75+AfiH8ZcmSRpV33CXNG9O7IXbrgY2do5vexle2jvVkjRxqy5iGutv\n5CImaWK8oTqbRvl7GTU7DXdJmoCzN7rv7P0f1Y39An7U7HRYRpImYrI3un0qpCQ1yCt3aUIcB190\nk73R7Zi71oVB9kqjjreqLd5Q1VwzyM6VXPxwZ7Xo6fHW+4Fdj1S9+IFp1qXZ5w1VzRBXSErTZrhL\nE+HCIk2WwzJr5NjyuRyWOT+/VzQKx9ynwBBbmUEmjYfhPgXeJJO03tZrD1VJ0hzyhuqaeJNM0mxy\nWGaNHFuWtJ4cc5ekBjnmLkk6w3CXpAYZ7pLUIMNdkhpkuEuaK0m2Jxc/3Hll+7TrmVXOlpE0Nxbx\nkR8+8lfSAvBx0oNyWEaSGuSVu6Q54iM/BuWYu6S5smiP/PDxA5LUIB8/IEk6w3CXpAYZ7lIjXNyj\nXo65Sw1YxMU9i8JFTNJCc3GPXslhGUlqkFfu
UhNc3KNX6jvmnmQH8FVgA/CfVfWV87RZAv4DuBB4\noaqWztPGMXdpHS3a4p5FsS6LmJJsAJ4GrgWOAz8FdlbVoZ42FwE/ArZX1bEkW6rqhXEVKEmLbL0W\nMV0JHKmqZ6vqJLAfuGFZm48A36qqYwDnC3ZJ0mT1C/etwNGe42Pd93pdDmxO8v0kjyf5+DgLlCQN\nr98N1UEmwV8IvB24BngN8GiSH1fVM2stTpI0mn7hfhzY1nO8jc7Ve6+jdG6ivgy8nOQHwBXAOeGe\nZE/P4cGqOjhswZLUsu4ElaU1n6fPDdUL6NxQvQZ4DvgJ595Q/Rvga8B24NXAY8BNVfWLZefyhqok\nDWldVqhW1akkt9JZ5bYBuLeqDiW5pfv53VV1OMlDwFPAn4F7lge7JGmyfLaMJM0wn+cuSTrDcJem\nyMf0ar04LCNNiY/p1SB85K80d3xMr9aPwzKS1CCv3KWp8TG9Wj+OuUtT5GN61c+6PPJ3nAx3SRqe\n89wlSWcY7pLUIMNdkmbIuBa2OeYuLRBv4M62FRa2bXQRk6QVnQ2OfaeD4+okroidKedb2PaJkc5k\nuEsLwxWxi8Rwl6SZcb6Fbae/Ho5j7tKC8EFl82H5fRHgIRcxSVqVN1TnjytUJalBrlDVxLnRhDS7\nvHLXSBy/lSbDzTo0YU6rk2aZwzKS1CCv3DUiN5qQZplj7hqZ0+qk9edUSElqkFMhJUlnGO6S1CDD\nXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGtQ33JPsSHI4yTNJbl+l3TuT\nnEry4fGWKEka1qrhnmQD8DVgB/AWYGeSN6/Q7ivAQ4APB5OkKet35X4lcKSqnq2qk8B+4IbztPss\n8E3gt8MW4D6ckjR+/Tbr2Aoc7Tk+BlzV2yDJVjqB//fAO4GBnyF8dh/Ofac3fLg6iftwStIa9bty\nHySovwp8vjoPhg9DDcts3t3ZYPlmOq87N57d/EGS5te0RyX6XbkfB7b1HG+jc/Xe6x3A/iQAW4AP\nJjlZVQ8sP1mSPT2HB2HzsPVK0sxby6hEkiVgac01rLYTU5ILgKeBa4DngJ8AO6vq0Art7wP+u6q+\nfZ7PztlN5GwH3Nm7D6fDMpLmWnLxw7Dvus6IBMD9wK5Hql78wPDnGm0nplWv3KvqVJJbgQPABuDe\nqjqU5Jbu53cP+xsuO/+BJDfCru5QzEvuwylJY+AeqpI0ZuMclXCDbEmaIZ2APz1B5MTIoxKGuyQ1\naNTs9NkyktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3\nSWqQ4S5JDTLcpTky7X05NT985K80J9yWcjGtyzZ7kmbJ5t2dDZdP78vJxu4WlYa7zuGwjCQ1yCt3\naW6c2Au3XQ30DsvsnWpJmlmOuUtzZFz7cmp+uIeqJDXIPVQlSWfMRbg7t1eShjPzwzLO7ZW0yBqe\n5+7cXkka1lwMy0iShjMHV+7O7ZWkYc38mHv31zq3V9JCcp67JDXIee6SpDMMd0lqkOEuSQ0y3CWp\nQYa7JDXIcJekBg0U7kl2JDmc5Jkkt5/n848meTLJU0l+lOSt4y9VkjSovvPck2wAngauBY4DPwV2\nVtWhnjbvBn5RVX9IsgPYU1XvWnYe57lL0pDWc577lcCRqnq2qk4C+4EbehtU1aNV9Yfu4WPApcMW\nIkkan0HCfStwtOf4WPe9lXwaeHAtRUmS1maQB4cN/HyCJO8HPgW8d4XP9/QcHqyqg4OeW5IWQZIl\nYGmt5xkk3I8D23qOt9G5el9e0FuBe4AdVfW7852oqvaMUKMkLYzuRe/B08dJ7hjlPIMMyzwOXJ7k\nsiSvAm4CHuhtkOR1wLeBj1XVkVEKkSSNT98r96o6leRWOjsfbQDurapDSW7pfn438EXgtcBdSQBO\nVtWV61e2
JGk1PvJXkmaYj/yVJJ0x0+GeZHty8cOdV7ZPux5JmhczOyzTCfNN34E7e/dOvdEt9iQt\nklGHZWZ4g+zNu2HfRrj59BsbYdduOjd2JUmrmOlhGUnSaGb4yv3EXrjtaqB3WGbvVEuSpDkxs2Pu\n3V+zvTM8A3Bir+PtkhbNqGPuMx3u0nL+g69FY7irec6g0iJqcLaMtJwzqKRBOVtGkhrklbvmiDOo\npEE55q654g1VLRpvqEqaGP+RnRzDXdJEOGtpspwtI2lCnLU0D5wtI0kN8spd0pCctTQPHHOXNDRv\nqE6ON1QlqUHuoSpJOsNwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnu\nktQgw12SGmS4S1KDDHdJapDhvoIk25OLH+68sn3a9UjSMPqGe5IdSQ4neSbJ7Su0ubP7+ZNJ3jb+\nMifr7AbA+67rvDZ9x4CXNE9WDfckG4CvATuAtwA7k7x5WZvrgTdW1eXAZ4C71qnWCdq8u7Oz+810\nXnduPLvrzNolWRrXueadfXGWfXGWfbF2/a7crwSOVNWzVXUS2A/csKzNh4D7AarqMeCiJJeMvdK2\nLE27gBmyNO0CZsjStAuYIUvTLmDe9dsgeytwtOf4GHDVAG0uBZ5fc3VT4wbAkuZbv3AfdIPV5fv7\nTWZj1nVSVQeS3Ai7ukMxL7kBsKS50i/cjwPbeo630bkyX63Npd33zpFkzkL/xOkvrkvGu7d3kjvG\nesI5Zl+cZV+cZV+sTb9wfxy4PMllwHPATcDOZW0eAG4F9id5F/D7qjpnSGaU3bslSaNZNdyr6lSS\nW4EDwAbg3qo6lOSW7ud3V9WDSa5PcgT4E/DJda9akrSqVM3ZSIkkqa+xr1BdxEVPK+nXF0k+2u2D\np5L8KMlbp1HnJAzyfdFt984kp5J8eJL1TcqAPx9LSf43yc+THJxwiRMzwM/HliQPJXmi2xefmEKZ\nE5Hk60meT/KzVdoMl5tVNbYXnaGbI8BlwIXAE8Cbl7W5Hniw+/VVwI/HWcOsvAbsi3cDf9X9esci\n90VPu+8B/wP847TrntL3xEXA/wGXdo+3TLvuKfbFHuDfT/cD8CJwwbRrX6f++DvgbcDPVvh86Nwc\n95W7i57O6tsXVfVoVf2he/gYnZlGLRrk+wLgs8A3gd9OsrgJGqQfPgJ8q6qOAVTVCxOucVIG6Yvf\nAJu6X28CXqyqUxOscWKq6ofA71ZpMnRujjvcz7egaesAbVoMtUH6otengQfXtaLp6dsXSbbS+eE+\n/fiKFm8GDfI9cTmwOcn3kzye5OMTq26yBumLe4C/TfIc8CTwrxOqbRYNnZv9pkIOayEXPa1g4D9T\nkvcDnwLeu37lTNUgffFV4PNVVeksKmhx6uwg/XAh8HbgGuA1wKNJflxVz6xrZZM3SF98AXiiqpaS\nvAF4JMkVVfXHda5tVg2Vm+MO97Eueppzg/QF3Zuo9wA7qmq1/5bNs0H64h101kpAZ3z1g0lOVtUD\nkylxIgbph6PAC1X1MvBykh8AVwCthfsgffEe4EsAVfXLJL8G3kRn/c2iGTo3xz0sc2bRU5JX0Vn0\ntPyH8wHgnwFWW/TUgL59keR1wLeBj1XVkSnUOCl9+6Kq/rqqXl9Vr6cz7v4vjQU7DPbz8V/A1Uk2\nJHkNnZtnv5hwnZMwSF8cBq4F6I4vvwn41USrnB1D5+ZYr9zLRU9nDNIXwBeB1wJ3da9YT1bVldOq\neb0M2BfNG/Dn43CSh4CngD8D91RVc+E+4PfEl4H7kjxJ50L0c1V1YsWTzrEk3wDeB2xJchS4g84Q\n3ci56SImSWqQ2+xJUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGvT/npqCqzCHF5oA\nAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10ee2c310>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"myscatter(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK now consider points that are nominally in a 2-d space, but in actuality they all lie on a line within that space. The line is a 1-d subspace within the 2-d space."
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEgJJREFUeJzt3XusVWeZx/HvA7Tk1AyDQNJJKF5GO45OQsca0dGOHqet\nYP+Q1EnGoHXqLdaZINMpjdUmoyQTmZgMpCFNKtPUpn/JH9oGZoKtJnqiMW21oy1eoAG1BqhpLNjB\nKCEQnvljLzjbzTlnX87at/d8P8kOZ629unlYOed33r5rvc+KzESSVJZFwy5AklQ/w12SCmS4S1KB\nDHdJKpDhLkkFMtwlqUBtwz0ivhwRL0TEj+c4ZldEHI6IZyLijfWWKEnqVicj9weBDbO9GRE3Aa/N\nzKuBTwD31VSbJKlHbcM9M78L/HaOQ94LPFQd+ySwPCKurKc8SVIv6phzXw0cbdo+BlxVw+dKknpU\n1wXVaNm2p4EkDdGSGj7jOLCmafuqat8fiQgDX5J6kJmtA+i26gj3fcBmYE9EvBV4KTNfmOnAXgos\nUURsy8xtw65jFAz6XETEelj2COyaaOzZchpO3ZyZjw2qhtn4fTHNczGt14Fx23CPiK8A7wRWRcRR\n4PPAZQCZuTsz90fETRFxBPg98JFeCpEGY8VW2DkBt17YMQF3bAWGHu5SndqGe2Zu6uCYzfWUI0mq\nQx3TMure1LALGCFTg/3rTu6ALdcBzdMyOwZbw6ymhl3ACJkadgHjLgb1sI6ISOfcNQoa8+4rtja2\nTu4Yhfl2aTa9ZqfhLkkjrNfstHGYJBXIcJekAhnuklQgw12SCmS4S1KBDHcVJSLWR6z8RuMV64dd\njzQs3gqpYoxy3xipV71mpytUVRD7xkgXOC0jSQVy5K6CjHTfGGmgnHPXWGvtE9P4074xKoe9ZbTg\neAFVC4EXVLUAeQFVmo0XVCWpQI7cNca8gCrNxjl3jTUfvKHSeUFVkgrkwzokSRcZ7pJUIMNdkgpk\nuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe4aKT7gWqqH7Qc0MuzPLl3Kfu4qgP3Zpbo4LSNJ\nBXLkrhFif3apLs65a6TYn136Y/Zzl6QC9a2fe0RsiIhDEXE4Iu6a4f1VEfFoRDwdET+JiA93W4Qk\nqV5zjtwjYjHwLHADcBz4AbApMw82HbMNWJqZn42IVdXxV2bmuZbPcuQuSV3q18h9HXAkM5/LzLPA\nHmBjyzG/BpZVXy8DTrQGuyRpsNrdLbMaONq0fQx4S8sx9wPfiojngT8B/qG+8iRJvWgX7p1cbb0b\neDozJyPiNcA3I+KazPxd64HVFM4FU5k51XGlKoJ3w0hzi4hJYHK+n9Mu3I8Da5q219AYvTd7G/AF\ngMz8eUT8Engd8FTrh2Xmtp4r1dibbi+w88J97NdFhO0FpCbVoHfqwnZEfL6Xz2k35/4UcHVEvCoi\nLgfeD+xrOeYQjQuuRMSVNIL9F70Uo9Kt2NroG3MrjdeuielRvKQ6zTlyz8xzEbGZRm+PxcADmXkw\nIm6r3t8NbAcejIhnaPyy+HRmnuxz3ZKkObiISQNj10epe65Q1VjwgqrUHcNdkgrUt/YDkqTxY7ir\ndj4qTxo+p2VUKy+aSvXyMXsaET4qTxoFTstIUoEcuatmPipPGgXOuasWLfevT8GKyepr72WX5sH7\n3DU0XkSV+scLqhoiL6JKo8YLqpJUIEfuqoEXUaVR45y7amFDMKk/vKCqgTHIpcEx3DUQ3hkjDZZd\nITUgL9sOb5hoPG3xz/BRedJo8oKqOlaN2v8aPlntuRW4ZZglSZqF4a4urNgKOxc13c8O3H4eXvLO\nGGnEOC2j+Xra+XZp9DhyVxdmvJ/97qGWJGlG3i2jrngbpDRY3gopSQXyVkhJ0kWGuyQVyHCXpAIZ\n7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalA
hrskFchwl6QCGe6SVKC24R4RGyLiUEQcjoi7Zjlm\nMiJ+FBE/iYip2quUJHVlzq6QEbEYeBa4ATgO/ADYlJkHm45ZDnwPWJ+ZxyJiVWa+OMNn2RVSkrrU\nr66Q64AjmflcZp4F9gAbW475APC1zDwGMFOwS5IGq124rwaONm0fq/Y1uxpYERHfjoinIuJDdRYo\nSepeu8fsdfIkj8uAa4HrgSuAxyPiicw8PN/iJEm9aRfux4E1TdtraIzemx0FXszM08DpiPgOcA1w\nSbhHxLamzanMnOq2YEkqWURMApPz/pw2F1SX0Ligej3wPPB9Lr2g+pfAvcB6YCnwJPD+zPxZy2d5\nQXVIfO6pNL56zc45R+6ZeS4iNgOPAYuBBzLzYETcVr2/OzMPRcSjwAHgPHB/a7BreBrBvmwv7Fza\n2LPlHRGx0YCXyuYDsgsX8fL/hXuuhVurPQ8Bt/8w87dvGmZdkjrjA7I1i0Wv7GyfpJK0u6CqsXfm\nV3DnyuntO6t9kkrmtEzhGnPuV+yFtdWc+4Ez8Afn3KUx0Wt2Gu4LgHfLSOPLcBdgkEulMdx14bbH\nR2DXRGPPltNw6mYDXhpffbnPXeNmxVbYOTF92yMTcMdWGusUJC0g3gopSQVy5F6Ukztgy3VA87TM\njqGWJGkonHMvjBdUpbJ4QVWSCmT7AUnSRYa7JBXIcJekAhnuYyYi1kes/EbjFeuHXY+k0eQF1THi\nClRp4XGF6oLgClRJnXFaRpIK5Mh9rLgCVVJnnHMfM65AlRYWV6hKUoFcoSpJushwl6QCGe6SVCDD\nXZIKZLhLUoEM9xFj7xhJdTDcR0hE3A3L98Nf3AgfuRGWPWLAS+qF97mPiKop2H7YVf3CvQu4BXjw\nm5kn3j3M2iQNj43Dxt6KrbBzUVNTMOBLQ6tG0ngz3EfaofPwkr1jJHXNcB8ZlzQFOw+n/s3eMZJ6\n4Zz7CLEpmKRWNg6TpALZOEySdJHhPgQuVJLUb23DPSI2RMShiDgcEXfNcdybI+JcRLyv3hLLMv2Q\n6503Nl4uVJJUvznvlomIxcC9wA3AceAHEbEvMw/OcNwXgUcB59Xn5EOuJfVfu5H7OuBIZj6XmWeB\nPcDGGY77FPBV4Dc11ydJ6kG7+9xXA0ebto8Bb2k+ICJW0wj8vwPeDAzm9pux5UOuJfVfu3DvJKjv\nAT6TmRkRgdMyc8rMxyLi5moqBjjl/eySatcu3I8Da5q219AYvTd7E7CnkeusAt4TEWczc1/rh0XE\ntqbNqcyc6rbgElRhbqBLukRETAKT8/6cuRYxRcQS4FngeuB54PvAptYLqk3HPwj8d2Y+PMN7LmKS\npC71pStkZp6LiM00RpmLgQcy82BE3Fa9v7unaiVJfWX7AUkaYbYfkCRdZLhLUoEMd0kqkOEuSQUy\n3Gtip0dJo8S7ZWpQdXrcC7uWNvZsOQOnNrryVNJ89eU+d3Vq+Xa4Z2lTp8elcPt2XIUqaUiclqnF\nold2tk+SBsORey3O/AruXDm9fWe1T5KGwzn3GjTm3K/YC2urOfcDZ+APzrlLmrdes9Nwr0kj4FdU\nbXxP2sZXUi0Md0kqkL1lJEkXGe6SVCDDXZIKZLhLUoEM9w7YN0bSuPFumTaqvjGPwK6Jxp4tp+HU\nzd7qKGkQ7C3TNyu2ws6Jpr4xE3DHVuwbI2mEOS0jSQVy5D6L6RWnZ1Y2WvhyoZ3vaTi1Y6jFSVIb\nzrnP4NJ59n8+A5f/FBadsLWApEFyzr1Wl8yzL4U7TmSeePcwq5KkTjnnLkkFcuQ+o5M7YMt1QPPt\nj86zSxobzrnPwha+kkaBLX8lqUC2/JUkXWS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAItqHD3\niUqSFooFs4ipEeZX7IW1VeveA2fgDxtdeSpplPV1EVNEbIiIQxFxOCLumuH9D0bEMxFxICK+FxFr\nuy2k/162
HZYsnd5esrSxT5LK0zbcI2IxcC+wAXgDsCkiXt9y2C+Ad2TmWuDfgf+qu9D5i9fC5cAn\nq9fl1T5JKk8nXSHXAUcy8zmAiNgDbAQOXjggMx9vOv5J4Koaa6zJZQH/SVOPduBf7XUjqUidTMus\nBo42bR+r9s3mY8D++RTVH3m4s32SNP46Gbl3fMU1It4FfBR4+yzvb2vanMrMqU4/e/5euhu27GX6\nWahn4NTdg/v7Jam9iJgEJuf9Oe3ulomItwLbMnNDtf1Z4HxmfrHluLXAw8CGzDwyw+cM9G6Zmfqx\n26Nd0rjpWz/3iFgCPAtcDzwPfB/YlJkHm455BfAt4JbMfKLOAntRPeB6L+xqHqV726OksdO3B2Rn\n5rmI2Aw8BiwGHsjMgxFxW/X+buBzwMuB+yIC4Gxmruu2mPos3w73LP3jB1zfvp3Gv0GSitfRM1Qz\n8+vA11v27W76+uPAx+stbT4WvbKzfZJUpkIfkH3mV3DnyuntO6t9krQwFNl+wFYDkkrhA7Iv/fu8\nM0bS2DPcJalAfW0cJkkaL4a7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQV\nyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBRq5cI+I9RErv9F4xfph1yNJ42iknsTUCPNlj8Cu\nicaeLafh1M0+Ik/SQtXrk5iW9KOY3q3YCjsn4NYLOybgjq2A4S5JXRiZaZlqCuZa+BJmuSTNz0iM\n3KenY3ZW0zG30Bi9338aTu0YZm2SNI5GItxnmI4B7jgBpz7ofLskdW9kpmVm8EODXZJ6MyIj95M7\nYMt1QPNdMk7HSFKPRuZWyMa8+4qtja2TOxy1S1Lvt0KOTLhLki7Va3aO8py7JKlHhrskFchwl6QC\nGe6SVCDDXZIK1DbcI2JDRByKiMMRcdcsx+yq3n8mIt44+2fZyleSBmHOcI+IxcC9wAbgDcCmiHh9\nyzE3Aa/NzKuBTwD3zf6JO29svJY9spADPiImh13DqPBcTPNcTPNczF+7kfs64EhmPpeZZ4E9wMaW\nY94LPASQmU8CyyPiypk/7tbqtWtiesHSgjQ57AJGyOSwCxghk8MuYIRMDruAcdcu3FcDR5u2j1X7\n2h1z1fxLkyT1ql1vmU6Xr7aunprlv3uo+tPeMZLUT3O2H4iItwLbMnNDtf1Z4HxmfrHpmC8BU5m5\np9o+BLwzM19o+azB9DmQpML04zF7TwFXR8SrgOeB9wObWo7ZB2wG9lS/DF5qDfZei5Mk9WbOcM/M\ncxGxmcZz7xYDD2TmwYi4rXp/d2buj4ibIuII8HvgI32vWpI0p4F1hZQkDU7tK1TrXPQ07tqdi4j4\nYHUODkTE9yJi7TDqHIROvi+q494cEeci4n2DrG9QOvz5mIyIH0XETyJiasAlDkwHPx+rIuLRiHi6\nOhcfHkKZAxERX46IFyLix3Mc011uZmZtLxpTN0eAVwGXAU8Dr2855iZgf/X1W4An6qxhVF4dnou/\nAf60+nrDQj4XTcd9C/gf4O+HXfeQvieWAz8Frqq2Vw277iGei23Af1w4D8AJYMmwa+/T+fhb4I3A\nj2d5v+vcrHvkXvOip7HW9lxk5uOZ+X/V5pOUuz6gk+8LgE8BXwV+M8jiBqiT8/AB4GuZeQwgM18c\ncI2D0sm5+DWwrPp6GXAiM88NsMaByczvAr+d45Cuc7PucHfR07ROzkWzjwH7+1rR8LQ9FxGxmsYP\n94X2FSVeDOrke+JqYEVEfDsinoqIDw2susHq5FzcD/xVRDwPPAP8y4BqG0Vd52bdD8iuedHTWOv4\n3xQR7wI+Cry9f+UMVSfn4h7gM5mZERFc+j1Sgk7Ow2XAtcD1wBXA4xHxRGYe7mtlg9fJubgbeDoz\nJyPiNcA3I+KazPxdn2sbVV3lZt3hfhxY07S9hsZvmLmOuaraV5pOzgXVRd
T7gQ2ZOdf/lo2zTs7F\nm2islYDG/Op7IuJsZu4bTIkD0cl5OAq8mJmngdMR8R3gGqC0cO/kXLwN+AJAZv48In4JvI7G+puF\npuvcrHta5uKip4i4nMaip9Yfzn3AP8LFFbAzLnoqQNtzERGvAB4GbsnMI0OocVDanovM/PPMfHVm\nvprGvPs/FRbs0NnPx17guohYHBFX0Lh49rMB1zkInZyLQ8ANANX88uuAXwy0ytHRdW7WOnJPFz1d\n1Mm5AD4HvBy4rxqxns3MdcOquV86PBfF6/Dn41BEPAocAM4D92dmceHe4ffEduDBiHiGxkD005l5\ncmhF91FEfAV4J7AqIo4Cn6cxRddzbrqISZIK5GP2JKlAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEM\nd0kqkOEuSQX6f0it9BcDy2cFAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10efd12d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A = np.array([1,2])\n",
"Z = np.random.random( (50, 1))\n",
"myscatter(Z*A)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of representing every point as the 2-d position $(x_1, x_2)$, there's a more compact representation: only record its position on the line, $z$, and separately keep track of how to map from the line into the 2-d space. This mapping is called a linear projection.\n",
"\n",
"I actually used this representation above, to generate the points. $Z$ is the vector of $z$ positions for each datapoint. $A$ is the projection matrix. In the world of matrix factorization (aka factor analysis ~ principal components analysis), $x$ are observed datapoints (in high-dim space), and $z$ are their positions in the to-be-learned latent, low-dim space. $A$ are the \"loadings\" matrix: how you map between these spaces.\n",
"\n",
"The projection from low-dim to high-dim space is for one datapoint\n",
"\n",
"$$ x = (A_1 z, A_2 z) $$\n",
"\n",
"More generally, where $x$ is $M$-dimensional and $z$ is $K$-dimensional (above we use $M=2$, $K=1$), each element $x_m$ is\n",
"\n",
"$$ x_m = \\sum_k^K A_{m,k} z_{k} $$\n",
"\n",
"Or more compactly, for the entire dataset, $X = ZA$.\n",
"\n",
"The matrices look like this, with their sizes shown on the sides. I'm intending the $k$ lengths to be visually the same, believe it or not.\n",
"\n",
" X = Z A\n",
" -------- ----- ------\n",
" | | | | | |\n",
" | | | | k| |\n",
" n | | n | | ------\n",
" | | | | m\n",
" | | | |\n",
" -------- -----\n",
" m k\n",
" \n",
"$X$ is a **low-rank** matrix, because it can be exactly represented in terms of multiply between two skinny matrices, of skinniness $k$ for $k < m$. This $k$ is a \"low\" number, thus \"low-rank\". (If you look at a linear algebra text, low-rank has a different but intimately related definition in terms of linear independence.)"
]
},
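{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the notation, here is a minimal sketch that is not part of the original analysis. The sizes $n=50$, $k=1$, $m=2$ are just for illustration. It computes each $x_m$ as an explicit sum over $k$ and confirms the result matches the matrix product $ZA$. It then asks NumPy for the rank with `np.linalg.matrix_rank`, which counts the singular values above a small tolerance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"n, k, m = 50, 1, 2                   # illustrative sizes: n points, k-dim latent space, m-dim observed space\n",
"A = np.array([[1.0, 2.0]])           # k x m loadings matrix\n",
"Z = np.random.random((n, k))         # n x k latent positions\n",
"\n",
"# Elementwise: x_m = sum_k z_k A_{k,m}, for every datapoint\n",
"X_loop = np.zeros((n, m))\n",
"for i in range(n):\n",
"    for j in range(m):\n",
"        X_loop[i, j] = sum(Z[i, kk] * A[kk, j] for kk in range(k))\n",
"\n",
"# Matrix form: X = ZA (np.dot, so this also runs on the Python 2 kernel)\n",
"X_lowrank = np.dot(Z, A)\n",
"\n",
"assert np.allclose(X_lowrank, X_loop)\n",
"print(np.linalg.matrix_rank(X_lowrank))  # rank 1: every row is a multiple of the single row of A"
]
},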
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Low-rank approximations\n",
"\n",
"OK, so real data will never actually be low-rank, where everything exactly lies in a subsapce. Well maybe if you do something silly like use a person's weight as two features in both kg and pounds.\n",
"\n",
"But in many problems, it turns out data points tend to be *approximately* low-rank, meaning they lie *close* to a subspace. Returning to our example, it would look like this."
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEUlJREFUeJzt3X+o3Xd9x/Hny7RKhGU1LXSQRuq0+GNgUbH+6vBK1cT+\nYamDlWhdpzLLoHbSjNUJs4HhRDBBgtCVrHaFMfOHs6wbpbagF6XUapltdaalUQtJKmVNrBEJ0tL3\n/jgnuac3ufece3Lu+fE5zwccuN9zvn55++HeVz/5/DqpKiRJbXnZpAuQJI2e4S5JDTLcJalBhrsk\nNchwl6QGGe6S1KC+4Z7k60meSfKTVe7Zm+TJJI8mectoS5QkrdUgPfc7gO0rfZjkSuB1VXUJ8Gng\n1hHVJkkaUt9wr6rvA79e5ZYPA3d2730IOC/JhaMpT5I0jFGMuW8BDvVcHwYuGsFzJUlDGtWEapZd\ne6aBJE3QOSN4xhFga8/1Rd33XiKJgS9JQ6iq5R3ovkYR7ncDNwD7k7wTeK6qnjnTjcMU2KIku6pq\n16TrmAa2xRLbYoltsWTYjnHfcE/yDeC9wAVJDgG3AOcCVNVtVXVPkiuTHAR+B3ximEKkUUiyDTbv\n7Fwd211V355sRdJk9A33qtoxwD03jKYcaXidYN90F+zZ2HnnxsuTXG3Aax6NYlhGa7c46QKmyOLo\nHrV5ZyfYrzv5xka4aScwK+G+OOkCpsjipAuYdR4/MAFVtTjpGqaFbbHEtlhiW5w9e+5qyLHdcOPl\nwMlhmRNwfPdES5ImJOP6mr0k5WoZrTcnVNWaYbPTcJekKTZsdjrmrrmUZFty/n2dV7ZNuh5p1Oy5\na+4sLZnc2zs275JJTaVhs9MJVc2hmV8yKfXlsIwkNcieu+aQSybVPsfcNZdcMqlZ4VJISWqQSyEl\nSacY7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lq\nkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw11TIcm25Pz7Oq9sm3Q90qzzO1Q1cZ0w33QX7N3YeefG\nE3D8ar+0Who+O89Zj2Kktdm8E/ZshOtOvrERbtoJGO7SkByWkaQG2XPXFDi2G268HOgdltk90ZKk\nGeeYu6ZCZ9x9887O1bHdjrdLHcNmZ99wT7Id+CqwAfiXqvryss8vAP4N+CM6/xL4SlX966gKlKR5\nti7hnmQD8ATwfuAI8CNgR1Ud6LlnF/CKqvr7btA/AVxYVS+MokBJmmfDZme/CdXLgINV9VRVPQ/s\nB65ads+vgE3dnzcBR5cHuzQI17pLo9NvQnULcKjn+jDwjmX37AO+k+Rp4A+APx9deZoXS2vd95yc\nVL08iWvdpSH1C/dBZls/DzxSVQtJXgvcn+TSqvrt8hu7QzgnLVbV4sCVqnGudZcAkiwAC2f7nH7h\nfgTY2nO9lU7vvde7gS8CVNXPk/wSeD3w8PKHVdWuoSuVpDnQ7fQunrxOcsswz+kX7g8DlyS5GHga\nuAbYseyex+lMuD6Q5EI6wf6LYYrRPHOtuzRKgyyF/BBLSyFvr6ovJbkeoKpu666QuQN4NZ0J2i9V\n1b+f4TmultGqXOsunW7d1rmPiuGuQRjw0ksZ7pp5ng4pnc5TIdUAV8xIo+KpkJLUIHvumiKumJFG\nxTF3jcSoJkKdUJVeyglVTYwTodL6cUJVE+REqDRtnFCVpAbZc9cIOBEqTRvH3DUSToRK68MJVUlq\n0Hp9E5N0Rn5rkjTd7LlrzVz6KI2PSyE1Ri59lKadwzKS1CB77hqCSx+laeeYu4bi0kdpPFwKKUkN\ncimkJOkUw12SGmS4S1KDDHdJapDhLkkNMty1Ks+QkWaTSyG1Is+QkSbPs2W0DjxDRppVDsvMMYdc\npHY5LDOnBhlycVhGmjyPH9CaJOffB3s+sDTk
cidw0/1VRz/40vs8Q0aaJMfctS66YW6gSzPGcJ9b\nHtsrtcxhmTnmkIs0/Rxzl6QGeeSvJOkUw12SGtQ33JNsT/J4kieT3LzCPQtJfpzkp0kWR16lJGlN\nVh1zT7IBeAJ4P3AE+BGwo6oO9NxzHvAAsK2qDie5oKqePcOzHHOXpDVarzH3y4CDVfVUVT0P7Aeu\nWnbPR4H/qKrDAGcKds0OjySQ2tAv3LcAh3quD3ff63UJsDnJd5M8nOTjoyxQ47N03MCeD3Rem+4y\n4KXZ1G8T0yDrJM8F3gpcAbwSeDDJD6rqybMtTuPmKZBSK/qF+xFga8/1Vjq9916HgGer6gRwIsn3\ngEuB08I9ya6ey8WqWlxrwZLUsiQLwMJZP6fPhOo5dCZUrwCeBn7I6ROqbwC+BmwDXgE8BFxTVT9b\n9iwnVKecp0BK02ddDg6rqheS3EDnn+UbgNur6kCS67uf31ZVjye5F3gMeBHYtzzYNRuq6ttJru4O\nxQDHPZJAmlEePyBJU8zjBxrkskRJw7LnPqUc/5YEfllHg1yWKGl4DstIUoPsuU8tvylJ0vAcc59i\nflOSJL+JSZIa5FJISdIphrskNchwl6QGGe6NWG03qztdpfnjhGoDVtvN6k5Xaba5Q3Wurbab1Z2u\n0jxyWEaSGmTPvQmr7WZ1p6s0jxxzb8Rqu1nd6SrNLneoSlKD3KEqSTrFcJekBhnuktQgw32GuNNU\n0qCcUJ0R7jSV5pMTqjNmpV74yr3zzTs7wX4dndfejUvLGyXppdzENAFLvfA9J3vhlye5uvPz6e/b\nO5e0Vob7RKx43gsrnwPjTlNJgzPcZ0T3hMerl/4jcNydppJW5ITqBKw0Odr52UlTSUs8fmDGrHTe\ni+fASOpluEtSg1wKKUk6xXCXpAYZ7pLUIMNdkhpkuEtSgwz3KePJj5JGwaWQUyTJ52HTP8Le7n90\n3cQkzbt1WwqZZHuSx5M8meTmVe57e5IXknxkrUXo5Oal87rB7smPks7OquGeZAPwNWA78CZgR5I3\nrnDfl4F7AXvnQ9m8E97gMJmkkeh3cNhlwMGqegogyX7gKuDAsvs+A3wTePuoC5wv7wF6/3F044ue\n/ChpGP16iluAQz3Xh7vvnZJkC53Av7X71ngG8ZtzbDfsOwHXAv8MfPZFOP4PjrdLGka/nvsgQf1V\n4HNVVUmCwzJDWTrS947uGPtzHhomaWj9wv0IsLXneiud3nuvtwH7O7nOBcCHkjxfVXcvf1iSXT2X\ni1W1uNaCW9YNcwNdmmNJFoCFs37Oakshk5wDPAFcATwN/BDYUVXLx9xP3n8H8F9V9a0zfDY3SyE9\ntlfSqAybnav23KvqhSQ30OlNbgBur6oDSa7vfn7bUNU2bKXvRzXgJY2Tm5hGLDn/PtjzgaXvQb0T\nuOn+qqMfnGRdkmaT57lPge5xAW/trHaxoy5pcvyC7BE5fTjmWjq9930nXKsuadwM9yEtnzTt/Lxn\n49JwDMBNR+H4xxxvlzRuhvsQzjRpCr8/0wqi/zHYJU2C4T6U03rpG+GzdE5x5GTgOxwjaWIM95F5\n2VF47mq4qTtUc9z17ZImxqWQQ1galtnb20t3LbukkRs2Ow33IbkLVdI4GO6S1CA3MUmSTjHcJalB\nhrskNchwH4Ek25Lz7+u8sm3S9UiSE6pnyWWRktbTupznrkGcabfqTTvxWEhJE+SwjCQ1yJ77WTu2\nu3NwmGfKSJoejrmPgLtVJa0Xd6hKUoPcoSpJOsVwl6QGGe6S1KC5DXd3lUpq2VxOqLqrVNKscIfq\nmrirVFLb5nZYRpJaNqc9d3eVSmrbXI65g7tKJc0Gd6hKUoPcoSpJOsVwl6QGGe6S1CDDXZIaZLhL\nUoMMd0lqkOEuSQ0aKNyTbE/yeJInk9x8hs8/luTRJI8leSDJm0dfqiRpUH03MSXZADwBvB84AvwI\n2FFVB3ru
eRfws6r6TZLtwK6qeuey57iJSZLWaD03MV0GHKyqp6rqeWA/cFXvDVX1YFX9pnv5EHDR\nWguRJI3OIOG+BTjUc324+95KPgXcczZFSZLOziCnQg58+EyS9wGfBN6zwue7ei4Xq2px0GdL0jxI\nsgAsnO1zBgn3I8DWnuutdHrvywt6M7AP2F5Vvz7Tg6pq1xA1StLc6HZ6F09eJ7llmOcMMizzMHBJ\nkouTvBy4Bri794Ykrwa+BVxbVQeHKUSSNDp9e+5V9UKSG+h8Bd0G4PaqOpDk+u7ntwFfAF4F3JoE\n4Pmqumz9ypYkrWZuz3Nf+rKO358P5wIvO+qXdkiaNn5Zxxp0gn3TXfBXG+FO4CvdT248AcevNuAl\nTQu/rGNNNu+EvRvhl3SC/brua+/Gpa/ek6TZNafhLkltG2QpZIOO7YYbL+8My/xtz/s3noDjuydW\nliSNyFyOuYMTqpJmgxOqktQgJ1QlSacY7pLUoJkK9yTbkvPv67yybdL1SNK0mpkx96WNR3s3dt5x\nw5Gk9g2bnTO0FHLzTtizsbPZCICNcNNOOmfeSJJ6zNSwjCRpMDPUcz+58YjeYRk3HEnSGczMmHv3\nGduWzn5xw5Gk9rmJSZIa5CYmSdIphrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWp\nQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUoL7hnmR7\nkseTPJnk5hXu2dv9/NEkbxl9mZKktVg13JNsAL4GbAfeBOxI8sZl91wJvK6qLgE+Ddy6TrU2I8nC\npGuYFrbFEttiiW1x9vr13C8DDlbVU1X1PLAfuGrZPR8G7gSoqoeA85JcOPJK27Iw6QKmyMKkC5gi\nC5MuYIosTLqAWdcv3LcAh3quD3ff63fPRWdfmiRpWP3CvQZ8Tob830mS1sE5fT4/Amztud5Kp2e+\n2j0Xdd87TRJDvyvJLZOuYVrYFktsiyW2xdnpF+4PA5ckuRh4GrgG2LHsnruBG4D9Sd4JPFdVzyx/\nUFUt791LktbJquFeVS8kuQH4NrABuL2qDiS5vvv5bVV1T5IrkxwEfgd8Yt2rliStKlWOlEhSa0a+\nQ9VNT0v6tUWSj3Xb4LEkDyR58yTqHIdBfi+69709yQtJPjLO+sZlwL+PhSQ/TvLTJItjLnFsBvj7\nuCDJvUke6bbFX06gzLFI8vUkzyT5ySr3rC03q2pkLzpDNweBi4FzgUeANy6750rgnu7P7wB+MMoa\npuU1YFu8C/jD7s/b57kteu77DvDfwJ9Nuu4J/U6cB/wvcFH3+oJJ1z3BttgFfOlkOwBHgXMmXfs6\ntcefAm8BfrLC52vOzVH33N30tKRvW1TVg1X1m+7lQ7S7P2CQ3wuAzwDfBP5vnMWN0SDt8FHgP6rq\nMEBVPTvmGsdlkLb4FbCp+/Mm4GhVvTDGGsemqr4P/HqVW9acm6MOdzc9LRmkLXp9CrhnXSuanL5t\nkWQLnT/uk8dXtDgZNMjvxCXA5iTfTfJwko+PrbrxGqQt9gF/kuRp4FHgb8ZU2zRac272Wwq5Vm56\nWjLw/6ck7wM+Cbxn/cqZqEHa4qvA56qqkoTTf0daMEg7nAu8FbgCeCXwYJIfVNWT61rZ+A3SFp8H\nHqmqhSSvBe5PcmlV/Xada5tWa8rNUYf7SDc9zbhB2oLuJOo+YHtVrfbPslk2SFu8jc5eCeiMr34o\nyfNVdfd4ShyLQdrhEPBsVZ0ATiT5HnAp0Fq4D9IW7wa+CFBVP0/yS+D1dPbfzJs15+aoh2VObXpK\n8nI6m56W/3HeDfwFwGqbnhrQty2SvBr4FnBtVR2cQI3j0rctquqPq+o1VfUaOuPuf91YsMNgfx//\nCVyeZEOSV9KZPPvZmOsch0Ha4nHg/QDd8eXXA78Ya5XTY825OdKee7np6Z
RB2gL4AvAq4NZuj/X5\nqrpsUjWvlwHbonkD/n08nuRe4DHgRWBfVTUX7gP+TvwTcEeSR+l0RP+uqo5NrOh1lOQbwHuBC5Ic\nAm6hM0Q3dG66iUmSGuTX7ElSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIa9P/OVTws\nCo2C6QAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10e7d2090>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"Atrue = np.array([1,2])\n",
"Ztrue = np.random.random( (50, 1))\n",
"X = Ztrue*Atrue + np.random.normal(scale=0.05, size=50).reshape((50,1))\n",
"myscatter(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now we're saying there was a \"true\" $A$ and $Z$ the same as before, but some gaussian noise was added before the final data came out. Thus, even if we learned the correct $Z$ and $A$, it's only an approximation of the data $X$. Something like $X \\approx ZA$.\n",
"\n",
"Or more specifically, returning to the data-centric paradigm where we are given $X$ and want to learn $A$ and $Z$, we want ones such that they can reconstruct the original data as well as possible, by minimizing some sort of loss function\n",
"\n",
"$$\\arg\\min_{Z,A} Divergence(X, ZA)$$\n",
"\n",
"where \"Divergence\" is a function that measures the non-similarity of two matrices. A good choice is the Frobenius norm, which just means sum of the squared errors between the corresponding elements of the matrices. Principal component analysis, factor analysis, and singular value decomposition, are all matrix factorization techniques (or techniques used to do them, depending on your perspective) that are based on this. We might expect them to be useful when there really is a true linear subspace (that is, shaped like a hyperplane) that underlies the data."
]
},
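{
"cell_type": "markdown",
"metadata": {},
"source": [
"One standard way to solve this under the squared Frobenius loss is a truncated SVD. By the Eckart-Young theorem, keeping only the top $K$ singular values gives the best rank-$K$ approximation of $X$ in Frobenius norm. Below is a minimal sketch with $K=1$ using `np.linalg.svd`. The noisy data are generated like the cell above, except that we assume independent noise on each coordinate and fix the random seed for reproducibility. PCA would usually center $X$ first; we skip centering here because the true subspace passes through the origin."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)                                # fixed seed for reproducibility\n",
"Atrue = np.array([[1.0, 2.0]])                   # 1 x 2 true loadings\n",
"Ztrue = np.random.random((50, 1))                # 50 x 1 true latent positions\n",
"X = np.dot(Ztrue, Atrue) + np.random.normal(scale=0.05, size=(50, 2))\n",
"\n",
"# Thin SVD: X = U diag(s) Vt, singular values s sorted in decreasing order\n",
"U, s, Vt = np.linalg.svd(X, full_matrices=False)\n",
"\n",
"K = 1\n",
"Z_hat = U[:, :K] * s[:K]                         # 50 x K learned latent positions\n",
"A_hat = Vt[:K, :]                                # K x 2 learned loadings (unit-length rows)\n",
"X_hat = np.dot(Z_hat, A_hat)                     # rank-K reconstruction of X\n",
"\n",
"# Squared Frobenius norm of the residual = sum of squared elementwise errors\n",
"loss = np.sum((X - X_hat) ** 2)\n",
"assert np.isclose(loss, np.linalg.norm(X - X_hat, 'fro') ** 2)\n",
"\n",
"print(A_hat)   # close to +/-(1, 2)/sqrt(5), the direction of the true loadings\n",
"print(loss)    # equals s[1]**2, the square of the discarded singular value"
]
},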
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nonlinear low-dimensional approximations\n",
"\n",
"The previous example is a case of a *linear* subspace, where the projection operation between the spaces is a matrix multiply -- known as a *linear* operation (thus the \"linear\" in linear algebra). It makes sense for when all the data points lie on a hyperplane (of lower dimension than the observed dimensions).\n",
"\n",
"But we can think of nonlinear subspaces (manifolds), too. For example, consider a 2-d observed space, but where all the data only lies on a circle."
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEACAYAAAC08h1NAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEwZJREFUeJzt3W+MXFd5x/HfLwbc5Y9l1iCHJC6owlFIXyWV3BSssi9w\nYoyUsBIFIrW4EUoAyVqh3UiAaYUlaNpI3VW0QkBKA7UEIlCVBFc4xAvCNOJFShKaRjRusFqjJAWX\nekmXxsjE+OmLe50Zr2d2Z+bOzP1zvh9plbl3ru89DPc+c+b8eY4jQgCAdFxSdgEAAONF4AeAxBD4\nASAxBH4ASAyBHwASQ+AHgMQUDvy2v2D7pO0n1jhm0faPbT9u+5qi1wQADG4YNf4vStrd7U3beyS9\nMSK2S7pN0meHcE0AwIAKB/6IeEjSL9Y45EZJB/NjH5a02fbWotcFAAxmHG38l0t6um37GUlXjOG6\nAIAOxtW561Xb5IkAgJK8ZAzXeFbStrbtK/J9F7DNlwEADCAiVleu1zSOwH9I0j5J99q+TtJzEXGy\n04H9Fh6d2T4QEQfKLkdTjOPztLcckRZ2SXvzPQclzS5FnLp+lNctA/fncA1SaS4c+G1/RdJbJb3G\n9tOSPiHppZIUEXdHxGHbe2wfl/S8pFuKXhNonuV5aWanpIlse+ZX0sp8qUVCYxUO/BFxcw/H7Ct6\nHaDJIuJB29PS7Fy2Z2U+Ih4st1RoqnE09WD8jpZdgIY5Oo6L5IE+hWB/tOwCpM5VWYjFdtDGDwD9\nGSR2kqsHABJD4AeAxBD4E2X7BnvLkezPN5RdHgDjQxt/grJAv+k+abF96OA0o0gwDNn9NZmPTlpm\ndNKIDRI7GdXTcJ0fwsk5aWGiNVlIE/kwQh5QFNKqVCycr1TstE2lomII/A3W7SGUJsstGBqMSkUd\nEPgbrdtDyCxRIGUE/gQxSxSjQ6WiDujcbTA6cVEGOnfHa5DYSeBvOB5CoNkI/ACQGFI2AGgcJhsO\nH4G/YrjJgZZWP9Utu6Qrd0mbD9veX3a56o6mngqhMxa4ULYy2S27pC9JujPfO3NOWtnDc5Fh5m7t\nMfkFuNj3lQX9F5+LS3guiqGpB0CFLc9Lx86VXYqmocZfKUx+Adrlkw3/XJr5pF6sqPJcFEUbf8Uw\n7h64GM9Fd4zjB4DEMI4fALAuAj8AJIbADwCJIfADQGII/ACQGAI/ACSGwA8AiSHwA0BiCPwAkBgC\nPwAkhsAPAIkh8K+B1bAANBFJ2rpgNSwAdUCStqGanMuC/l5lf4sTrbSwAFLStF//LMQCAGto/fpf\nOP/rf6ftWv/6J/DnVi/0IInVsACoiWthE/jV+RtdWpnO/mbzL4MVVv0B0AgEfkndvtEjTl2vGn+r\nAxiG5q2FTeAHgDXkC7436tc/wznF0E0A9cVi671fa780OZttLS9ExB2rO3cJ+gDqgMDf23X2S5v+\nQlrM98xIWvl4RNwx6msDwLAR+Hu6zqtXpLte1erIPSjpw7+M+MWmUV8bAIaNmbs98W/1tg8AmimJ\nUT0Xtt8//3Pp9sta794u6cxPSikYAJSg8YG/w+SsM9Ivz0qfy/+3n35BOr2vvBICwHgVDvy2d0u6\nS9IGSX8bEXeuen9K0jck/Ue+6x8i4lNFr9u7iyZnbZQ+/Jj01Kls8zQjeAAkpVDgt71B0qclvU3S\ns5J+YPtQRDy56tDvRcSNRa41XJecymflAkByitb4d0g6HhEnJMn2vZJukrQ68JeYZ795060BoIii\no3oul/R02/Yz+b52IenNth+3fdj21QWv2ZesGWdlWppdyv6YkQtgvKqWz79ojb+XSQCPSdoWEadt\nv13S/ZKu7HSg7QNtm0cj4mjB8kk6H/xJtgZg/Iadzz/vN50qVKYiE7hsXyfpQETszrc/Junc6g7e\nVf/mPyX9XkQsr9pfqaUXAWAY7C1HpIVdF04a
nV0aVj9jGRO4HpG03fYbbL9M0nskHVpVqK22nb/e\noezLZvniUwEAxqFQU09EnLW9T1kzygZJ90TEk7Y/kL9/t6R3SfqQ7bOSTkt6b8EyA0CNVG+ASXK5\negBg3EaZ/TfJJG2kUwaQsuQCPwuoAEjdILGz5rl6Oq+VK4ZuAkBXCaZlBoC01bzGX73ecgCoulq2\n8a/q0D0qTU7lr+ncBVArRQeoJNG5S4cugKYYRjxLpHOXDl0ATVFOPKNzFwASU8MaPx26AJqinHhW\nuzb+/Fhm6wJoBDp3ydUDAH0pIy0zAKBmCPwAkBgCPwAkhsAPABU0ygXa6dwFgIrpZ0ZvIjN3AaDp\nRjujl6YeAEgMNX4AqJzRzuiljR8AKqjXGb3M3AWAxDBzFwCwLgI/ACSGwA8AiSHwA0BiCPwAkBgC\nPwAkhsAPADUxrMRtzNwFgBpoJW5bOD+bd6ft6UHOReAHgFrolrhtue8z0dQDAIkhZQMA1EC3HP2S\nvkWuHgBoqE6J20jSBgCJIUkbAGBdBH4ASEylAv8oVpMHAFyoUoFfWtglbbqP4A8Ao1OxwL9X2VCl\n873WAIBhq1jgBwCMWsVSNhzUsFeTBwBcqFLj+KXJpbVWkwcAXIgJXACQmNpP4GI0DwCMXqUCv7Tp\nGwR/ABitigX+xY3S5jvKLgUAVFn7SlyD/PvCgd/2btvHbP/Y9ke6HLOYv/+47WvWOeP2omUCgKZq\npWe+ZZd05a5BzlEo8NveIOnTknZLulrSzbbftOqYPZLeGBHbJd0m6bPdz3i7pBeq0dsMAJU0OSfd\nOiF9SdIHBzpD0Rr/DknHI+JERLwg6V5JN6065kZlA/QVEQ9L2mx7a+fT/VpSHC9YJgBouO9LulNt\nyzD2pWjgv1zS023bz+T71jvmis6nO3tGen5/wTIBQIMtz0vHzhU5Q9HA32uzzOoxpl3+3em/l/QH\ntg/Ynhq8WADQWGek545Kt4X0zoFOUDRlw7OStrVtb1NWo1/rmCvyfReJiD8pWB4AaLSIOGp7o/Ty\nzdJDr5e0pd9zFK3xPyJpu+032H6ZpPdIOrTqmEOS3idJtq+T9FxEnCx4XQBIUmtUz13XSgt9B32p\nYI0/Is7a3ifpQUkbJN0TEU/a/kD+/t0Rcdj2HtvHJT0v6ZYi1wSAtE3OSQsTrY7dP+37DIWzc0bE\nA5IeWLXv7lXb+4peBwAwHBVLywwAWNvyvDSzU9LEoGcgOycA1EzWzn9+pcLlXaRlBoCE1D4tMwBg\n9Aj8AJAYAj8AJIbADwCJIfADQGII/ACQGAI/ACSGwA8ANdG+1m42iWswpGwAgBpoZeVcyFM1zOy0\nPT3IuSoV+Fsrxi/PR8SD5ZYGAKpkdVZOTUizc9Jy32eqVOCXFvIV47NvMoI/AAxfpXL1tFZkPChp\ndini1PVllgkAqqLV1LN4vqnnV9LKtKRv9Zurp2I1fgBAJxHxYNamP5tn5VyZz/f1fa6K1fj/Lt/K\nvslo6gGAtQ2SnbNigX9yKduicxcAelH7wE8+fgDoD/n4AQDrIvADQGII/ACQGAI/ACSGwA8AFTWs\npGwXnZdRPQBQPd1m6q4e6j5I7GTmLgBUUrekbCo8x4mmHgBIDDV+AKik5XlpZqek9qae+WGcmTZ+\nAKiorJ1/Mk/K1jmVDSkbACAxpGwAgBrLhm+++lF7y//Yr3x0mEM4L7gONX4AKF8+fPMb0uLGbM/t\nkk6fkU7ftFa2YoZzAkBtTc5JCxvbhm9K+txG6amhDOFsR1MPACSGGj8AVMLyvDTzh5JWN/UMZQhn\nO9r4AaAisnb+zXdIl7xeOvMT6fn9661GyHBOAEgMwzkBAOsi8ANAYgj8AJAYAj8AJIbADwCJIfAD\nQGII/ACQGAI/ACSGwA8AiSHwA0BiBk7SZntS0lclvV7SCUnvjojnOhx3QtKKpN9IeiEidgx6TQBA\ncUVq/B+V
tBQRV0r6Tr7dSUiaiohrCPoAUL4igf9GSQfz1wclvXONY0m+BgAVUSTwb42Ik/nrk5K2\ndjkuJH3b9iO2by1wPQDAEKzZxm97SdKlHd76ePtGRITtbvmd3xIRP7X9WklLto9FxENdrnegbfNo\nRBxdq3wAkBrbU5KmCp1j0Hz8to8pa7v/me3XSfpuRFy1zr/5hKT/i4iLVpQhHz8A9G/c+fgPqbUq\n8F5J93co0Mttvyp//QpJ10t6osA1AQAFFanxT0r6mqTfVttwTtuXSfp8RLzD9u9I+nr+T14i6csR\n8ZddzkeNHwD6xNKLvV3nBmlyLttanl9vPUsAqDIC//rXuEHadJ+0OJHtmfmVtDJN8AdQV4PEzoFn\n7tbT5Jy0MNHqmtCENDsnicAPIBnk6gGAxCRW41+el2Z2Smpv6rloaCkANFlSbfz5dejcBdAYdO4W\nLwNfCgBqhcBf7PqM+AFQO4zqKYQRPwDSwKgeAEgMNf4XMeIHQBpo47+wDHTuAqgVOncBIDHjTsuc\nJNs32FuOZH++oezyAEC/qPH3gSGfAKqGGv/ITc5lQX+vsr/FiVafAIA6SfnXO6N6ACSn9et94fyv\n9522k/n1TuDvC0M+gWZIe8Imgb8PEfGg7en8BpG0wpBPALVD5+6IMCcAqK4mDdRgHH9FNOmmApqq\nKZUzAn9F2FuOSAu7Wu2HByXNLkWcur7McgFoHoZzAgDWRefuSDD6B0B10dQzIuu1HzalfRFAuWjj\nr4kLO3+fkHTPOUn/Ij23ny8AAP2gjb82zqd+uFTSlyS9/xLpqmulzYdt7y+7dACajcBfqr+R9MfK\ngv8HJd11ibTpk6nlDQEwXjT1lKDV1HN13vn7QTH0E8AgaOqpiawdf2VaeuIx6VjZxQGQGGr8Jcva\n9Dd9UlrMv4SZ5Qugd4zqqSmGdiJ1PAODI/ADqB1yWxUzSOxk5i6AkqWdG78MdO4CQGKo8QMoGbmt\nxo02fgClo3N3cHTuAkBimMAFAFgXgR+Ssp/a9pYj9isftV/9aPaanEFoad0j3Bt1R1MP2sZR3zqR\n5Qr66/ydmTPSb34kbTxFu2vaGGtfXYzjx4DOj6M+pCzovzieeqP0uWuzJHIzO23zoCeLsfZNQlMP\n1nGZsod9caI16gJNQhNOeqjxQ61x1LdOSLe37b9d2VoBaKpWE87C+SacLr/sGGvfJLTxQ1L7OOoz\nW6SX5nt//bvSZzZmr/tr02Vcdj3YW45IC7t6WQ+C/0+riTZ+DCx/iDssCD+bP+grPT/ovdciUSed\n7hHUEzV+DF0/tUiUi9E69TfWCVy2/8j2j2z/xva1axy32/Yx2z+2/ZFBr4c00fE4Wq3V4GaXsj+C\nfgoGrvHbvkrSOUl3S5qLiMc6HLNB0r9LepukZyX9QNLNEfFkh2Op8Q+J7amIOFri9YdSixzGeYbR\nLl3259k0fJ7DNdYaf0Qci4in1jlsh6TjEXEiIl6QdK+kmwa9Jno2VebFh1eLnJzLgv5eDTKktK2v\nYVf2t+m+1b8aevxFMdV/2bGGqbILkLpRd+5eLunptu1nJP3+iK+JCqhGR+Dak47ohEaq1gz8tpck\nXdrhrf0R8Y89nL8aPceoqVGPHWc2KtJUeFSP7e+qexv/dZIORMTufPtjks5FxJ0djuVLAgAGUNY4\n/m4XfUTSdttvkPRfkt4j6eZOB9KxCwDjUWQ457TtpyVdJ+mbth/I919m+5uSFBFnJe1T9tP53yR9\ntdOIHgDA+FRmAhcAYDxKyc7J5K/hsj1pe8n2U7aP2N7c5bgTtv/V9g9t//O4y1l1vdxvthfz9x+3\nfc24y1gn632etqds/29+P/7Q9p+VUc6qs/0F2ydtP7HGMX3dl2WlZX5C0rSkf+p2QD7569OSdku6\nWtLNtt80nuLVzkclLUXElZK+k293EpKmIuKaiNgxttLVQC/3m+09kt4YEd
sl3Sbps2MvaE308fx+\nL78fr4mIT421kPXxRWWfY0eD3JelBH4mfw3djcoS4ij/7zvXOJZO9M56ud9e/Jwj4mFJm21vHW8x\na6PX55f7cR0R8ZCkX6xxSN/3ZZUXYuk0+evykspSdVsj4mT++qSkbv+nh6Rv237E9q3jKVpt9HK/\ndTrmihGXq656+TxD0pvz5onDtq8eW+mape/7cmQzd5n8NVxrfJ4fb9+IiFhjTsRbIuKntl8racn2\nsbw2gd7vt9U1VO7Tznr5XB6TtC0iTtt+u6T7JV052mI1Vl/35cgCf0TsKniKZyVta9vepuybLElr\nfZ55x8+lEfEz26+T9N9dzvHT/L8/t32fsp/jBP5ML/fb6mOuyPfhYut+nhHxy7bXD9j+jO3JiFge\nUxmbou/7sgpNPetO/rL9MmWTvw6Nr1i1ckitvAN7ldWcLmD75bZflb9+haTrlXWyI9PL/XZI0vuk\nF2elP9fWxIYLrft52t5q2/nrHcqGlxP0+9f3fVnKCly2pyUtSnqNsslfP4yIt9u+TNLnI+IdEXHW\n9vnJXxsk3cPkr67+StLXbL9f0glJ75ayyXTKP09lzURfz5+zl0j6ckQcKae41dPtfrP9gfz9uyPi\nsO09to9Lel7SLSUWudJ6+TwlvUvSh2yflXRa0ntLK3CF2f6KpLdKek0+afYTytdHHfS+ZAIXACSm\nCk09AIAxIvADQGII/ACQGAI/ACSGwA8AiSHwA0BiCPwAkBgCPwAk5v8BNXvaqGBG/RAAAAAASUVO\nRK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10eca8610>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"Z = np.random.random((50,1)) * 2*math.pi\n",
"X = np.hstack( (np.cos(Z), np.sin(Z)) )\n",
"myscatter(X, lim=[-1,1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, there's a more compact way to represent the data: instead of $(x_1,x_2)$, instead have just store the angle, and map back to the original space using the sine and cosine functions. This is a *nonlinear* mapping, because sine and cosine are non-linear functions (circular definition, sorry, read something more indepth to nail this down)\n",
"\n",
"I don't think it's right to call this a \"low-rank\" approximation, because \"rank\" is really a term very specific to linear algebra. Thus I guess we can call it a \"low-dimensional latent space approximation\".\n",
"\n",
"When people in machine learning talk about manifold learning or subspace learning, they usually mean learning nonlinear transforms like this, or other ones. (Linear subspace learning is pretty well solved now with many different algorithms and approaches available.)"
]
}
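,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the contrast concrete, here is a minimal sketch that assumes the data lie exactly on the unit circle, generated like the circle cell above but with a fixed seed. Keeping one number per point (the angle, recovered with `np.arctan2`) reconstructs the data up to floating-point round-off. The best *linear* 1-d approximation (a rank-1 truncated SVD) instead leaves a large error, because the points are spread around the whole circle rather than along a line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)                               # fixed seed for reproducibility\n",
"Z = np.random.random((50, 1)) * 2 * np.pi       # true angles in [0, 2*pi)\n",
"X = np.hstack((np.cos(Z), np.sin(Z)))           # 50 x 2 points on the unit circle\n",
"\n",
"# Nonlinear 1-d encoding: keep only the angle of each point\n",
"z_hat = np.arctan2(X[:, 1], X[:, 0])            # recovered angles in [-pi, pi]\n",
"X_from_angle = np.column_stack((np.cos(z_hat), np.sin(z_hat)))\n",
"angle_err = np.sum((X - X_from_angle) ** 2)\n",
"\n",
"# Best linear 1-d (rank-1) approximation, for comparison\n",
"U, s, Vt = np.linalg.svd(X, full_matrices=False)\n",
"X_rank1 = np.outer(U[:, 0] * s[0], Vt[0])\n",
"linear_err = np.sum((X - X_rank1) ** 2)         # equals s[1]**2\n",
"\n",
"print(angle_err)    # tiny: floating-point round-off only\n",
"print(linear_err)   # large, since the points do not lie near any line through the origin"
]
}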
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}