sylvchev · February 8, 2024 16:50
diff --git a/house_x.dat b/house_x.dat
   2.1040000e+03   3.0000000e+00
   1.6000000e+03   3.0000000e+00
   2.4000000e+03   3.0000000e+00
   1.4160000e+03   2.0000000e+00
   3.0000000e+03   4.0000000e+00
   1.9850000e+03   4.0000000e+00
   1.5340000e+03   3.0000000e+00
   1.4270000e+03   3.0000000e+00
   1.3800000e+03   3.0000000e+00
   1.4940000e+03   3.0000000e+00
   1.9400000e+03   4.0000000e+00
   2.0000000e+03   3.0000000e+00
   1.8900000e+03   3.0000000e+00
   4.4780000e+03   5.0000000e+00
   1.2680000e+03   3.0000000e+00
   2.3000000e+03   4.0000000e+00
   1.3200000e+03   2.0000000e+00
   1.2360000e+03   3.0000000e+00
   2.6090000e+03   4.0000000e+00
   3.0310000e+03   4.0000000e+00
   1.7670000e+03   3.0000000e+00
   1.8880000e+03   2.0000000e+00
   1.6040000e+03   3.0000000e+00
   1.9620000e+03   4.0000000e+00
   3.8900000e+03   3.0000000e+00
   1.1000000e+03   3.0000000e+00
   1.4580000e+03   3.0000000e+00
   2.5260000e+03   3.0000000e+00
   2.2000000e+03   3.0000000e+00
   2.6370000e+03   3.0000000e+00
   1.8390000e+03   2.0000000e+00
   1.0000000e+03   1.0000000e+00
   2.0400000e+03   4.0000000e+00
   3.1370000e+03   3.0000000e+00
   1.8110000e+03   4.0000000e+00
   1.4370000e+03   3.0000000e+00
   1.2390000e+03   3.0000000e+00
   2.1320000e+03   4.0000000e+00
   4.2150000e+03   4.0000000e+00
   2.1620000e+03   4.0000000e+00
   1.6640000e+03   2.0000000e+00
   2.2380000e+03   3.0000000e+00
   2.5670000e+03   4.0000000e+00
   1.2000000e+03   3.0000000e+00
   8.5200000e+02   2.0000000e+00
   1.8520000e+03   4.0000000e+00
   1.2030000e+03   3.0000000e+00
diff --git a/house_y.dat b/house_y.dat
   3.9990000e+05
   3.2990000e+05
   3.6900000e+05
   2.3200000e+05
   5.3990000e+05
   2.9990000e+05
   3.1490000e+05
   1.9899900e+05
   2.1200000e+05
   2.4250000e+05
   2.3999900e+05
   3.4700000e+05
   3.2999900e+05
   6.9990000e+05
   2.5990000e+05
   4.4990000e+05
   2.9990000e+05
   1.9990000e+05
   4.9999800e+05
   5.9900000e+05
   2.5290000e+05
   2.5500000e+05
   2.4290000e+05
   2.5990000e+05
   5.7390000e+05
   2.4990000e+05
   4.6450000e+05
   4.6900000e+05
   4.7500000e+05
   2.9990000e+05
   3.4990000e+05
   1.6990000e+05
   3.1490000e+05
   5.7990000e+05
   2.8590000e+05
   2.4990000e+05
   2.2990000e+05
   3.4500000e+05
   5.4900000e+05
   2.8700000e+05
   3.6850000e+05
   3.2990000e+05
   3.1400000e+05
   2.9900000e+05
   1.7990000e+05
   2.9990000e+05
   2.3950000e+05
diff --git a/MLA-GD-strategies.ipynb b/MLA-GD-strategies.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "189eef99",
   "metadata": {},
   "source": [
    "# Gradient descent strategies\n",
    "\n",
    "\n",
    "## Objective\n",
    "\n",
    "This applied session aims at implementing different gradient descent strategies, using batch, stochastic or minibatch approaches. We will focus on a simple linear regression problem to evaluate each strategies.\n",
    "\n",
    "## Evaluation\n",
    "\n",
    "You should **individually** complete this notebook with the requested answers and send it by mail when finished. The deadline will be indicated during class, any late submission will be penalized. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ca26efa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# add your import here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54590981",
   "metadata": {},
   "source": [
    "## Question 1 - Hypothesis function and loss\n",
    "\n",
    "\n",
    "Consider a collection of $m$ samples ${x^{(i)}}_{i=1}^m$ in $\\mathcal{X} \\subseteq \\mathbb{R}^n$, where $n$ is the number of features for each samples. $X$ is the matrix concatenating the samples, with $X \\in \\mathbb{R}^{m\\times n}$. We aim to predict a value $y^{(i)}$ in $\\mathcal{Y} \\subseteq \\mathbb{R}$ associated with each sample. $Y$ is the vector concatenating all $y$, with $Y \\in \\mathbb{R}^m$. We restrict ourselves to linear solutions, i.e., our hypothesis is of the form $y^{(i)} = h(x^{(i)})$ with\n",
    "\n",
    "$$h_{\\theta}(x) = \\sum_{i=0}^n \\theta_i x_i = \\theta^T x$$\n",
    "\n",
    "We could define $H: \\mathbb{R}^{m\\times n} \\mapsto \\mathbb{R}^m$ as \n",
    "\n",
    "$$ h_{\\theta}(X) = X\\theta $$\n",
    "\n",
    "Write the hypothesis function `H(theta, X)`, that takes a parameter vector `theta`, the input matrix `X` and returns a vector of $m$ predicted value. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b6a3259",
   "metadata": {},
   "outputs": [],
   "source": [
    "def H(theta, X):\n",
    "    \"\"\"Compute linear regression prediction with given coefficient\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    theta : ndarray of shape (n)\n",
    "        Coefficients of the linear regression\n",
    "    X : ndarray of shape (m, n)\n",
    "        Matrix concatenating m samples of n features\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    pred : ndarray of shape (m)\n",
    "        Predicted values for m samples\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75dd1d13",
   "metadata": {},
   "source": [
    "We will use the MSE as loss : \n",
    "\n",
    "$$\\ell(\\theta) = \\frac{1}{2m}\\sum_{j=1}^{m}(h_{\\theta}(x^{(j)}) - y^{(j)})^2 $$\n",
    "\n",
    "that could be rewrited as \n",
    "\n",
    "$$L(\\theta) = \\frac{1}{2m}(X\\theta - Y)^T(X\\theta - Y)$$\n",
    "\n",
    "Write the loss function `L(theta, X, Y)` that returns the estimated loss as a scalar value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c4dbaa1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def L(theta, X, Y):\n",
    "    \"\"\"Estimate loss as mean square error on linear regression prediction\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    theta : ndarray of shape (n)\n",
    "        Coefficients of the linear regression\n",
    "    X : ndarray of shape (m, n)\n",
    "        Matrix concatenating m samples of n features\n",
    "    Y : ndarray of shape (m)\n",
    "        Vector of m values to predict\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    loss : float\n",
    "        Mean square error computed on a m samples\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ae722af",
   "metadata": {},
   "source": [
    "## Question 2 - Gradient descent strategies\n",
    "\n",
    "To minimize $L(\\theta)$, we will perform a gradient descent, which involves initializing $\\theta$ and then updating it using:\n",
    "\n",
    "$$\\theta_j := \\theta_j - \\alpha \\frac{\\partial L(\\theta)}{\\partial\\theta_j} = \\theta_j - \\alpha (h_\\theta(x)-y)x_j$$\n",
    "\n",
    "It could be rewritten as\n",
    "\n",
    "$$\\theta := \\theta - \\frac{\\alpha}{m} (X\\theta - Y)^T X$$\n",
    "\n",
    "Write the update function for batch strategy, defined as `batch_update(theta, X, Y, alpha)`, with $\\alpha=0.01$ by default. This function should return a vector with the updated values of `theta`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1dfc4dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "def batch_update(theta, X, Y, alpha=0.01):\n",
    "    \"\"\"Compute one iteration of the batch update for linear regression coefficients \n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    theta : ndarray of shape (n)\n",
    "        Coefficients of the linear regression\n",
    "    X : ndarray of shape (m, n)\n",
    "        Matrix concatenating m samples of n features\n",
    "    Y : ndarray of shape (m)\n",
    "        Vector of m values to predict\n",
    "    alpha : float, default=0.01\n",
    "        Gradient step\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    new_theta : ndarray of shape (n)\n",
    "        New coefficients after batch update\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "068d7354",
   "metadata": {},
   "source": [
    "Write the stochastic version of the gradient update, named `stochastic_update(theta, X, Y, alpha)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "addee2cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "def stochastic_update(theta, X, Y, alpha=0.01):\n",
    "    \"\"\"Compute one iteration of the stochatiste update for linear regression coefficients\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    theta : ndarray of shape (n)\n",
    "        Coefficients of the linear regression\n",
    "    X : ndarray of shape (m, n)\n",
    "        Matrix concatenating m samples of n features\n",
    "    Y : ndarray of shape (m)\n",
    "        Vector of m values to predict\n",
    "    alpha : float, default=0.01\n",
    "        Gradient step\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    new_theta : ndarray of shape (n)\n",
    "        New coefficients after stochastic update\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdfba44c",
   "metadata": {},
   "source": [
    "It is also possible to directly derive a solution that minimizes the least squares:\n",
    "\n",
    "\\begin{equation}\n",
    "\\begin{split}\n",
    "\\nabla_\\theta L(\\theta) & = \\nabla_\\theta \\frac{1}{2}(X\\theta - Y)^T(X\\theta - Y)\\\n",
    "& = X^T X \\theta - X^T Y\n",
    "\\end{split}\n",
    "\\end{equation}\n",
    "\n",
    "To find the minimum, we look for the point where the derivative is zero:\n",
    "\n",
    "$$ X^T X \\theta = X^T Y$$\n",
    "\n",
    "and then solve the equation to obtain the value of $\\theta$ that minimizes $L(\\theta)$:\n",
    "\n",
    "$$ \\theta = (X^T X)^{-1} X^T Y $$\n",
    "\n",
    "Write the function `normal_equation(X, Y)` that provides directly the value of `theta`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bd7b48c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def normal_equation(X, Y):\n",
    "    \"\"\"Estimate regression coefficient with the Normal equation\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X : ndarray of shape (m, n)\n",
    "        Matrix concatenating m samples of n features\n",
    "    Y : ndarray of shape (m)\n",
    "        Vector of m values to predict\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    theta : ndarray of shape (n)\n",
    "        Coefficients of the linear regression\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c7d0ac4",
   "metadata": {},
   "source": [
    "## Question 3 - Applications to real estate estimation\n",
    "\n",
    "To apply linear regression, we will first load a dataset containing information about the prices of 47 houses in Portland, Oregon. The data includes two features:\n",
    "\n",
    "- The area of the house in square feet (ft$^2$),\n",
    "- The number of bedrooms.\n",
    "\n",
    "We also have the selling price of each house. Therefore, for this dataset, with $m=47$ and $n=2$, we have 47 samples $x^{(i)}$ that are described in $\\mathbb{R}^2$ (area and number of bedrooms), and we will attempt to predict the price of each house, denoted as $y^{(i)}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aabbf3d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = loadtxt('house_x.dat')\n",
    "Y_orig = loadtxt('house_y.dat')\n",
    "n = 2\n",
    "m = data.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f8bf386",
   "metadata": {},
   "source": [
    "Plot the input data "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdc24af9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "2d81d81a",
   "metadata": {},
   "source": [
    "In order to avoid size effect and numerical biais, you need to normalize all the data and prediction (both `X` and `Y`). A standard normalization, substracting the mean and dividing by the standard deviation, is the prefered method here. Store the mean and std value to be able to \"denormalize\" the data before visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba080623",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9310d6d1",
   "metadata": {},
   "source": [
    "Apply batch estimation with $\\alpha = 0.005$ and 800 iterations. Initialize `theta` with zero values. Plot the resulting estimation and if possible the intermediate values. You could use only the square footage information for simplifying the visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0c4bac8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e9559fdb",
   "metadata": {},
   "source": [
    "Do the same for stochastic update"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "784d1c38",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "415b6369",
   "metadata": {},
   "source": [
    "## Question 4 - Convergence rate\n",
    "\n",
    "Compare visually the convergence of the stochastic and the batch strategies and comment the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23ece185",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "2de4777d",
   "metadata": {},
   "source": [
    "Compare the quality of the estimated results obtained by batch and stochastic strategies with the one computed by Normal equation. Comment the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a646c155",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8dab2b64",
   "metadata": {},
   "source": [
    "It is common to conduct tests to find a correct value for the gradient step $\\alpha$. Evaluate different values of $\\alpha$ for batch update with respect to the convergence speed and comment the results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "579b63f0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a32b9136",
   "metadata": {},
   "source": [
    "## Question 5 - Application on salary estimation\n",
    "\n",
    "Load a dataset containing information about salaries at Princeton University in the file `salary.raw`. The raw data is organized as follows:\n",
    "\n",
    "- sx = gender, encoded as 1 for females and 0 for males\n",
    "- rk = rank, encoded as 1 for assistant professor, 2 for associate professor, and 3 for full professor\n",
    "- yr = number of years in the current rank\n",
    "- dg = highest degree, 1 for doctorate, 0 for master's degree\n",
    "- yd = number of years since the last degree was obtained\n",
    "- sl = annual academic salary, in US dollars\n",
    "\n",
    "Apply a linear regression to investigate the gender pay gap and comment the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c512008",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8dc15f9a",
   "metadata": {},
   "source": [
    "## Bonus question - Minibatch update\n",
    "\n",
    "Implement `stochastic_update(theta, X, Y, alpha, b)`, with `b` the size of the minibatch. Compare convergence rate with batch and stochastic strategies. Conduct an parameter analysis of the minibatch size and analyze the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35a5f06b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
diff --git a/salary.raw b/salary.raw
       0    3    25    1    35    36350
       0    3    13    1    22    35350
       0    3    10    1    23    28200
       1    3     7    1    27    26775
       0    3    19    0    30    33696
       0    3    16    1    21    28516
       1    3     0    0    32    24900
       0    3    16    1    18    31909
       0    3    13    0    30    31850
       0    3    13    0    31    32850
       0    3    12    1    22    27025
       0    2    15    1    19    24750
       0    3     9    1    17    28200
       0    2     9    0    27    23712
       0    3     9    1    24    25748
       0    3     7    1    15    29342
       0    3    13    1    20    31114
       0    2    11    0    14    24742
       0    2    10    0    15    22906
       0    3     6    0    21    24450
       0    1    16    0    23    19175
       0    2     8    0    31    20525
       0    3     7    1    13    27959
       1    3     8    1    24    38045
       0    2     9    1    12    24832
       0    3     5    1    18    25400
       0    2    11    1    14    24800
       1    3     5    1    16    25500
       0    2     3    0     7    26182
       0    2     3    0    17    23725
       1    1    10    0    15    21600
       0    2    11    0    31    23300
       0    1     9    0    14    23713
       1    2     4    0    33    20690
       1    2     6    0    29    22450
       0    2     1    1     9    20850
       1    1     8    1    14    18304
       0    1     4    1     4    17095
       0    1     4    1     5    16700
       0    1     4    1     4    17600
       0    1     3    1     4    18075
       0    1     3    0    11    18000
       0    2     0    1     7    20999
       1    1     3    1     3    17250
       0    1     2    1     3    16500
       0    1     2    1     1    16094
       1    1     2    1     6    16150
       1    1     2    1     2    15350
       0    1     1    1     1    16244
       1    1     1    1     1    16686
       1    1     1    1     1    15000
       1    1     0    1     2    20300
	2.1040000e+03 3.0000000e+00
	1.6000000e+03 3.0000000e+00
	2.4000000e+03 3.0000000e+00
	1.4160000e+03 2.0000000e+00
	3.0000000e+03 4.0000000e+00
	1.9850000e+03 4.0000000e+00
	1.5340000e+03 3.0000000e+00
	1.4270000e+03 3.0000000e+00
	1.3800000e+03 3.0000000e+00
	1.4940000e+03 3.0000000e+00
	1.9400000e+03 4.0000000e+00
	2.0000000e+03 3.0000000e+00
	1.8900000e+03 3.0000000e+00
	4.4780000e+03 5.0000000e+00
	1.2680000e+03 3.0000000e+00
	2.3000000e+03 4.0000000e+00
	1.3200000e+03 2.0000000e+00
	1.2360000e+03 3.0000000e+00
	2.6090000e+03 4.0000000e+00
	3.0310000e+03 4.0000000e+00
	1.7670000e+03 3.0000000e+00
	1.8880000e+03 2.0000000e+00
	1.6040000e+03 3.0000000e+00
	1.9620000e+03 4.0000000e+00
	3.8900000e+03 3.0000000e+00
	1.1000000e+03 3.0000000e+00
	1.4580000e+03 3.0000000e+00
	2.5260000e+03 3.0000000e+00
	2.2000000e+03 3.0000000e+00
	2.6370000e+03 3.0000000e+00
	1.8390000e+03 2.0000000e+00
	1.0000000e+03 1.0000000e+00
	2.0400000e+03 4.0000000e+00
	3.1370000e+03 3.0000000e+00
	1.8110000e+03 4.0000000e+00
	1.4370000e+03 3.0000000e+00
	1.2390000e+03 3.0000000e+00
	2.1320000e+03 4.0000000e+00
	4.2150000e+03 4.0000000e+00
	2.1620000e+03 4.0000000e+00
	1.6640000e+03 2.0000000e+00
	2.2380000e+03 3.0000000e+00
	2.5670000e+03 4.0000000e+00
	1.2000000e+03 3.0000000e+00
	8.5200000e+02 2.0000000e+00
	1.8520000e+03 4.0000000e+00
	1.2030000e+03 3.0000000e+00
	3.9990000e+05
	3.2990000e+05
	3.6900000e+05
	2.3200000e+05
	5.3990000e+05
	2.9990000e+05
	3.1490000e+05
	1.9899900e+05
	2.1200000e+05
	2.4250000e+05
	2.3999900e+05
	3.4700000e+05
	3.2999900e+05
	6.9990000e+05
	2.5990000e+05
	4.4990000e+05
	2.9990000e+05
	1.9990000e+05
	4.9999800e+05
	5.9900000e+05
	2.5290000e+05
	2.5500000e+05
	2.4290000e+05
	2.5990000e+05
	5.7390000e+05
	2.4990000e+05
	4.6450000e+05
	4.6900000e+05
	4.7500000e+05
	2.9990000e+05
	3.4990000e+05
	1.6990000e+05
	3.1490000e+05
	5.7990000e+05
	2.8590000e+05
	2.4990000e+05
	2.2990000e+05
	3.4500000e+05
	5.4900000e+05
	2.8700000e+05
	3.6850000e+05
	3.2990000e+05
	3.1400000e+05
	2.9900000e+05
	1.7990000e+05
	2.9990000e+05
	2.3950000e+05
	{
	"cells": [
	{
	"cell_type": "markdown",
	"id": "189eef99",
	"metadata": {},
	"source": [
	"# Gradient descent strategies\n",
	"\n",
	"\n",
	"## Objective\n",
	"\n",
	"This applied session aims at implementing different gradient descent strategies, using batch, stochastic or minibatch approaches. We will focus on a simple linear regression problem to evaluate each strategies.\n",
	"\n",
	"## Evaluation\n",
	"\n",
	"You should individually complete this notebook with the requested answers and send it by mail when finished. The deadline will be indicated during class, any late submission will be penalized. \n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "7ca26efa",
	"metadata": {},
	"outputs": [],
	"source": [
	"# add your import here"
	]
	},
	{
	"cell_type": "markdown",
	"id": "54590981",
	"metadata": {},
	"source": [
	"## Question 1 - Hypothesis function and loss\n",
	"\n",
	"\n",
	"Consider a collection of $m$ samples ${x^{(i)}}_{i=1}^m$ in $\\mathcal{X} \\subseteq \\mathbb{R}^n$, where $n$ is the number of features for each samples. $X$ is the matrix concatenating the samples, with $X \\in \\mathbb{R}^{m\\times n}$. We aim to predict a value $y^{(i)}$ in $\\mathcal{Y} \\subseteq \\mathbb{R}$ associated with each sample. $Y$ is the vector concatenating all $y$, with $Y \\in \\mathbb{R}^m$. We restrict ourselves to linear solutions, i.e., our hypothesis is of the form $y^{(i)} = h(x^{(i)})$ with\n",
	"\n",
	"$$h_{\\theta}(x) = \\sum_{i=0}^n \\theta_i x_i = \\theta^T x$$\n",
	"\n",
	"We could define $H: \\mathbb{R}^{m\\times n} \\mapsto \\mathbb{R}^m$ as \n",
	"\n",
	"$$ h_{\\theta}(X) = X\\theta $$\n",
	"\n",
	"Write the hypothesis function `H(theta, X)`, that takes a parameter vector `theta`, the input matrix `X` and returns a vector of $m$ predicted value. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "3b6a3259",
	"metadata": {},
	"outputs": [],
	"source": [
	"def H(theta, X):\n",
	" \"\"\"Compute linear regression prediction with given coefficient\n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" theta : ndarray of shape (n)\n",
	" Coefficients of the linear regression\n",
	" X : ndarray of shape (m, n)\n",
	" Matrix concatenating m samples of n features\n",
	" \n",
	" Returns\n",
	" -------\n",
	" pred : ndarray of shape (m)\n",
	" Predicted values for m samples\n",
	" \"\"\"\n",
	" pass"
	]
	},
	{
	"cell_type": "markdown",
	"id": "75dd1d13",
	"metadata": {},
	"source": [
	"We will use the MSE as loss : \n",
	"\n",
	"$$\\ell(\\theta) = \\frac{1}{2m}\\sum_{j=1}^{m}(h_{\\theta}(x^{(j)}) - y^{(j)})^2 $$\n",
	"\n",
	"that could be rewrited as \n",
	"\n",
	"$$L(\\theta) = \\frac{1}{2m}(X\\theta - Y)^T(X\\theta - Y)$$\n",
	"\n",
	"Write the loss function `L(theta, X, Y)` that returns the estimated loss as a scalar value."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "0c4dbaa1",
	"metadata": {},
	"outputs": [],
	"source": [
	"def L(theta, X, Y):\n",
	" \"\"\"Estimate loss as mean square error on linear regression prediction\n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" theta : ndarray of shape (n)\n",
	" Coefficients of the linear regression\n",
	" X : ndarray of shape (m, n)\n",
	" Matrix concatenating m samples of n features\n",
	" Y : ndarray of shape (m)\n",
	" Vector of m values to predict\n",
	" \n",
	" Returns\n",
	" -------\n",
	" loss : float\n",
	" Mean square error computed on a m samples\n",
	" \"\"\"\n",
	" pass"
	]
	},
	{
	"cell_type": "markdown",
	"id": "0ae722af",
	"metadata": {},
	"source": [
	"## Question 2 - Gradient descent strategies\n",
	"\n",
	"To minimize $L(\\theta)$, we will perform a gradient descent, which involves initializing $\\theta$ and then updating it using:\n",
	"\n",
	"$$\\theta_j := \\theta_j - \\alpha \\frac{\\partial L(\\theta)}{\\partial\\theta_j} = \\theta_j - \\alpha (h_\\theta(x)-y)x_j$$\n",
	"\n",
	"It could be rewritten as\n",
	"\n",
	"$$\\theta := \\theta - \\frac{\\alpha}{m} (X\\theta - Y)^T X$$\n",
	"\n",
	"Write the update function for batch strategy, defined as `batch_update(theta, X, Y, alpha)`, with $\\alpha=0.01$ by default. This function should return a vector with the updated values of `theta`."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "a1dfc4dd",
	"metadata": {},
	"outputs": [],
	"source": [
	"def batch_update(theta, X, Y, alpha=0.01):\n",
	" \"\"\"Compute one iteration of the batch update for linear regression coefficients \n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" theta : ndarray of shape (n)\n",
	" Coefficients of the linear regression\n",
	" X : ndarray of shape (m, n)\n",
	" Matrix concatenating m samples of n features\n",
	" Y : ndarray of shape (m)\n",
	" Vector of m values to predict\n",
	" alpha : float, default=0.01\n",
	" Gradient step\n",
	" \n",
	" Returns\n",
	" -------\n",
	" new_theta : ndarray of shape (n)\n",
	" New coefficients after batch update\n",
	" \"\"\"\n",
	" pass"
	]
	},
	{
	"cell_type": "markdown",
	"id": "068d7354",
	"metadata": {},
	"source": [
	"Write the stochastic version of the gradient update, named `stochastic_update(theta, X, Y, alpha)`"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "addee2cf",
	"metadata": {},
	"outputs": [],
	"source": [
	"def stochastic_update(theta, X, Y, alpha=0.01):\n",
	" \"\"\"Compute one iteration of the stochatiste update for linear regression coefficients\n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" theta : ndarray of shape (n)\n",
	" Coefficients of the linear regression\n",
	" X : ndarray of shape (m, n)\n",
	" Matrix concatenating m samples of n features\n",
	" Y : ndarray of shape (m)\n",
	" Vector of m values to predict\n",
	" alpha : float, default=0.01\n",
	" Gradient step\n",
	" \n",
	" Returns\n",
	" -------\n",
	" new_theta : ndarray of shape (n)\n",
	" New coefficients after stochastic update\n",
	" \"\"\"\n",
	" pass"
	]
	},
	{
	"cell_type": "markdown",
	"id": "fdfba44c",
	"metadata": {},
	"source": [
	"It is also possible to directly derive a solution that minimizes the least squares:\n",
	"\n",
	"\\begin{equation}\n",
	"\\begin{split}\n",
	"\\nabla_\\theta L(\\theta) & = \\nabla_\\theta \\frac{1}{2}(X\\theta - Y)^T(X\\theta - Y)\\\n",
	"& = X^T X \\theta - X^T Y\n",
	"\\end{split}\n",
	"\\end{equation}\n",
	"\n",
	"To find the minimum, we look for the point where the derivative is zero:\n",
	"\n",
	"$$ X^T X \\theta = X^T Y$$\n",
	"\n",
	"and then solve the equation to obtain the value of $\\theta$ that minimizes $L(\\theta)$:\n",
	"\n",
	"$$ \\theta = (X^T X)^{-1} X^T Y $$\n",
	"\n",
	"Write the function `normal_equation(X, Y)` that provides directly the value of `theta`"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "4bd7b48c",
	"metadata": {},
	"outputs": [],
	"source": [
	"def normal_equation(X, Y):\n",
	" \"\"\"Estimate regression coefficient with the Normal equation\n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" X : ndarray of shape (m, n)\n",
	" Matrix concatenating m samples of n features\n",
	" Y : ndarray of shape (m)\n",
	" Vector of m values to predict\n",
	" \n",
	" Returns\n",
	" -------\n",
	" theta : ndarray of shape (n)\n",
	" Coefficients of the linear regression\n",
	" \"\"\"\n",
	" pass"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7c7d0ac4",
	"metadata": {},
	"source": [
	"## Question 3 - Applications to real estate estimation\n",
	"\n",
	"To apply linear regression, we will first load a dataset containing information about the prices of 47 houses in Portland, Oregon. The data includes two features:\n",
	"\n",
	"- The area of the house in square feet (ft$^2$),\n",
	"- The number of bedrooms.\n",
	"\n",
	"We also have the selling price of each house. Therefore, for this dataset, with $m=47$ and $n=2$, we have 47 samples $x^{(i)}$ that are described in $\\mathbb{R}^2$ (area and number of bedrooms), and we will attempt to predict the price of each house, denoted as $y^{(i)}$."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "aabbf3d8",
	"metadata": {},
	"outputs": [],
	"source": [
	"data = loadtxt('house_x.dat')\n",
	"Y_orig = loadtxt('house_y.dat')\n",
	"n = 2\n",
	"m = data.shape[0]"
	]
	},
	{
	"cell_type": "markdown",
	"id": "4f8bf386",
	"metadata": {},
	"source": [
	"Plot the input data "
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "fdc24af9",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "2d81d81a",
	"metadata": {},
	"source": [
	"In order to avoid size effect and numerical biais, you need to normalize all the data and prediction (both `X` and `Y`). A standard normalization, substracting the mean and dividing by the standard deviation, is the prefered method here. Store the mean and std value to be able to \"denormalize\" the data before visualization."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "ba080623",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "9310d6d1",
	"metadata": {},
	"source": [
	"Apply batch estimation with $\\alpha = 0.005$ and 800 iterations. Initialize `theta` with zero values. Plot the resulting estimation and if possible the intermediate values. You could use only the square footage information for simplifying the visualization."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "d0c4bac8",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "e9559fdb",
	"metadata": {},
	"source": [
	"Do the same for stochastic update"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "784d1c38",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "415b6369",
	"metadata": {},
	"source": [
	"## Question 4 - Convergence rate\n",
	"\n",
	"Compare visually the convergence of the stochastic and the batch strategies and comment the results"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "23ece185",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "2de4777d",
	"metadata": {},
	"source": [
	"Compare the quality of the estimated results obtained by batch and stochastic strategies with the one computed by Normal equation. Comment the results."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "a646c155",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "8dab2b64",
	"metadata": {},
	"source": [
	"It is common to conduct tests to find a correct value for the gradient step $\\alpha$. Evaluate different values of $\\alpha$ for batch update with respect to the convergence speed and comment the results. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "579b63f0",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "a32b9136",
	"metadata": {},
	"source": [
	"## Question 5 - Application on salary estimation\n",
	"\n",
	"Load a dataset containing information about salaries at Princeton University in the file `salary.raw`. The raw data is organized as follows:\n",
	"\n",
	"- sx = gender, encoded as 1 for females and 0 for males\n",
	"- rk = rank, encoded as 1 for assistant professor, 2 for associate professor, and 3 for full professor\n",
	"- yr = number of years in the current rank\n",
	"- dg = highest degree, 1 for doctorate, 0 for master's degree\n",
	"- yd = number of years since the last degree was obtained\n",
	"- sl = annual academic salary, in US dollars\n",
	"\n",
	"Apply a linear regression to investigate the gender pay gap and comment the results."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "1c512008",
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "markdown",
	"id": "8dc15f9a",
	"metadata": {},
	"source": [
	"## Bonus question - Minibatch update\n",
	"\n",
	"Implement `stochastic_update(theta, X, Y, alpha, b)`, with `b` the size of the minibatch. Compare convergence rate with batch and stochastic strategies. Conduct an parameter analysis of the minibatch size and analyze the results."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "35a5f06b",
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3 (ipykernel)",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.9.15"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 5
	}
	0 3 25 1 35 36350
	0 3 13 1 22 35350
	0 3 10 1 23 28200
	1 3 7 1 27 26775
	0 3 19 0 30 33696
	0 3 16 1 21 28516
	1 3 0 0 32 24900
	0 3 16 1 18 31909
	0 3 13 0 30 31850
	0 3 13 0 31 32850
	0 3 12 1 22 27025
	0 2 15 1 19 24750
	0 3 9 1 17 28200
	0 2 9 0 27 23712
	0 3 9 1 24 25748
	0 3 7 1 15 29342
	0 3 13 1 20 31114
	0 2 11 0 14 24742
	0 2 10 0 15 22906
	0 3 6 0 21 24450
	0 1 16 0 23 19175
	0 2 8 0 31 20525
	0 3 7 1 13 27959
	1 3 8 1 24 38045
	0 2 9 1 12 24832
	0 3 5 1 18 25400
	0 2 11 1 14 24800
	1 3 5 1 16 25500
	0 2 3 0 7 26182
	0 2 3 0 17 23725
	1 1 10 0 15 21600
	0 2 11 0 31 23300
	0 1 9 0 14 23713
	1 2 4 0 33 20690
	1 2 6 0 29 22450
	0 2 1 1 9 20850
	1 1 8 1 14 18304
	0 1 4 1 4 17095
	0 1 4 1 5 16700
	0 1 4 1 4 17600
	0 1 3 1 4 18075
	0 1 3 0 11 18000
	0 2 0 1 7 20999
	1 1 3 1 3 17250
	0 1 2 1 3 16500
	0 1 2 1 1 16094
	1 1 2 1 6 16150
	1 1 2 1 2 15350
	0 1 1 1 1 16244
	1 1 1 1 1 16686
	1 1 1 1 1 15000
	1 1 0 1 2 20300