quantreg_gradient.ipynb (gist by @avidale, last active May 25, 2023)

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "quantreg_gradient.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPav8U8imPgfjisuGBIR648",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/avidale/30d8e34884e9a62d08b5ebdfeaac23db/quantreg_gradient.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CmAZblOYddfl"
},
"source": [
"Here is a naive implementation of gradient descent for Quantile regression problem (https://github.com/scikit-learn/scikit-learn/pull/9978). \n",
"\n",
"The motivation is to make the implementation memory- and time-efficient for large datasets, because the general linear-programming solution, albeit being rigorous and exact, scales poorly. \n",
"\n",
"The main challenge is to find the right learning rate, so we start with very high one, increase it using momentum, and decrease it when if the loss function is unstable. "
]
},
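{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the loss minimized below is the mean pinball loss plus an optional $L_1$ penalty:\n",
"\n",
"$$L(\\beta, b) = \\frac{1}{n}\\sum_{i=1}^{n} \\rho_q(y_i - x_i^\\top \\beta - b) + \\mathrm{reg} \\cdot \\lVert \\beta \\rVert_1, \\qquad \\rho_q(r) = \\begin{cases} q\\, r, & r > 0 \\\\ (q - 1)\\, r, & r < 0. \\end{cases}$$\n",
"\n",
"Its subgradient with respect to the residual is piecewise constant ($q$ on positive residuals, $q - 1$ on negative ones), which is what the code below stores as `dldpred`."
]
},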
{
"cell_type": "code",
"metadata": {
"id": "L4UvVY9orED1"
},
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
],
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "EA0VsHlurLJF"
},
"source": [
"X, y, coef = make_regression(n_samples=60_000, n_features=3_000, n_informative=1_000, coef=True, bias=100, noise=10)"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-AhzUMQmrO6b",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c7734150-f191-4a7a-d147-1248cc0d2dc7"
},
"source": [
"%%time\n",
"linear = Ridge(1e-5).fit(X, y)"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"CPU times: user 34.5 s, sys: 1.38 s, total: 35.9 s\n",
"Wall time: 18.8 s\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DdZFVcdilNqM"
},
"source": [
"The error of a Ridge model will be our baseline."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6VKxWvYtsOcM",
"outputId": "821b0099-7e6e-4733-f167-db772d348785"
},
"source": [
"print(mean_squared_error(linear.coef_, coef))"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": [
"0.001725866714478058\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6F4aHtjvvLon"
},
"source": [
"def quantile_gradient(X, y, reg=0, q=0.5, base_lr=0.1, momentum=0.9, momentum_decay=0.2, lr_decay=0.8, max_steps=1000, rel_tol=1e-2, abs_tol=1e-4, patience=10, verbose=True, intercept=True):\n",
" lr = base_lr * np.std(y)\n",
" prev_loss = np.infty\n",
" min_loss = np.infty\n",
" patience_steps = 0\n",
" n, m = X.shape\n",
" beta = np.zeros(m + 1)\n",
" cum_grad = beta * 0\n",
"\n",
" for i in range(max_steps):\n",
" resid = y - np.dot(X, beta[:m]) - beta[-1] * intercept\n",
" coef = beta[:m]\n",
" loss = np.mean((resid > 0) * resid * q - (resid < 0) * resid * (1 - q)) \\\n",
" + reg * sum((coef > 0) * coef - (coef < 0) * coef)\n",
" if verbose:\n",
" print(loss)\n",
" if loss > prev_loss:\n",
" cum_grad *= momentum_decay\n",
" lr *= lr_decay\n",
" if loss < min_loss * (1-rel_tol) or loss < min_loss - abs_tol:\n",
" min_loss = loss\n",
" patience_steps = 0\n",
" else:\n",
" patience_steps += 1\n",
" if patience_steps > patience:\n",
" if verbose:\n",
" print(f'early stopping after {i} steps')\n",
" break\n",
" prev_loss = loss\n",
" dldpred = (resid > 0) * q - (resid < 0) * (1 - q)\n",
" grad = np.concatenate([\n",
" np.dot(dldpred, X) / len(resid) - reg * ((coef > 0) * 1 - (coef < 0) * 1), \n",
" [np.mean(dldpred)]\n",
" ])\n",
" cum_grad = cum_grad * momentum + grad\n",
" delta = cum_grad * lr\n",
" if reg:\n",
" # small coefficients with small gradient stay small\n",
" delta[:m][(coef == 0) & (np.abs(delta[:m]) < reg)] = 0\n",
" beta += delta\n",
" if reg:\n",
" # coefficient that would change sign just stay zero\n",
" beta[:m][beta[:m] * coef < 0] = 0\n",
" if intercept:\n",
" return beta[-1], beta[:m]\n",
" return beta"
],
"execution_count": 53,
"outputs": []
},
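{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the `reg` branch above: the two masking steps play a role similar to the soft-thresholding (proximal) step of ISTA-style $L_1$ solvers, $\\operatorname{prox}_{\\lambda \\lVert \\cdot \\rVert_1}(z)_j = \\operatorname{sign}(z_j)\\max(|z_j| - \\lambda, 0)$. Coefficients sitting at zero with a proposed update smaller than `reg` stay exactly at zero, and coefficients whose update would flip their sign are clamped to zero; this is what produces exactly sparse solutions in the regularization experiment below."
]
},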
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YpwtQjSOlJxf",
"outputId": "e8691871-ec2a-44fb-cd29-be6d80533a59"
},
"source": [
"%%time\n",
"intercept_, coef_ = quantile_gradient(X, y)"
],
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"text": [
"721.4498481230092\n",
"690.6059539491163\n",
"632.4861174767065\n",
"550.7371819466156\n",
"449.44699384775305\n",
"334.29649570849017\n",
"215.22422650630403\n",
"122.9014098172564\n",
"134.17752824130187\n",
"119.56367284801284\n",
"82.34356908134315\n",
"37.59439427733621\n",
"54.63865105675187\n",
"40.21363067649704\n",
"13.794595280859172\n",
"33.18709739526756\n",
"20.345223714953192\n",
"12.676932830011507\n",
"21.41917506290413\n",
"9.47364769538664\n",
"16.248666975848817\n",
"8.451094810804216\n",
"11.353466165237702\n",
"5.67028143822664\n",
"9.57025678374903\n",
"5.121668822670301\n",
"7.714832910106319\n",
"4.70959409770891\n",
"6.076627200663249\n",
"4.252777790945006\n",
"5.002369227235568\n",
"4.059700928232004\n",
"4.344134743811103\n",
"3.941821030331402\n",
"4.022687126978004\n",
"3.883779781504334\n",
"3.8924598791930918\n",
"3.848867921537792\n",
"3.849474560278547\n",
"3.834222192374651\n",
"3.8355329497341937\n",
"3.8287404832852867\n",
"3.830226735533932\n",
"3.826345302630014\n",
"3.8276351189731823\n",
"3.825129597902225\n",
"3.8258961994766176\n",
"3.8243582354258923\n",
"3.824859485301996\n",
"3.823825612040234\n",
"3.824164316216894\n",
"3.823469866015381\n",
"3.823701571136547\n",
"3.823206655850575\n",
"3.823346856553724\n",
"3.8229957939367214\n",
"3.823127331313798\n",
"3.822859248509898\n",
"3.8229323786909957\n",
"3.822747415399307\n",
"3.8228014847412166\n",
"3.822658747971786\n",
"3.8227022837163145\n",
"3.82259706621374\n",
"3.822636041674885\n",
"3.822545267415549\n",
"3.8225801576623724\n",
"3.822508745554385\n",
"3.8225299467479137\n",
"3.8224761637607547\n",
"3.822491918027882\n",
"3.8224513716181803\n",
"3.8224614706566022\n",
"3.8224309308387614\n",
"3.8224390922789646\n",
"3.8224141383916033\n",
"3.822421757861059\n",
"3.822403114488966\n",
"3.822407732599003\n",
"3.8223930350146724\n",
"3.822396500286839\n",
"early stopping after 80 steps\n",
"CPU times: user 36.1 s, sys: 1.1 s, total: 37.2 s\n",
"Wall time: 19.6 s\n"
],
"name": "stdout"
}
]
},
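{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (assuming scikit-learn >= 0.24, which provides `mean_pinball_loss`), the training loss printed above should match sklearn's pinball metric on the fitted model, and we can score the Ridge baseline under the same metric."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sanity check: recompute the training pinball loss with sklearn's metric.\n",
"from sklearn.metrics import mean_pinball_loss\n",
"\n",
"pred = np.dot(X, coef_) + intercept_\n",
"print(mean_pinball_loss(y, pred, alpha=0.5))  # should match the final loss printed above\n",
"print(mean_pinball_loss(y, linear.predict(X), alpha=0.5))  # Ridge baseline, same metric"
],
"execution_count": null,
"outputs": []
},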
{
"cell_type": "markdown",
"metadata": {
"id": "zwyjr6Cvhkuq"
},
"source": [
"We see that quantile regression has higher error that linear regression, but it is not higher by orders of magnitude.\n",
"\n",
"Abd it is comparable with linear regression: 20 vs 18 seconds. "
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UxMAHel7sf7A",
"outputId": "0798db38-bb3f-40a3-d778-31061841d2fc"
},
"source": [
"print(mean_squared_error(coef_, coef))"
],
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": [
"0.002725793428993132\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HNffJV9fhv04"
},
"source": [
"Here we test that the algorithm is adequate for various quantiles."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ybd_NTeeiGer"
},
"source": [
"X, y, coef = make_regression(n_samples=6000, n_features=300, n_informative=100, coef=True, noise=10, bias=100)"
],
"execution_count": 54,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NhcVfXPTspaS",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1238abb8-79ae-46e0-80e8-d0aff37914e8"
},
"source": [
"intercept_, coef_ = quantile_gradient(X, y, q=0.1, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))\n",
"intercept_, coef_ = quantile_gradient(X, y, q=0.45, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))\n",
"intercept_, coef_ = quantile_gradient(X, y, q=0.95, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))"
],
"execution_count": 55,
"outputs": [
{
"output_type": "stream",
"text": [
"0.09933333333333333\n",
"0.4505\n",
"0.9501666666666667\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SooYgTc904lU"
},
"source": [
"Check how regularization affects stability.\n",
"\n",
"We can see that with a lot of regularization the algorithm seems unstable. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "sHFthSGEitpO",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "857473e2-b46a-4a59-b73a-3fba3d52f885"
},
"source": [
"for reg in [0, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1, 10, 100]:\n",
" intercept_, coef_ = quantile_gradient(X, y, reg=reg, verbose=False)\n",
" print(reg, '\\t', sum(np.abs(coef_)), sum(np.abs(coef_) < 1e-3))"
],
"execution_count": 56,
"outputs": [
{
"output_type": "stream",
"text": [
"0 \t 5345.732735063019 0\n",
"0.0001 \t 5344.979756393498 1\n",
"0.001 \t 5337.481236505218 13\n",
"0.01 \t 5293.496535220853 159\n",
"0.1 \t 675.1580890111018 5\n",
"0.5 \t 1957.0192335098054 120\n",
"1 \t 671.5446372105087 195\n",
"10 \t 1049.2034863387717 289\n",
"100 \t 0.0 300\n"
],
"name": "stdout"
}
]
}
]
}
@Sandy4321 commented:

sklearn's quantile regression performs poorly on categorical data (after one-hot encoding); I wrote to them, but they are not interested.
