quantreg_gradient.ipynb (gist by @avidale, last active May 25, 2023)

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "quantreg_gradient.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPav8U8imPgfjisuGBIR648",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/avidale/30d8e34884e9a62d08b5ebdfeaac23db/quantreg_gradient.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CmAZblOYddfl"
},
"source": [
"Here is a naive implementation of gradient descent for Quantile regression problem (https://github.com/scikit-learn/scikit-learn/pull/9978). \n",
"\n",
"The motivation is to make the implementation memory- and time-efficient for large datasets, because the general linear-programming solution, albeit being rigorous and exact, scales poorly. \n",
"\n",
"The main challenge is to find the right learning rate, so we start with very high one, increase it using momentum, and decrease it when if the loss function is unstable. "
]
},
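{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the loss minimized below is the mean pinball loss plus an optional $L_1$ penalty:\n",
"\n",
"$$L(\\beta, b) = \\frac{1}{n}\\sum_{i=1}^{n} \\rho_q(y_i - x_i^\\top \\beta - b) + \\mathrm{reg} \\cdot \\lVert \\beta \\rVert_1, \\qquad \\rho_q(r) = \\begin{cases} q\\, r, & r > 0 \\\\ (q - 1)\\, r, & r < 0. \\end{cases}$$\n",
"\n",
"Its subgradient with respect to the residual is piecewise constant ($q$ on positive residuals, $q - 1$ on negative ones), which is what the code below stores as `dldpred`."
]
},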
{
"cell_type": "code",
"metadata": {
"id": "L4UvVY9orED1"
},
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error"
],
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "EA0VsHlurLJF"
},
"source": [
"X, y, coef = make_regression(n_samples=60_000, n_features=3_000, n_informative=1_000, coef=True, bias=100, noise=10)"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-AhzUMQmrO6b",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c7734150-f191-4a7a-d147-1248cc0d2dc7"
},
"source": [
"%%time\n",
"linear = Ridge(1e-5).fit(X, y)"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"CPU times: user 34.5 s, sys: 1.38 s, total: 35.9 s\n",
"Wall time: 18.8 s\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DdZFVcdilNqM"
},
"source": [
"The error of a Ridge model will be our baseline."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6VKxWvYtsOcM",
"outputId": "821b0099-7e6e-4733-f167-db772d348785"
},
"source": [
"print(mean_squared_error(linear.coef_, coef))"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": [
"0.001725866714478058\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6F4aHtjvvLon"
},
"source": [
"def quantile_gradient(X, y, reg=0, q=0.5, base_lr=0.1, momentum=0.9, momentum_decay=0.2, lr_decay=0.8, max_steps=1000, rel_tol=1e-2, abs_tol=1e-4, patience=10, verbose=True, intercept=True):\n",
" lr = base_lr * np.std(y)\n",
" prev_loss = np.infty\n",
" min_loss = np.infty\n",
" patience_steps = 0\n",
" n, m = X.shape\n",
" beta = np.zeros(m + 1)\n",
" cum_grad = beta * 0\n",
"\n",
" for i in range(max_steps):\n",
" resid = y - np.dot(X, beta[:m]) - beta[-1] * intercept\n",
" coef = beta[:m]\n",
" loss = np.mean((resid > 0) * resid * q - (resid < 0) * resid * (1 - q)) \\\n",
" + reg * sum((coef > 0) * coef - (coef < 0) * coef)\n",
" if verbose:\n",
" print(loss)\n",
" if loss > prev_loss:\n",
" cum_grad *= momentum_decay\n",
" lr *= lr_decay\n",
" if loss < min_loss * (1-rel_tol) or loss < min_loss - abs_tol:\n",
" min_loss = loss\n",
" patience_steps = 0\n",
" else:\n",
" patience_steps += 1\n",
" if patience_steps > patience:\n",
" if verbose:\n",
" print(f'early stopping after {i} steps')\n",
" break\n",
" prev_loss = loss\n",
" dldpred = (resid > 0) * q - (resid < 0) * (1 - q)\n",
" grad = np.concatenate([\n",
" np.dot(dldpred, X) / len(resid) - reg * ((coef > 0) * 1 - (coef < 0) * 1), \n",
" [np.mean(dldpred)]\n",
" ])\n",
" cum_grad = cum_grad * momentum + grad\n",
" delta = cum_grad * lr\n",
" if reg:\n",
" # small coefficients with small gradient stay small\n",
" delta[:m][(coef == 0) & (np.abs(delta[:m]) < reg)] = 0\n",
" beta += delta\n",
" if reg:\n",
" # coefficient that would change sign just stay zero\n",
" beta[:m][beta[:m] * coef < 0] = 0\n",
" if intercept:\n",
" return beta[-1], beta[:m]\n",
" return beta"
],
"execution_count": 53,
"outputs": []
},
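{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the `reg` branch above: the two masking steps play a role similar to the soft-thresholding (proximal) step of ISTA-style $L_1$ solvers, $\\operatorname{prox}_{\\lambda \\lVert \\cdot \\rVert_1}(z)_j = \\operatorname{sign}(z_j)\\max(|z_j| - \\lambda, 0)$. Coefficients sitting at zero with a proposed update smaller than `reg` stay exactly at zero, and coefficients whose update would flip their sign are clamped to zero; this is what produces exactly sparse solutions in the regularization experiment below."
]
},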
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YpwtQjSOlJxf",
"outputId": "e8691871-ec2a-44fb-cd29-be6d80533a59"
},
"source": [
"%%time\n",
"intercept_, coef_ = quantile_gradient(X, y)"
],
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"text": [
"721.4498481230092\n",
"690.6059539491163\n",
"632.4861174767065\n",
"550.7371819466156\n",
"449.44699384775305\n",
"334.29649570849017\n",
"215.22422650630403\n",
"122.9014098172564\n",
"134.17752824130187\n",
"119.56367284801284\n",
"82.34356908134315\n",
"37.59439427733621\n",
"54.63865105675187\n",
"40.21363067649704\n",
"13.794595280859172\n",
"33.18709739526756\n",
"20.345223714953192\n",
"12.676932830011507\n",
"21.41917506290413\n",
"9.47364769538664\n",
"16.248666975848817\n",
"8.451094810804216\n",
"11.353466165237702\n",
"5.67028143822664\n",
"9.57025678374903\n",
"5.121668822670301\n",
"7.714832910106319\n",
"4.70959409770891\n",
"6.076627200663249\n",
"4.252777790945006\n",
"5.002369227235568\n",
"4.059700928232004\n",
"4.344134743811103\n",
"3.941821030331402\n",
"4.022687126978004\n",
"3.883779781504334\n",
"3.8924598791930918\n",
"3.848867921537792\n",
"3.849474560278547\n",
"3.834222192374651\n",
"3.8355329497341937\n",
"3.8287404832852867\n",
"3.830226735533932\n",
"3.826345302630014\n",
"3.8276351189731823\n",
"3.825129597902225\n",
"3.8258961994766176\n",
"3.8243582354258923\n",
"3.824859485301996\n",
"3.823825612040234\n",
"3.824164316216894\n",
"3.823469866015381\n",
"3.823701571136547\n",
"3.823206655850575\n",
"3.823346856553724\n",
"3.8229957939367214\n",
"3.823127331313798\n",
"3.822859248509898\n",
"3.8229323786909957\n",
"3.822747415399307\n",
"3.8228014847412166\n",
"3.822658747971786\n",
"3.8227022837163145\n",
"3.82259706621374\n",
"3.822636041674885\n",
"3.822545267415549\n",
"3.8225801576623724\n",
"3.822508745554385\n",
"3.8225299467479137\n",
"3.8224761637607547\n",
"3.822491918027882\n",
"3.8224513716181803\n",
"3.8224614706566022\n",
"3.8224309308387614\n",
"3.8224390922789646\n",
"3.8224141383916033\n",
"3.822421757861059\n",
"3.822403114488966\n",
"3.822407732599003\n",
"3.8223930350146724\n",
"3.822396500286839\n",
"early stopping after 80 steps\n",
"CPU times: user 36.1 s, sys: 1.1 s, total: 37.2 s\n",
"Wall time: 19.6 s\n"
],
"name": "stdout"
}
]
},
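{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (assuming scikit-learn >= 0.24, which provides `mean_pinball_loss`), the training loss printed above should match sklearn's pinball metric on the fitted model, and we can score the Ridge baseline under the same metric."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Sanity check: recompute the training pinball loss with sklearn's metric.\n",
"from sklearn.metrics import mean_pinball_loss\n",
"\n",
"pred = np.dot(X, coef_) + intercept_\n",
"print(mean_pinball_loss(y, pred, alpha=0.5))  # should match the final loss printed above\n",
"print(mean_pinball_loss(y, linear.predict(X), alpha=0.5))  # Ridge baseline, same metric"
],
"execution_count": null,
"outputs": []
},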
{
"cell_type": "markdown",
"metadata": {
"id": "zwyjr6Cvhkuq"
},
"source": [
"We see that quantile regression has higher error that linear regression, but it is not higher by orders of magnitude.\n",
"\n",
"Abd it is comparable with linear regression: 20 vs 18 seconds. "
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UxMAHel7sf7A",
"outputId": "0798db38-bb3f-40a3-d778-31061841d2fc"
},
"source": [
"print(mean_squared_error(coef_, coef))"
],
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": [
"0.002725793428993132\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HNffJV9fhv04"
},
"source": [
"Here we test that the algorithm is adequate for various quantiles."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ybd_NTeeiGer"
},
"source": [
"X, y, coef = make_regression(n_samples=6000, n_features=300, n_informative=100, coef=True, noise=10, bias=100)"
],
"execution_count": 54,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NhcVfXPTspaS",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1238abb8-79ae-46e0-80e8-d0aff37914e8"
},
"source": [
"intercept_, coef_ = quantile_gradient(X, y, q=0.1, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))\n",
"intercept_, coef_ = quantile_gradient(X, y, q=0.45, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))\n",
"intercept_, coef_ = quantile_gradient(X, y, q=0.95, verbose=False)\n",
"print(np.mean(y < np.dot(X, coef_) + intercept_ ))"
],
"execution_count": 55,
"outputs": [
{
"output_type": "stream",
"text": [
"0.09933333333333333\n",
"0.4505\n",
"0.9501666666666667\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SooYgTc904lU"
},
"source": [
"Check how regularization affects stability.\n",
"\n",
"We can see that with a lot of regularization the algorithm seems unstable. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "sHFthSGEitpO",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "857473e2-b46a-4a59-b73a-3fba3d52f885"
},
"source": [
"for reg in [0, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1, 10, 100]:\n",
" intercept_, coef_ = quantile_gradient(X, y, reg=reg, verbose=False)\n",
" print(reg, '\\t', sum(np.abs(coef_)), sum(np.abs(coef_) < 1e-3))"
],
"execution_count": 56,
"outputs": [
{
"output_type": "stream",
"text": [
"0 \t 5345.732735063019 0\n",
"0.0001 \t 5344.979756393498 1\n",
"0.001 \t 5337.481236505218 13\n",
"0.01 \t 5293.496535220853 159\n",
"0.1 \t 675.1580890111018 5\n",
"0.5 \t 1957.0192335098054 120\n",
"1 \t 671.5446372105087 195\n",
"10 \t 1049.2034863387717 289\n",
"100 \t 0.0 300\n"
],
"name": "stdout"
}
]
}
]
}
@Sandy4321 commented:

sklearn's quantile regression performs poorly on categorical data (after one-hot encoding); I wrote to them, but they are not interested.
