@anthonyng2
Last active December 21, 2020 06:12
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Macro Trading Strategy with XGBoost\n",
"\n",
"XGBoost is short for “Extreme Gradient Boosting”. The term “Gradient Boosting” was introduced by Friedman in the paper Greedy Function Approximation: A Gradient Boosting Machine, and XGBoost is based on that original model.\n",
"\n",
"XGBoost is used for supervised learning problems; here it is applied to a regression on daily returns.\n",
"\n",
"http://xgboost.readthedocs.io/en/latest/model.html\n",
"\n",
"I am unable to provide the original data without violating the terms of contracts with vendors. However, you can access the data from a Bloomberg Professional Terminal. I have used the original tickers for your reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Factors\n",
"\n",
"The idea for this model is from JP Morgan's May 2017 publication titled **Big Data and AI Strategies**.\n",
"\n",
"The model uses the following macro indicators as factors:\n",
"\n",
"* High Yield Credit Spreads, CDX_HY\n",
"* Investment Grade Credit Spreads, CDX_IG\n",
"* Economic Surprise Index, CESIUSD\n",
"* Oil, Crude_Oil\n",
"* US Dollar Index, DXY\n",
"* Gold, GLD\n",
"* US 10Yr Treasury, GT10\n",
"* 10Y-2Y Spread, USYC2Y10\n",
"\n",
"In this simple test case, we attempt to predict the returns of the Consumer Discretionary Select Sector SPDR ETF (XLY).\n",
"\n",
"One can turn this into a long-short strategy trading the nine SPDR sector ETFs. The basic idea is to use XGBoost to predict the returns of each sector ETF, rank them, go long the top 3 and short the bottom 3.\n",
"\n",
"Unfortunately, I am not able to implement this on the Quantopian platform, as it does not currently support XGBoost. For now, this would need to be tested using a vectorized approach.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from xgboost import XGBRegressor\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"from sklearn.preprocessing import StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
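},
"outputs": [],
"source": [
"dataset = pd.read_csv('master.csv')\n",
"# Columns 10-17 hold the eight macro factors (CDX_HY ... USYC2Y10); drop row 0 to match y.\n",
"X = dataset.iloc[:, 10:18][1:].values\n",
"# Daily XLY returns; the first value is NaN, so drop it.\n",
"# Note: each row pairs same-day factor levels with that day's return (no lag is applied).\n",
"y = dataset.iloc[:, 1].pct_change()[1:].values"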
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>XLY</th>\n",
" <th>XLF</th>\n",
" <th>XLK</th>\n",
" <th>XLE</th>\n",
" <th>XLV</th>\n",
" <th>XLI</th>\n",
" <th>XLP</th>\n",
" <th>XLB</th>\n",
" <th>XLU</th>\n",
" <th>CDX_HY</th>\n",
" <th>CDX_IG</th>\n",
" <th>CESIUSD</th>\n",
" <th>Crude_Oil</th>\n",
" <th>DXY</th>\n",
" <th>GLD</th>\n",
" <th>GT10</th>\n",
" <th>USYC2Y10</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>09/09/2011</td>\n",
" <td>32.358040</td>\n",
" <td>6.778456</td>\n",
" <td>21.111734</td>\n",
" <td>56.789097</td>\n",
" <td>28.906340</td>\n",
" <td>26.604198</td>\n",
" <td>25.542053</td>\n",
" <td>29.202261</td>\n",
" <td>26.290201</td>\n",
" <td>92.000</td>\n",
" <td>132.25</td>\n",
" <td>-41.8</td>\n",
" <td>156.19</td>\n",
" <td>77.192</td>\n",
" <td>180.70</td>\n",
" <td>1.920</td>\n",
" <td>174.879</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>09/12/2011</td>\n",
" <td>32.741726</td>\n",
" <td>6.856050</td>\n",
" <td>21.363386</td>\n",
" <td>57.060394</td>\n",
" <td>29.005796</td>\n",
" <td>26.657087</td>\n",
" <td>25.567596</td>\n",
" <td>29.008575</td>\n",
" <td>26.506611</td>\n",
" <td>91.313</td>\n",
" <td>135.75</td>\n",
" <td>-39.5</td>\n",
" <td>157.89</td>\n",
" <td>77.578</td>\n",
" <td>176.67</td>\n",
" <td>1.948</td>\n",
" <td>174.161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>09/13/2011</td>\n",
" <td>33.125416</td>\n",
" <td>6.900391</td>\n",
" <td>21.615030</td>\n",
" <td>57.217922</td>\n",
" <td>29.304174</td>\n",
" <td>27.150738</td>\n",
" <td>25.661251</td>\n",
" <td>29.483980</td>\n",
" <td>26.682945</td>\n",
" <td>92.375</td>\n",
" <td>132.25</td>\n",
" <td>-38.5</td>\n",
" <td>161.51</td>\n",
" <td>76.919</td>\n",
" <td>178.54</td>\n",
" <td>1.992</td>\n",
" <td>178.866</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>09/14/2011</td>\n",
" <td>33.673565</td>\n",
" <td>6.983530</td>\n",
" <td>21.938587</td>\n",
" <td>57.926796</td>\n",
" <td>29.584467</td>\n",
" <td>27.626760</td>\n",
" <td>25.967760</td>\n",
" <td>29.941778</td>\n",
" <td>26.867296</td>\n",
" <td>93.000</td>\n",
" <td>129.75</td>\n",
" <td>-39.4</td>\n",
" <td>159.18</td>\n",
" <td>76.833</td>\n",
" <td>177.21</td>\n",
" <td>1.985</td>\n",
" <td>179.759</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>09/15/2011</td>\n",
" <td>34.239948</td>\n",
" <td>7.160888</td>\n",
" <td>22.289101</td>\n",
" <td>59.073273</td>\n",
" <td>29.855724</td>\n",
" <td>28.190929</td>\n",
" <td>26.274261</td>\n",
" <td>30.452398</td>\n",
" <td>27.203936</td>\n",
" <td>94.000</td>\n",
" <td>125.75</td>\n",
" <td>-42.2</td>\n",
" <td>160.06</td>\n",
" <td>76.241</td>\n",
" <td>174.40</td>\n",
" <td>2.083</td>\n",
" <td>189.179</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date XLY XLF XLK XLE XLV \\\n",
"0 09/09/2011 32.358040 6.778456 21.111734 56.789097 28.906340 \n",
"1 09/12/2011 32.741726 6.856050 21.363386 57.060394 29.005796 \n",
"2 09/13/2011 33.125416 6.900391 21.615030 57.217922 29.304174 \n",
"3 09/14/2011 33.673565 6.983530 21.938587 57.926796 29.584467 \n",
"4 09/15/2011 34.239948 7.160888 22.289101 59.073273 29.855724 \n",
"\n",
" XLI XLP XLB XLU CDX_HY CDX_IG CESIUSD \\\n",
"0 26.604198 25.542053 29.202261 26.290201 92.000 132.25 -41.8 \n",
"1 26.657087 25.567596 29.008575 26.506611 91.313 135.75 -39.5 \n",
"2 27.150738 25.661251 29.483980 26.682945 92.375 132.25 -38.5 \n",
"3 27.626760 25.967760 29.941778 26.867296 93.000 129.75 -39.4 \n",
"4 28.190929 26.274261 30.452398 27.203936 94.000 125.75 -42.2 \n",
"\n",
" Crude_Oil DXY GLD GT10 USYC2Y10 \n",
"0 156.19 77.192 180.70 1.920 174.879 \n",
"1 157.89 77.578 176.67 1.948 174.161 \n",
"2 161.51 76.919 178.54 1.992 178.866 \n",
"3 159.18 76.833 177.21 1.985 179.759 \n",
"4 160.06 76.241 174.40 2.083 189.179 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1433, 18)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset.shape"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 91.313, 135.75 , -39.5 , 157.89 , 77.578, 176.67 ,\n",
" 1.948, 174.161],\n",
" [ 92.375, 132.25 , -38.5 , 161.51 , 76.919, 178.54 ,\n",
" 1.992, 178.866],\n",
" [ 93. , 129.75 , -39.4 , 159.18 , 76.833, 177.21 ,\n",
" 1.985, 179.759]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X[:3]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.01185752, 0.01171869, 0.01654769])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y[:3]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
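},
"outputs": [],
"source": [
"# Chronological split: the first 1403 rows train the model, the remaining 29 rows are held out.\n",
"seg = 1403\n",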
"X_train = X[:seg,:]\n",
"y_train = y[:seg]\n",
"X_test = X[seg:,:]\n",
"y_test = y[seg:]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sc_X = StandardScaler()\n",
"X_train = sc_X.fit_transform(X_train)\n",
"X_test = sc_X.transform(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bytree=1, eval_metric='logloss', gamma=0,\n",
" learning_rate=0.1, max_delta_step=0, max_depth=7,\n",
" min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,\n",
" nthread=None, objective='reg:linear', random_state=0, reg_alpha=0,\n",
" reg_lambda=1, scale_pos_weight=1, seed=None, silent=True,\n",
" subsample=1)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = XGBRegressor(booster=\"gbtree\", objective=\"reg:linear\", \n",
" max_depth = 7,\n",
" subsample=1, eval_metric='logloss')\n",
"model.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"XGBoost\n",
"RMSE: 0.0061\n",
"R^2 Score: -0.0374\n"
]
}
],
"source": [
"y_xg_pred = model.predict(X_test)\n",
"\n",
"print(\"XGBoost\")\n",
"print(\"RMSE: {0:.4f}\".format(np.sqrt(mean_squared_error(y_test, y_xg_pred))))\n",
"print(\"R^2 Score: {0:.4f}\".format(r2_score(y_test, y_xg_pred)))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr = LinearRegression()\n",
"lr.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear Regression\n",
"RMSE: 0.0060\n",
"R^2 Score: -0.0034\n"
]
}
],
"source": [
"y_lr_pred = lr.predict(X_test)\n",
"\n",
"print(\"Linear Regression\")\n",
"print(\"RMSE: {0:.4f}\".format(np.sqrt(mean_squared_error(y_test, y_lr_pred))))\n",
"print(\"R^2 Score: {0:.4f}\".format(r2_score(y_test, y_lr_pred)))"
]
},
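{
"cell_type": "markdown",
"metadata": {},
"source": [
"The RMSE of XGBoost is slightly worse than that of Linear Regression (0.0061 vs 0.0060). The $R^2$ of XGBoost is also worse: -0.0374 against -0.0034, roughly 10x further below zero. Since both $R^2$ values are negative, neither model beats a constant prediction equal to the mean of the test-set returns. However, it is difficult to tell until one backtests them both.\n",
"\n",
"This is just a simple demo of how one can use XGBoost to predict returns. You can now extend it to multiple assets."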
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
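
The long-short extension mentioned in the notebook's Factors cell (predict each sector ETF's return, rank the predictions, go long the top 3 and short the bottom 3) could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `long_short_weights`, the equal weighting within each leg, and the prediction values are all hypothetical and are not taken from the notebook or from any model output.

```python
def long_short_weights(predicted_returns, n_long=3, n_short=3):
    """Rank tickers by predicted return; long the top n_long, short the bottom n_short.

    Equal weighting within each leg is an assumption of this sketch.
    """
    if n_long + n_short > len(predicted_returns):
        raise ValueError("not enough tickers for the requested legs")
    # Sort tickers from highest to lowest predicted return.
    ranked = sorted(predicted_returns, key=predicted_returns.get, reverse=True)
    weights = {ticker: 0.0 for ticker in ranked}
    # Long leg: the highest-ranked tickers share a total weight of +1.
    for ticker in ranked[:n_long]:
        weights[ticker] = 1.0 / n_long
    # Short leg: the lowest-ranked tickers share a total weight of -1.
    for ticker in ranked[len(ranked) - n_short:]:
        weights[ticker] = -1.0 / n_short
    return weights


# Hypothetical one-day predictions for the nine sector ETFs (made-up numbers,
# not model output); in practice each would come from its own fitted model.
example_predictions = {
    "XLY": 0.0012, "XLF": -0.0008, "XLK": 0.0020,
    "XLE": -0.0015, "XLV": 0.0005, "XLI": 0.0001,
    "XLP": -0.0002, "XLB": 0.0009, "XLU": -0.0011,
}
example_weights = long_short_weights(example_predictions)
```

With these made-up inputs the long leg holds XLK, XLY and XLB, the short leg holds XLF, XLU and XLE, and the remaining tickers get zero weight, so the portfolio is dollar-neutral. Recomputing the weights each day from fresh predictions and multiplying them by the realized next-day returns would give one simple vectorized test of the idea.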