import numpy as np


def xgb_quantile_eval(preds, dmatrix, quantile=0.2):
    """
    Customized evaluation metric that equals
    quantile regression loss (also known as
    pinball loss).
    Quantile regression is regression that
    estimates a specified quantile of target's
    distribution conditional on given features.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: tuple(str, float)
    """
    labels = dmatrix.get_label()
    return ('q{}_loss'.format(quantile),
            np.nanmean((preds >= labels) * (1 - quantile) * (preds - labels) +
                       (preds < labels) * quantile * (labels - preds)))
def xgb_quantile_obj(preds, dmatrix, quantile=0.2):
    """
    Computes first-order derivative of quantile
    regression loss and a non-degenerate
    substitute for second-order derivative.
    Substitute is returned instead of zeros,
    because XGBoost requires non-zero
    second-order derivatives. See this page:
    https://github.com/dmlc/xgboost/issues/1825
    to see why it is possible to use this trick.
    However, be sure that hyperparameter named
    `max_delta_step` is small enough to satisfy:
    ```0.5 * max_delta_step <= min(quantile, 1 - quantile)```.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: tuple(numpy.ndarray)
    """
    if not 0 <= quantile <= 1:
        raise ValueError("Quantile must be a float between 0 and 1.")
    labels = dmatrix.get_label()
    errors = preds - labels
    left_mask = errors < 0   # Prediction is below the label.
    right_mask = errors > 0  # Prediction is above the label.
    # Gradient of pinball loss; ones substitute for its degenerate Hessian.
    grad = -quantile * left_mask + (1 - quantile) * right_mask
    hess = np.ones_like(preds)
    return grad, hess
# Example of usage:
# bst = xgb.train(hyperparams, train, num_rounds,
#                 obj=xgb_quantile_obj, feval=xgb_quantile_eval)
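A minimal end-to-end sketch of the usage above on synthetic data (the data, hyperparameters, and round count below are illustrative only; note also that recent XGBoost releases deprecate `feval` in favor of `custom_metric`):

import numpy as np
import xgboost as xgb
from functools import partial

# Toy data: the target is a noisy function of the features.
rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 3))
y = X.sum(axis=1) + rng.normal(scale=0.5, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# max_delta_step = 0.4 satisfies 0.5 * 0.4 <= min(0.2, 1 - 0.2).
hyperparams = {'max_depth': 3, 'eta': 0.1, 'max_delta_step': 0.4}
num_rounds = 100
bst = xgb.train(hyperparams, dtrain, num_rounds,
                evals=[(dtrain, 'train')],
                obj=partial(xgb_quantile_obj, quantile=0.2),
                feval=partial(xgb_quantile_eval, quantile=0.2))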
This loss function makes all my predictions 0 for quantile 0.5... anyone having the same issue?
I also have this issue. Did you manage to solve it?
Here is a post about LightGBM's quantile regression: http://jmarkhou.com/lgbqr/#mjx-eqn-quantileloss. It shows some issues the LightGBM developers found with this approach and how they improved it by replacing the second-order approximation of the loss function with its actual value.
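To give some intuition for that fix (a sketch only; the function below is illustrative and not from any library): pinball loss has zero second derivative, so a Newton step grad / hess says little, whereas the value that actually minimizes total pinball loss over a leaf is the empirical quantile of that leaf's residuals:

import numpy as np

def exact_leaf_value(labels_in_leaf, preds_in_leaf, quantile=0.2):
    # The constant shift minimizing the summed pinball loss over a leaf
    # is the `quantile`-th empirical quantile of the residuals.
    residuals = labels_in_leaf - preds_in_leaf
    return np.quantile(residuals, quantile)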
How can we find the lower and upper bounds of a prediction interval using the above functions?
@Shafi2016, this can be done like this:

from functools import partial
import xgboost as xgb

lower_quantile = 0.2  # Any other value can be placed here.
upper_quantile = 0.8

# Fix the quantile argument of the generic objective and metric.
xgb_quantile_lower_eval = partial(xgb_quantile_eval, quantile=lower_quantile)
xgb_quantile_lower_obj = partial(xgb_quantile_obj, quantile=lower_quantile)
lower_model = xgb.train(hyperparams, dtrain, num_rounds, obj=xgb_quantile_lower_obj, feval=xgb_quantile_lower_eval)

xgb_quantile_upper_eval = partial(xgb_quantile_eval, quantile=upper_quantile)
xgb_quantile_upper_obj = partial(xgb_quantile_obj, quantile=upper_quantile)
upper_model = xgb.train(hyperparams, dtrain, num_rounds, obj=xgb_quantile_upper_obj, feval=xgb_quantile_upper_eval)

# Predictions of the two models form the bounds of the interval.
lower_bound = lower_model.predict(dtest)
upper_bound = upper_model.predict(dtest)
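As a quick sanity check (a sketch, assuming `dtest` was built with labels), the empirical coverage of the interval can be compared with its nominal level:

import numpy as np
y_test = dtest.get_label()
coverage = np.mean((y_test >= lower_bound) & (y_test <= upper_bound))
print('empirical coverage: {:.3f}, nominal: {:.1f}'.format(coverage, upper_quantile - lower_quantile))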
However, this gist is quite old, and better solutions exist now. I recommend looking at CatBoost or LightGBM, because these tools have native support for quantile regression as well as performance comparable to that of XGBoost.
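For reference, a minimal sketch of that native support (`X_train` and `y_train` are placeholders, and 0.8 is just an example quantile):

import lightgbm as lgb
from catboost import CatBoostRegressor

# LightGBM: built-in pinball loss, quantile set via `alpha`.
lgb_model = lgb.LGBMRegressor(objective='quantile', alpha=0.8)
lgb_model.fit(X_train, y_train)

# CatBoost: the quantile is passed inside the loss function string.
cb_model = CatBoostRegressor(loss_function='Quantile:alpha=0.8', verbose=False)
cb_model.fit(X_train, y_train)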
Thanks for the prompt response! I have checked both LightGBM and CatBoost. There is no doubt that their interval levels are very stable. However, I could not get an improved forecast. In fact, I get a much better forecast from H2O's XGBoost, yet H2O does not provide support for quantile regression. I tried to build prediction intervals using the functions from this link (https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b). However, the interval range gets very narrow, and when the interval is widened, the upper limit goes flat while the lower limit is unaffected. I am wondering whether I can get a better interval using your functions and then combine it with the predictions of H2O's XGBoost. I hope this can be done.
There have been some questions about the license. This gist is released under the MIT License, so you can use it in your projects.