Just tried the notebook (xgboost version 0.90). Unfortunately, the call to xgb.train (for one-shot learning) raises the error:
XGBoostError: [13:44:55] src/gbm/gbtree.cc:278: Check failed: model_.trees.size() < model_.trees_to_update.size() (0 vs. 0) :
I ran the gist on the master branch and it works fine. This should be fixed by the new model IO routines.
I also got the same error as Karelin, and I think it is the same one Venkatesh hit:
Check failed: model_.trees.size() < model_.trees_to_update.size() (0 vs. 0) :
I saw somewhere that you need to pass in the number of trees created in the first iteration; however, I cannot get that number, and it is never passed in the code above.
Same issue on XGBoost 1.4.0. Has anyone figured this out yet?
Hi,
I have found the solution. Per the xgboost documentation, the parameter 'update' should be 'updater'; this is a mistake in the notebook above. If you fix this, you will see the right results.
model = xgb.train({
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
    'refresh_leaf': True,
    # 'reg_lambda': 3,  # L2
    'reg_alpha': 3,  # L1
    'silent': False,
}, dtrain=xgb.DMatrix(x_tr[start:start+batch_size], label=y_tr[start:start+batch_size]), xgb_model=model)
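For anyone piecing this together, here is a minimal sketch of the full loop the snippet above belongs to, assuming x_tr, y_tr, and batch_size are defined as in the notebook; the round counts and parameter values here are illustrative, not the notebook's:

import xgboost as xgb

update_params = {
    'learning_rate': 0.007,
    'updater': 'refresh',      # refresh statistics/leaves of existing trees
    'process_type': 'update',  # update an existing model instead of growing it
    'refresh_leaf': True,
    'reg_alpha': 3,            # L1
}

# First batch: ordinary training builds the initial trees.
dfirst = xgb.DMatrix(x_tr[:batch_size], label=y_tr[:batch_size])
model = xgb.train({'learning_rate': 0.007}, dfirst, num_boost_round=50)

# Later batches: refresh those trees on the new data. In 'update' mode the
# number of boosting rounds cannot exceed the trees already in the model.
for start in range(batch_size, len(x_tr), batch_size):
    dbatch = xgb.DMatrix(x_tr[start:start + batch_size],
                         label=y_tr[start:start + batch_size])
    model = xgb.train(update_params, dbatch, num_boost_round=50, xgb_model=model)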
Disregard, I figured it out. I was using handle_unknown='ignore' in OneHotEncoder, but one of the features has too few samples of a particular category, hence the column mismatch.
Thank you for this gist. How can we implement this in a pipeline?
I am unable to test on the Boston dataset as it has been removed from sklearn, but on a different dataset I get a mismatch in the number of columns. Even though I use the same pipeline, the saved model seems to have one fewer feature than the new training data, and I cannot figure out why.
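One way to narrow this down is to compare the feature count stored in the booster with the width of the transformed matrix. A minimal diagnostic sketch; pipeline, X_raw, and the model path are placeholders for your own objects:

import xgboost as xgb

# Width of the data after your preprocessing (e.g. OneHotEncoder).
X_new = pipeline.transform(X_raw)
print('columns in new data:', X_new.shape[1])

# Feature count the saved booster was trained with.
booster = xgb.Booster()
booster.load_model('model.json')
print('features in saved model:', booster.num_features())

# With sklearn >= 1.0, list the generated column names to spot which
# dummy column appeared or disappeared between fits.
print(pipeline.get_feature_names_out())

If a rare category shows up in one fit but not the other, OneHotEncoder will emit a different number of dummy columns, which would match the off-by-one you are seeing.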
Great example!
Few people know that xgboost is able to perform incremental learning by adding boosting rounds.
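For anyone who wants a concrete starting point, here is a minimal sketch of that flavor of continuation: with the default process_type, passing xgb_model to xgb.train appends new trees fitted on the new data (all variable names are placeholders):

import xgboost as xgb

dtrain_old = xgb.DMatrix(X_old, label=y_old)
dtrain_new = xgb.DMatrix(X_new, label=y_new)

# Initial model with 100 trees.
model = xgb.train({'learning_rate': 0.1}, dtrain_old, num_boost_round=100)

# Continued training: 20 fresh trees are appended, fitted on the new data.
model = xgb.train({'learning_rate': 0.1}, dtrain_new,
                  num_boost_round=20, xgb_model=model)

print(model.num_boosted_rounds())  # 120 (available in recent xgboost versions)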
Hi,
I am also unable to adapt this approach to my use case, for example telecom churn prediction: when a new customer gets added, how do I reuse the old model to continue training instead of retraining on the complete data?
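In case it helps, a hedged sketch of that workflow: persist the booster after the first fit, then when new customers arrive, continue training on just the new rows by passing the saved model to xgb.train. All names, parameters, and the file path are placeholders:

import xgboost as xgb

# Initial fit on the historical customer data, saved to disk.
model = xgb.train(params, xgb.DMatrix(X_hist, label=y_hist),
                  num_boost_round=200)
model.save_model('churn_model.json')

# Later, when a batch of new customers arrives: load the old model and
# add a few boosting rounds fitted only on the new rows.
model = xgb.train(params, xgb.DMatrix(X_new_customers, label=y_new_customers),
                  num_boost_round=20, xgb_model='churn_model.json')

One caveat: the appended trees see only the new batch, so if its distribution differs much from the historical data, periodic full retraining may still be worthwhile.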