# Returns a subset of the original data with the selected features
subset = Feature_Selector.Subset()
subset.head()
# Returns a boxplot of the features
Feature_Selector.plot(which_features='all',
                      X_size=8, figsize=(12,8),
                      y_scale='log')
# If no model is selected, the default is Random Forest; classification=True marks a classification problem
Feature_Selector = BorutaShap(importance_measure='shap',
                              classification=False)
Feature_Selector.fit(X=X, y=y, n_trials=100, random_state=0)
from BorutaShap import BorutaShap, load_data
X, y = load_data(data_type='regression')
X.head()
import numpy as np
import pandas as pd

class Node:
    '''
    This class defines a node which creates a tree structure by recursively calling itself
    whilst checking a number of ending parameters such as depth and min_leaf. It uses an exact greedy method
    to exhaustively scan every possible split point. The algorithm is based on Friedman's 2001 Gradient Boosting Machines
    Input
import numpy as np
import pandas as pd
from math import e

class Node:
    '''
    A node object that is recursively called within itself to construct a regression tree. Based on Tianqi Chen's XGBoost,
    the internal gain used to find the optimal split value uses both the gradient and hessian. The weighted quantile sketch
    and optimal leaf values also follow Chen's description in "XGBoost: A Scalable Tree Boosting System"; the only thing not
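The docstring above mentions a gain built from the gradient and hessian, plus optimal leaf values. As a rough illustration of those two formulas from the XGBoost paper, here is a minimal pure-Python sketch. The function names and the parameter names `lambda_` (L2 regularisation) and `gamma` (split penalty) are assumptions made for this example and are not taken from the snippet, which is truncated.

```python
def leaf_weight(gradients, hessians, lambda_=1.0):
    """Optimal leaf value w* = -G / (H + lambda), with G and H the summed
    gradients and hessians of the rows in the leaf."""
    return -sum(gradients) / (sum(hessians) + lambda_)


def split_gain(gradients, hessians, left_mask, lambda_=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children.

    gain = 0.5 * (G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
                  - (G_L+G_R)^2/(H_L+H_R+lambda)) - gamma
    left_mask[i] is True when row i goes to the left child.
    """
    g_left = sum(g for g, m in zip(gradients, left_mask) if m)
    h_left = sum(h for h, m in zip(hessians, left_mask) if m)
    g_right = sum(gradients) - g_left
    h_right = sum(hessians) - h_left

    def score(g, h):
        # Structure score of a single leaf
        return g * g / (h + lambda_)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Squared-error loss with an initial prediction of 0:
# gradient = prediction - target, hessian = 1 (illustrative data only).
targets = [1.0, 1.0, 5.0, 5.0]
grads = [0.0 - t for t in targets]
hess = [1.0] * len(targets)

good_split = split_gain(grads, hess, [True, True, False, False])
poor_split = split_gain(grads, hess, [True, False, False, False])
```

On this toy data the split that separates the two target groups receives the larger gain, which is the behaviour the split search relies on.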
import numpy as np
import pandas as pd
from math import e

class Node:
    '''
    This class defines a node which creates a tree structure by recursively calling itself
    whilst checking a number of ending parameters such as depth and min_leaf. It uses an exact greedy method
    to exhaustively scan every possible split point. The gain metric of choice is conservation of variance.
    This is a naive solution and does not compare to Friedman's 2001 Gradient Boosting Machines
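The exact greedy scan described in this docstring can be sketched for a single feature as below. This is a hedged illustration, not the gist's code: it assumes "conservation of variance" means variance reduction (parent variance minus the size-weighted child variance), and the names `variance` and `best_split` are hypothetical. `min_leaf` must be at least 1.

```python
def variance(values):
    """Population variance of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y, min_leaf=1):
    """Scan every candidate threshold on one feature and return
    (threshold, gain) for the largest variance reduction, or
    (None, 0.0) when no split improves on the parent node."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]

    parent = variance(ys)
    best_threshold, best_gain = None, 0.0

    # i is the size of the left child; both children keep >= min_leaf rows
    for i in range(min_leaf, n - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # identical feature values cannot be separated
        left, right = ys[:i], ys[i:]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_gain = gain
            best_threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbours
    return best_threshold, best_gain


# Illustrative data only
threshold, gain = best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```

A tree node would apply this scan to every feature, keep the best result, and recurse on the two children until a stopping parameter such as depth or min_leaf is reached.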