from sklearn.model_selection import train_test_split

# Split the 80% training set 50-50 into d1 and d2, stratified on the target.
d1, d2, y1, y2 = train_test_split(X_train, y_train, stratify=y_train, test_size=0.5, random_state=15)
# Reset the indices so each split is numbered 0..n-1 again.
d1 = d1.reset_index(drop=True)
d2 = d2.reset_index(drop=True)
y1 = y1.reset_index(drop=True)
y2 = y2.reset_index(drop=True)

def generating_samples(d1, y1):
    """Create samples from d1 by sampling rows with replacement.
    """
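The body of `generating_samples` is not shown above. As a minimal sketch of what sampling with replacement (bootstrap sampling) means, assuming plain NumPy arrays instead of the DataFrames used above, the hypothetical helper `bootstrap_sample` below draws row indices uniformly with replacement and returns the matching rows. It illustrates the idea only and is not the author's implementation.

```python
import numpy as np

def bootstrap_sample(X, y, n_samples=None, seed=None):
    """Draw rows of (X, y) uniformly at random, with replacement.

    Hypothetical helper for illustration; X and y are NumPy arrays
    with the same number of rows.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    if n_samples is None:
        n_samples = n_rows  # classic bootstrap: same size as the input
    # Indices can repeat because rows are drawn with replacement.
    idx = rng.integers(0, n_rows, size=n_samples)
    return X[idx], y[idx], idx

# Toy data: 1,000 rows, 3 features
X_toy = np.arange(3000).reshape(1000, 3)
y_toy = np.arange(1000) % 2
X_s, y_s, idx = bootstrap_sample(X_toy, y_toy, seed=15)
print(X_s.shape)            # (1000, 3)
print(len(np.unique(idx)))  # typically about 632 distinct rows (roughly 1 - 1/e of 1,000)
```

With pandas objects such as `d1` and `y1`, the same indices could be applied positionally with `d1.iloc[idx]` and `y1.iloc[idx]`.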
+---------------------+-----------------+----------------+----------------+---------------+
| Model               | Train AUC Score | Test AUC Score | Train F1 Score | Test F1 Score |
+---------------------+-----------------+----------------+----------------+---------------+
| Stacking_Classifier | 0.99537         | 0.9902         | 0.99429        | 0.98759       |
+---------------------+-----------------+----------------+----------------+---------------+
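from mlxtend.classifier import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Meta-classifier, trained on the base classifiers' predictions
x_cfl = XGBClassifier(n_estimators=1000, n_jobs=-1)
# Base classifiers
x_1 = XGBClassifier(n_estimators=500, n_jobs=-1)
x_2 = XGBClassifier(n_estimators=500, n_jobs=-1)
# best_depth and best_samples are assumed to hold the tuned decision-tree values
x_3 = DecisionTreeClassifier(max_depth=best_depth, min_samples_split=best_samples, class_weight='balanced')
x_4 = LogisticRegression(class_weight='balanced')
s_clf = StackingClassifier(classifiers=[x_1, x_2, x_3, x_4], meta_classifier=x_cfl)
s_clf.fit(X_train, y_train)
# Optional probability calibration (not used):
#sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
#sig_clf.fit(X_train_f, y_train_f)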
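To make the stacking step concrete: with mlxtend's `StackingClassifier` defaults (`use_probas=False`), each base classifier's predicted class labels become one column of the meta-feature matrix on which the meta-classifier (`x_cfl` above) is trained. The sketch below builds such a matrix with NumPy from hypothetical label predictions, and uses a simple voting threshold as a stand-in for the XGBoost meta-classifier; the prediction arrays, the helper name `build_meta_features`, and the voting rule are all illustrative assumptions.

```python
import numpy as np

def build_meta_features(base_predictions):
    """Stack per-model predictions column-wise.

    The result has one row per sample and one column per base classifier.
    """
    return np.column_stack(base_predictions)

# Hypothetical 0/1 predictions from four base classifiers on five samples
preds = [
    np.array([1, 0, 1, 1, 0]),  # e.g. first XGBoost base model
    np.array([1, 0, 1, 0, 0]),  # e.g. second XGBoost base model
    np.array([1, 1, 1, 1, 0]),  # e.g. decision tree
    np.array([0, 0, 1, 1, 0]),  # e.g. logistic regression
]
meta_X = build_meta_features(preds)
print(meta_X.shape)  # (5, 4)

# Stand-in meta-learner: predict 1 when at least 3 of the 4 base models say 1.
meta_pred = (meta_X.sum(axis=1) >= 3).astype(int)
print(meta_pred)  # [1 0 1 1 0]
```

Because mlxtend's `StackingClassifier` builds these meta-features from the same rows the base models were fitted on, the stacked model's training score tends to be optimistic; mlxtend's `StackingCVClassifier` uses out-of-fold predictions instead.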
+---------+-----------------+----------------+----------------+---------------+
| Model   | Train AUC Score | Test AUC Score | Train F1 Score | Test F1 Score |
+---------+-----------------+----------------+----------------+---------------+
| XgBoost | 0.99938         | 0.99855        | 0.9998         | 0.990791      |
+---------+-----------------+----------------+----------------+---------------+
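import matplotlib.pyplot as plt

# Sum of all feature-importance scores, used to express each bar as a percentage
s_s1 = data['score'].sum()
print(s_s1)

plt.style.use('fivethirtyeight')
# Horizontal bar chart of the 20 most important features
ax = data.head(20).plot(kind='barh', color='red')
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_width() / s_s1)
    # Position the label near the end of each bar
    x = p.get_x() + p.get_width() - 0.5
    y = p.get_y() + p.get_height()
    ax.annotate(percentage, (x, y))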
provider_count              14960
State                        5550
attend_physician_count       4258
County                       3218
OPAnnualReimbursementAmt      816
total_diff_amount             800
ClmDiagnosisCode_1_count      713
OPAnnualDeductibleAmt         668
InscClaimAmtReimbursed        634
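The percentage labels in the plot are each feature's score divided by the sum of all scores (`s_s1`). That full total is not shown here, so the sketch below applies the same `'{:.1f}%'` formula to only the nine scores listed above; its percentages are therefore shares of these nine features, not of all features. The helper name `percent_labels` is hypothetical.

```python
def percent_labels(scores, total=None):
    """Format each score as a percentage of `total` (defaults to sum(scores))."""
    if total is None:
        total = sum(scores.values())
    return {name: '{:.1f}%'.format(100 * s / total) for name, s in scores.items()}

# The nine 'weight' scores listed above
top9 = {
    'provider_count': 14960,
    'State': 5550,
    'attend_physician_count': 4258,
    'County': 3218,
    'OPAnnualReimbursementAmt': 816,
    'total_diff_amount': 800,
    'ClmDiagnosisCode_1_count': 713,
    'OPAnnualDeductibleAmt': 668,
    'InscClaimAmtReimbursed': 634,
}
labels = percent_labels(top9)
print(labels['provider_count'])  # 47.3%
```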
import pandas as pd

# 'weight' importance: number of times each feature is used to split, across all trees
feature_important = xg1.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())
# One row per feature, most important first
data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
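With `importance_type='weight'`, XGBoost scores a feature by how many times it is used to split a node, summed over all trees. The pure-Python sketch below reproduces that counting idea on a hypothetical list of split features per tree; the tree contents are invented for illustration, and real scores should come from `get_score` as above.

```python
from collections import Counter

# Hypothetical split features for three small trees (one entry per split node)
trees = [
    ['provider_count', 'State', 'provider_count'],
    ['provider_count', 'County'],
    ['State', 'provider_count', 'attend_physician_count'],
]

# 'weight' importance: count how often each feature is used to split
weight = Counter(feature for tree in trees for feature in tree)

# Sort descending, like sort_values(by="score", ascending=False) above
ranked = weight.most_common()
print(ranked[0])  # ('provider_count', 4)
```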
training score: 0.9998006310038143
testing score: 0.990791808142236
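from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# A parameter grid for XGBoost
parameters = {
    'n_estimators': [100, 500, 1000],
    'learning_rate': [0.1, 0.01, 0.05]
}
# verbosity=0 silences XGBoost's own logging
xgb = XGBClassifier(objective='binary:logistic', verbosity=0, n_jobs=4)
# 3-fold cross-validation, ranking candidates by macro-averaged F1
xg_grid = GridSearchCV(xgb, param_grid=parameters, n_jobs=-1, verbose=1, scoring='f1_macro', cv=3, return_train_score=True)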
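The grid above has 3 values of `n_estimators` and 3 of `learning_rate`, so `GridSearchCV` evaluates 3 × 3 = 9 candidate settings, each on `cv=3` folds, for 27 model fits (plus one refit of the best setting on the full training data, since `refit=True` by default). The standard-library sketch below enumerates the same grid to make that count explicit; scikit-learn may iterate the candidates in a different order.

```python
from itertools import product

parameters = {
    'n_estimators': [100, 500, 1000],
    'learning_rate': [0.1, 0.01, 0.05],
}
cv_folds = 3

# Every combination of one value per parameter is one candidate setting
keys = list(parameters)
candidates = [dict(zip(keys, combo)) for combo in product(*parameters.values())]

n_fits = len(candidates) * cv_folds
print(len(candidates), n_fits)  # 9 27
print(candidates[0])            # {'n_estimators': 100, 'learning_rate': 0.1}
```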
+---------------+-----------------+----------------+----------------+---------------+
| Model         | Train AUC Score | Test AUC Score | Train F1 Score | Test F1 Score |
+---------------+-----------------+----------------+----------------+---------------+
| Decision_Tree | 0.9967          | 0.9909         | 0.99314        | 0.9771        |
+---------------+-----------------+----------------+----------------+---------------+
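To compare the three models in the result tables, the short sketch below computes the train-minus-test gap for each metric from the reported scores and picks the best model by test F1. The numbers are copied from the tables; nothing is re-run, and reading a larger gap as more overfitting is only a rough heuristic.

```python
# Scores copied from the result tables: (train AUC, test AUC, train F1, test F1)
results = {
    'Stacking_Classifier': (0.99537, 0.9902, 0.99429, 0.98759),
    'XgBoost': (0.99938, 0.99855, 0.9998, 0.990791),
    'Decision_Tree': (0.9967, 0.9909, 0.99314, 0.9771),
}

for model, (tr_auc, te_auc, tr_f1, te_f1) in results.items():
    auc_gap = tr_auc - te_auc
    f1_gap = tr_f1 - te_f1
    print(f'{model:<20} AUC gap {auc_gap:.4f}  F1 gap {f1_gap:.4f}')

# Best model by test F1 score
best = max(results, key=lambda m: results[m][3])
print(best)  # XgBoost
```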