@emuccino
Last active April 13, 2020 08:01
Load and clean Loan data
import numpy as np
import pandas as pd
#import loan dataset
df = pd.read_csv('loan.csv').dropna(axis=1,how='any')
#convert loan grades to numerical values
df['sub_grade'] = df['sub_grade'].str.slice(start=1).astype(int)
grade_dict = {k:i for i,k in enumerate(['A', 'B', 'C', 'D', 'E', 'F', 'G'])}
#note: term_dict is defined here but not applied anywhere in this snippet
term_dict = {k:i for i,k in enumerate(['36 months', '60 months'])}
df['grade'] = np.array([grade_dict[i] for i in df['grade'].values])
df['grade'] = (df['grade']*df['sub_grade'].max())+df['sub_grade']
#define loan condition, this will be our target for classification
bad_loan = ["Charged Off", "Default", "Does not meet the credit policy. Status:Charged Off", "In Grace Period",
"Late (16-30 days)", "Late (31-120 days)"]
def loan_condition(status):
    #1 marks a bad loan, 0 marks everything else
    if status in bad_loan:
        return 1
    else:
        return 0
df['loan_condition'] = df['loan_status'].apply(loan_condition)
#select features to use for classification
features = ['loan_condition','loan_amnt','term','int_rate',
'installment','grade','home_ownership','verification_status',
'purpose','addr_state','dti','revol_bal']
#sample portion of data for demonstration purposes
df = df[features].convert_dtypes().sample(frac=.1)
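
To check what the grade encoding and the target flag produce without the full `loan.csv`, here is a minimal sketch that runs the same steps on a tiny in-memory frame. The four rows in `toy` are invented for illustration and are not from the dataset. It uses `Series.map` and `Series.isin` in place of the list comprehension and `apply` above, which should give the same result for these inputs.

```python
import pandas as pd

# Toy stand-in for loan.csv; these rows are made up for illustration only.
toy = pd.DataFrame({
    "grade": ["A", "A", "C", "G"],
    "sub_grade": ["A1", "A5", "C3", "G5"],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Late (31-120 days)"],
})

# Strip the leading letter from the sub-grade ("C3" -> 3).
toy["sub_grade"] = toy["sub_grade"].str.slice(start=1).astype(int)

# Map letter grades to 0..6, then fold in the sub-grade to get one ordinal score.
grade_dict = {k: i for i, k in enumerate(["A", "B", "C", "D", "E", "F", "G"])}
toy["grade"] = toy["grade"].map(grade_dict)
toy["grade"] = toy["grade"] * toy["sub_grade"].max() + toy["sub_grade"]

# Same bad-loan statuses as in the snippet above.
bad_loan = ["Charged Off", "Default",
            "Does not meet the credit policy. Status:Charged Off",
            "In Grace Period", "Late (16-30 days)", "Late (31-120 days)"]
toy["loan_condition"] = toy["loan_status"].isin(bad_loan).astype(int)

print(toy[["grade", "loan_condition"]])
```

With sub-grades running from 1 to 5, the combined score ranges from 1 (A1) to 35 (G5), so a higher number means a riskier grade.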