| # Data partitioning | |
| # Unique values of Loan_Status | |
| df_concat['Loan_Status'].value_counts() | |
| # Training set | |
| df_train = df_concat[df_concat['Loan_Status'].isin([0, 1])].reset_index(drop = True) | |
| print('Data dimension: {} rows and {} columns'.format(len(df_train), len(df_train.columns))) | |
| df_train.head() |
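
The partitioning step above recovers the training rows by keeping only the real labels (0 and 1). A minimal sketch of that idea on toy data follows; the frame `combined`, its values, and the recovery of the testing rows via the 999 placeholder are illustrative assumptions, since the source only shows the training half.

```python
import pandas as pd

# Toy stand-in for df_concat: two labelled rows and one row carrying
# the 999 placeholder used for the testing data (hypothetical values).
combined = pd.DataFrame({
    "LoanAmount": [120, 80, 150],
    "Loan_Status": [1, 0, 999],
})

# Training rows are the ones with a real label.
train_part = combined[combined["Loan_Status"].isin([0, 1])].reset_index(drop=True)

# Assumed counterpart: testing rows carry the placeholder, which is
# dropped again because it holds no information.
test_part = (
    combined[combined["Loan_Status"] == 999]
    .drop(columns=["Loan_Status"])
    .reset_index(drop=True)
)

print(len(train_part), len(test_part))
```

Resetting the index after filtering keeps both frames numbered from zero, which avoids gaps left by the removed rows.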
| # One hot encoder | |
| # Add a placeholder Loan_Status column (999) to the testing data | |
| df_test['Loan_Status'] = 999 | |
| # Concat the training and testing data | |
| df_concat = pd.concat(objs = [df_train , df_test], axis = 0) | |
| # Drop the Loan_ID column | |
| df_concat.drop(columns = ['Loan_ID'], inplace = True) |
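
Concatenating the two sets before encoding is a common way to make sure both end up with the same dummy columns. The sketch below shows the idea on toy data; the use of `pd.get_dummies` is an assumption (the source only names the step "One hot encoder"), and the frames `train` and `test` are hypothetical.

```python
import pandas as pd

# Toy frames standing in for df_train and df_test (hypothetical values).
train = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female"], dtype=object),
    "Loan_Status": [1, 0],
})
test = pd.DataFrame({"Gender": pd.Series(["Female"], dtype=object)})

# Placeholder label so the testing rows can be told apart later.
test["Loan_Status"] = 999

# Stack the rows, then encode the categorical column once for both sets.
combined = pd.concat(objs=[train, test], axis=0)
encoded = pd.get_dummies(combined, columns=["Gender"])

print(list(encoded.columns))
```

Encoding the stacked frame means a category that appears in only one of the two sets still gets its own column in both.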
| # The distribution of loan amount by loan status | |
| # Slice the columns | |
| df_viz_5 = df_train[['LoanAmount', 'Loan_Status']].reset_index(drop = True) | |
| # Map the loan status | |
| df_viz_5['Loan_Status'] = df_viz_5['Loan_Status'].map( | |
| { | |
| 0: 'Not default', | |
| 1: 'Default' | |
| } |
| # The distribution of applicant incomes by loan status | |
| # Slice the columns | |
| df_viz_4 = df_train[['ApplicantIncome', 'Loan_Status']].reset_index(drop = True) | |
| # Map the loan status | |
| df_viz_4['Loan_Status'] = df_viz_4['Loan_Status'].map( | |
| { | |
| 0: 'Not default', | |
| 1: 'Default' | |
| } |
| # Number of customers by loan status and education | |
| # Data aggregation between loan status and education | |
| df_viz_3 = df_train.groupby(['Loan_Status', 'Education'])['Loan_ID'].count().reset_index(name = 'Total') | |
| # Map the loan status | |
| df_viz_3['Loan_Status'] = df_viz_3['Loan_Status'].map( | |
| { | |
| 0: 'Not default', | |
| 1: 'Default' | |
| } |
| # Number of customers by loan status and dependents | |
| # Data aggregation between loan status and dependents | |
| df_viz_2 = df_train.groupby(['Loan_Status', 'Dependents'])['Loan_ID'].count().reset_index(name = 'Total') | |
| # Map the loan status | |
| df_viz_2['Loan_Status'] = df_viz_2['Loan_Status'].map( | |
| { | |
| 0: 'Not default', | |
| 1: 'Default' | |
| } |
| # Number of customers by loan status | |
| # Data aggregation between default and not default customers | |
| df_viz_1 = df_train.groupby(['Loan_Status'])['Loan_ID'].count().reset_index(name = 'Total') | |
| # Map the loan status | |
| df_viz_1['Loan_Status'] = df_viz_1['Loan_Status'].map( | |
| { | |
| 0: 'Not default', | |
| 1: 'Default' | |
| } |
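
The aggregation snippets above all follow the same pattern: group, count `Loan_ID`, then map the numeric status to a readable label. A self-contained sketch of that pattern on toy data is given here; the frame `df_demo` and its values are hypothetical, and the label mapping is copied from the snippets.

```python
import pandas as pd

# Toy stand-in for df_train (hypothetical values).
df_demo = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Loan_Status": [1, 0, 1, 1],
})

# Count customers per status and name the count column 'Total'.
counts = (
    df_demo.groupby("Loan_Status")["Loan_ID"]
    .count()
    .reset_index(name="Total")
)

# Replace the numeric codes with the labels used in the snippets.
labels = {0: "Not default", 1: "Default"}
counts["Loan_Status"] = counts["Loan_Status"].map(labels)

print(counts)
```

Adding a second key to the `groupby` list, such as `Dependents` or `Education`, gives the two-way counts used in the other snippets.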
| # -------------------- TESTING SET -------------------- | |
| # Data frame metadata | |
| df_test.info() | |
| # Change column types | |
| df_test = df_test.astype({'Credit_History': object}) | |
| df_test.select_dtypes(include = ['object']).dtypes | |
| # Summary statistics of categorical columns | |
| for i in df_test.select_dtypes('object').columns: |
| # -------------------- TRAINING SET -------------------- | |
| # Data frame metadata | |
| df_train.info() | |
| # Change column types | |
| df_train = df_train.astype({'Credit_History': object, 'Loan_Status': int}) | |
| df_train.select_dtypes(include = ['object']).dtypes | |
| # Summary statistics of categorical columns | |
| for i in df_train.select_dtypes('object').columns: |
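
The loop body is cut off in the preview above, so what it prints is not known. As a sketch of one plausible summary, the example below casts `Credit_History` to `object` and collects `value_counts()` for every object column; the frame `df_demo`, the dictionary `summaries`, and the choice of `value_counts()` are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the loan data (hypothetical values).
df_demo = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "Credit_History": [1.0, 0.0, 1.0],
    "LoanAmount": [120, 80, 150],
})

# Treat the numeric flag as a categorical column, as in the snippet.
df_demo = df_demo.astype({"Credit_History": object})

# Assumed loop body: frequency table per categorical column.
summaries = {}
for col in df_demo.select_dtypes("object").columns:
    summaries[col] = df_demo[col].value_counts()
    print(summaries[col])
```

Casting `Credit_History` first is what makes it show up in the loop; left as a float, it would be skipped by `select_dtypes('object')`.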
| # -------------------- TRAINING SET -------------------- | |
| # Import the training set | |
| df_train = pd.read_csv( | |
| filepath_or_buffer = 'https://raw.githubusercontent.com/dphi-official/Datasets/master/Loan_Data/loan_train.csv', | |
| usecols = list(range(1, 14)) | |
| ) | |
| # Data dimension | |
| print('Data dimension: {} rows and {} columns'.format(len(df_train), len(df_train.columns))) | |
| df_train.head() |
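
Passing integer positions to `usecols` selects columns by position, so starting at 1 skips the first column of the file. The sketch below illustrates this on an in-memory CSV; the text in `csv_text` is hypothetical, and the guess that the skipped column is an unnamed row index is an assumption, not something the source states.

```python
import io

import pandas as pd

# Hypothetical CSV text with a leading unnamed index column, such as
# one written by DataFrame.to_csv with the default index setting.
csv_text = (
    ",Loan_ID,Gender,Loan_Status\n"
    "0,LP001,Male,1\n"
    "1,LP002,Female,0\n"
)

# Positions 1..3 skip column 0, the same way positions 1..13 do above.
df_demo = pd.read_csv(io.StringIO(csv_text), usecols=list(range(1, 4)))

print(list(df_demo.columns))
```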