Last active
April 2, 2021 17:24
{ | |
"cells": [ | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "<center>\n <img src=\"https://gitlab.com/ibm/skills-network/courses/placeholder101/-/raw/master/labs/module%201/images/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n</center>\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "# **Data Wrangling Lab**\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Estimated time needed: **45 to 60** minutes\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "In this assignment you will be performing data wrangling.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Objectives\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "In this lab you will perform the following:\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "- Identify duplicate values in the dataset.\n\n- Remove duplicate values from the dataset.\n\n- Identify missing values in the dataset.\n\n- Impute the missing values in the dataset.\n\n- Normalize data in the dataset.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "<hr>\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Hands on Lab\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Import pandas module.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import pandas as pd", | |
"execution_count": 94, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Load the dataset into a dataframe.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df = pd.read_csv(\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m1_survey_data.csv\")", | |
"execution_count": 95, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df.shape[0] #How many rows", | |
"execution_count": 96, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 96, | |
"data": { | |
"text/plain": "11552" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df.shape[1] #How many columns", | |
"execution_count": 97, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 97, | |
"data": { | |
"text/plain": "85" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df.size", | |
"execution_count": 98, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 98, | |
"data": { | |
"text/plain": "981920" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Finding duplicates\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "In this section you will identify duplicate values in the dataset.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": " Find how many duplicate rows exist in the dataframe.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\nlen(df)-len(df.drop_duplicates())", | |
"execution_count": 99, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 99, | |
"data": { | |
"text/plain": "154" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# In order to answer second question in test 1\nlen(df)-len(df.drop_duplicates('Respondent'))", | |
"execution_count": 100, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 100, | |
"data": { | |
"text/plain": "154" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
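{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "A minimal sketch of another way to count duplicates, assuming a small made-up frame named `toy` (not part of the survey data). `DataFrame.duplicated()` marks every row that repeats an earlier one, so summing the mask should give the same number as the length difference used above. Passing `subset` restricts the comparison to the listed column.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import pandas as pd\n\n# Hypothetical toy data, used only for illustration\ntoy = pd.DataFrame({\n    'Respondent': [1, 2, 2, 3],\n    'Country': ['DK', 'US', 'US', 'IN'],\n})\n\n# Rows that exactly repeat an earlier row\nn_full = int(toy.duplicated().sum())\n\n# Rows whose 'Respondent' value repeats an earlier one\nn_by_id = int(toy.duplicated(subset='Respondent').sum())\n\nprint(n_full, n_by_id)", | |
"execution_count": null, | |
"outputs": [] | |
}, | |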
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Removing duplicates\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Remove the duplicate rows from the dataframe.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf.drop_duplicates(keep ='first', inplace=True)", | |
"execution_count": 101, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Verify if duplicates were actually dropped.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\nlen(df)-len(df.drop_duplicates())\ndf.size", | |
"execution_count": 102, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 102, | |
"data": { | |
"text/plain": "968830" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df.shape[0] #How many rows are in the DataFrame now?", | |
"execution_count": 103, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 103, | |
"data": { | |
"text/plain": "11398" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "len(df.drop_duplicates('Respondent'))", | |
"execution_count": 104, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 104, | |
"data": { | |
"text/plain": "11398" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "len(df.drop_duplicates('CompFreq'))", | |
"execution_count": 105, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 105, | |
"data": { | |
"text/plain": "4" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
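{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "The count of 4 above is consistent with `drop_duplicates` keeping one row per distinct `CompFreq` value and treating missing values as a value of their own. A minimal sketch on made-up data (the frame `toy` is hypothetical) compares it with `Series.nunique()`, which skips missing values unless `dropna=False` is passed.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import numpy as np\nimport pandas as pd\n\n# Hypothetical toy column, used only for illustration\ntoy = pd.DataFrame({'CompFreq': ['Yearly', 'Monthly', np.nan, 'Yearly', np.nan, 'Weekly']})\n\n# drop_duplicates keeps the first row for each distinct value; NaN counts as one value\nkept = toy.drop_duplicates(subset='CompFreq')\n\nprint(len(kept))                              # distinct values, NaN included\nprint(toy['CompFreq'].nunique())              # distinct non-missing values\nprint(toy['CompFreq'].nunique(dropna=False))  # distinct values, NaN included", | |
"execution_count": null, | |
"outputs": [] | |
}, | |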
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Finding Missing values\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Find the missing values for all columns.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\nmissing_values = df.isnull()\nmissing_values.head()", | |
"execution_count": 106, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 106, | |
"data": { | |
"text/plain": " Respondent MainBranch Hobbyist OpenSourcer OpenSource Employment \\\n0 False False False False False False \n1 False False False False False False \n2 False False False False False False \n3 False False False False False False \n4 False False False False False False \n\n Country Student EdLevel UndergradMajor ... WelcomeChange \\\n0 False False False False ... False \n1 False False False False ... False \n2 False False False False ... False \n3 False False False True ... False \n4 False False False False ... False \n\n SONewContent Age Gender Trans Sexuality Ethnicity Dependents \\\n0 False False False False False False False \n1 True False False False False False False \n2 False False False False False False False \n3 False False False False False False False \n4 False False False False False False False \n\n SurveyLength SurveyEase \n0 False False \n1 False False \n2 False False \n3 False False \n4 False False \n\n[5 rows x 85 columns]", | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Respondent</th>\n <th>MainBranch</th>\n <th>Hobbyist</th>\n <th>OpenSourcer</th>\n <th>OpenSource</th>\n <th>Employment</th>\n <th>Country</th>\n <th>Student</th>\n <th>EdLevel</th>\n <th>UndergradMajor</th>\n <th>...</th>\n <th>WelcomeChange</th>\n <th>SONewContent</th>\n <th>Age</th>\n <th>Gender</th>\n <th>Trans</th>\n <th>Sexuality</th>\n <th>Ethnicity</th>\n <th>Dependents</th>\n <th>SurveyLength</th>\n <th>SurveyEase</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>...</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>...</td>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>...</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n 
<td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n <td>...</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>...</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows \u00d7 85 columns</p>\n</div>" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Find out how many rows are missing in the column 'WorkLoc'\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf['WorkLoc'].isnull().sum()", | |
"execution_count": 107, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 107, | |
"data": { | |
"text/plain": "32" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['EdLevel'].isnull().sum()", | |
"execution_count": 108, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 108, | |
"data": { | |
"text/plain": "112" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['Country'].isnull().sum()", | |
"execution_count": 109, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 109, | |
"data": { | |
"text/plain": "0" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
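{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "The per-column checks above can also be done in one pass. A minimal sketch, assuming a made-up frame named `toy` (its values are hypothetical): `isnull().sum()` returns one missing-value count per column, and sorting puts the columns with the most gaps first.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import numpy as np\nimport pandas as pd\n\n# Hypothetical toy data, used only for illustration\ntoy = pd.DataFrame({\n    'WorkLoc': ['Office', np.nan, 'Home', np.nan],\n    'EdLevel': ['Bachelor', 'Master', np.nan, 'Bachelor'],\n    'Country': ['DK', 'US', 'IN', 'DE'],\n})\n\n# Number of missing values in every column, largest first\nmissing_per_column = toy.isnull().sum().sort_values(ascending=False)\nprint(missing_per_column)", | |
"execution_count": null, | |
"outputs": [] | |
}, | |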
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Imputing missing values\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Find the value counts for the column WorkLoc.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf['WorkLoc'].value_counts()", | |
"execution_count": 110, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 110, | |
"data": { | |
"text/plain": "Office 6806\nHome 3589\nOther place, such as a coworking space or cafe 971\nName: WorkLoc, dtype: int64" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['Employment'].value_counts()", | |
"execution_count": 111, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 111, | |
"data": { | |
"text/plain": "Employed full-time 10968\nEmployed part-time 430\nName: Employment, dtype: int64" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['UndergradMajor'].value_counts()", | |
"execution_count": 112, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 112, | |
"data": { | |
"text/plain": "Computer science, computer engineering, or software engineering 6953\nInformation systems, information technology, or system administration 794\nAnother engineering discipline (ex. civil, electrical, mechanical) 759\nWeb development or web design 410\nA natural science (ex. biology, chemistry, physics) 403\nMathematics or statistics 372\nA business discipline (ex. accounting, finance, marketing) 244\nA social science (ex. anthropology, psychology, political science) 210\nA humanities discipline (ex. literature, history, philosophy) 207\nFine arts or performing arts (ex. graphic design, music, studio art) 161\nI never declared a major 124\nA health science (ex. nursing, pharmacy, radiology) 24\nName: UndergradMajor, dtype: int64" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Identify the value that is most frequent (majority) in the WorkLoc column.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "#make a note of the majority value here, for future reference\n#6806\ndf['WorkLoc'].value_counts().max()", | |
"execution_count": 113, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 113, | |
"data": { | |
"text/plain": "6806" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['Employment'].value_counts().max()", | |
"execution_count": 114, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 114, | |
"data": { | |
"text/plain": "10968" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['UndergradMajor'].value_counts().min()", | |
"execution_count": 115, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 115, | |
"data": { | |
"text/plain": "24" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Impute (replace) all the empty rows in the column WorkLoc with the value that you have identified as majority.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf['WorkLoc'].fillna('Office', inplace=True)", | |
"execution_count": 116, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "After imputation there should ideally not be any empty rows in the WorkLoc column.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Verify if imputing was successful.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf['WorkLoc'].isnull().sum()", | |
"execution_count": 117, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 117, | |
"data": { | |
"text/plain": "0" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
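{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Instead of typing the majority value by hand, it can be looked up with `Series.mode()`. This is a minimal sketch on a made-up Series named `work_loc` (hypothetical values), not the lab's required solution.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import numpy as np\nimport pandas as pd\n\n# Hypothetical toy column, used only for illustration\nwork_loc = pd.Series(['Office', 'Home', np.nan, 'Office', np.nan], name='WorkLoc')\n\n# mode() ignores missing values and returns a Series; take its first entry\nmajority = work_loc.mode()[0]\n\n# fillna returns a new Series with the gaps replaced\nimputed = work_loc.fillna(majority)\n\nprint(majority)\nprint(int(imputed.isnull().sum()))", | |
"execution_count": null, | |
"outputs": [] | |
}, | |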
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Normalizing data\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "There are two columns in the dataset that talk about compensation.\n\nOne is \"CompFreq\". This column shows how often a developer is paid (Yearly, Monthly, Weekly).\n\nThe other is \"CompTotal\". This column talks about how much the developer is paid per Year, Month, or Week depending upon his/her \"CompFreq\". \n\nThis makes it difficult to compare the total compensation of the developers.\n\nIn this section you will create a new column called 'NormalizedAnnualCompensation' which contains the 'Annual Compensation' irrespective of the 'CompFreq'.\n\nOnce this column is ready, it makes comparison of salaries easy.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "<hr>\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "List out the various categories in the column 'CompFreq'\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\nCategories_In_CompFreq=list((df['CompFreq'].value_counts()).index)\nCategories_In_CompFreq", | |
"execution_count": 118, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 118, | |
"data": { | |
"text/plain": "['Yearly', 'Monthly', 'Weekly']" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Create a new column named 'NormalizedAnnualCompensation'. Use the hint given below if needed.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Double click to see the **Hint**.\n\n<!--\n\nUse the below logic to arrive at the values for the column NormalizedAnnualCompensation.\n\nIf the CompFreq is Yearly then use the exising value in CompTotal\nIf the CompFreq is Monthly then multiply the value in CompTotal with 12 (months in an year)\nIf the CompFreq is Weekly then multiply the value in CompTotal with 52 (weeks in an year)\n\n-->\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "# your code goes here\ndf.loc[df['CompFreq'] =='Yearly', 'NormalizedAnnualCompensation']=df['CompTotal']\ndf.loc[df['CompFreq'] =='Monthly', 'NormalizedAnnualCompensation']=df['CompTotal']*12\ndf.loc[df['CompFreq'] =='Weekly', 'NormalizedAnnualCompensation']=df['CompTotal']*52\ndf[['CompFreq', 'CompTotal', 'NormalizedAnnualCompensation']].head()", | |
"execution_count": 125, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 125, | |
"data": { | |
"text/plain": " CompFreq CompTotal NormalizedAnnualCompensation\n0 Yearly 61000.0 61000.0\n1 Yearly 138000.0 138000.0\n2 Yearly 90000.0 90000.0\n3 Monthly 29000.0 348000.0\n4 Yearly 90000.0 90000.0", | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>CompFreq</th>\n <th>CompTotal</th>\n <th>NormalizedAnnualCompensation</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Yearly</td>\n <td>61000.0</td>\n <td>61000.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Yearly</td>\n <td>138000.0</td>\n <td>138000.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Yearly</td>\n <td>90000.0</td>\n <td>90000.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Monthly</td>\n <td>29000.0</td>\n <td>348000.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Yearly</td>\n <td>90000.0</td>\n <td>90000.0</td>\n </tr>\n </tbody>\n</table>\n</div>" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
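{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "The same normalization can be sketched with a lookup table instead of three separate assignments. The frame `toy` and the dictionary `periods_per_year` are hypothetical names introduced for illustration; the multipliers follow the hint (12 months, 52 weeks). Rows whose `CompFreq` is missing end up as NaN, which matches what the `.loc` approach leaves behind.\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "import numpy as np\nimport pandas as pd\n\n# Hypothetical toy data, used only for illustration\ntoy = pd.DataFrame({\n    'CompFreq': ['Yearly', 'Monthly', 'Weekly', np.nan],\n    'CompTotal': [90000.0, 5000.0, 1000.0, 70000.0],\n})\n\n# Pay periods per year, following the hint\nperiods_per_year = {'Yearly': 1, 'Monthly': 12, 'Weekly': 52}\n\n# A missing or unknown CompFreq maps to NaN, so the product is NaN as well\ntoy['NormalizedAnnualCompensation'] = toy['CompTotal'] * toy['CompFreq'].map(periods_per_year)\n\nprint(toy)", | |
"execution_count": null, | |
"outputs": [] | |
}, | |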
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "len(df.drop_duplicates('CompFreq'))", | |
"execution_count": 120, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 120, | |
"data": { | |
"text/plain": "4" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['CompFreq'].unique()", | |
"execution_count": 131, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 131, | |
"data": { | |
"text/plain": "array(['Yearly', 'Monthly', 'Weekly', nan], dtype=object)" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['CompFreq'].value_counts()", | |
"execution_count": 132, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 132, | |
"data": { | |
"text/plain": "Yearly 6073\nMonthly 4788\nWeekly 331\nName: CompFreq, dtype: int64" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['CompFreq'].describe()", | |
"execution_count": 133, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 133, | |
"data": { | |
"text/plain": "count 11192\nunique 3\ntop Yearly\nfreq 6073\nName: CompFreq, dtype: object" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df['NormalizedAnnualCompensation'].median()", | |
"execution_count": 134, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 134, | |
"data": { | |
"text/plain": "100000.0" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df[\"ConvertedComp\"].describe()", | |
"execution_count": 135, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 135, | |
"data": { | |
"text/plain": "count 1.058200e+04\nmean 1.315967e+05\nstd 2.947865e+05\nmin 0.000000e+00\n25% 2.686800e+04\n50% 5.774500e+04\n75% 1.000000e+05\nmax 2.000000e+06\nName: ConvertedComp, dtype: float64" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "code", | |
"source": "df[\"ConvertedComp\"].hist(figsize=(15,4))", | |
"execution_count": 136, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 136, | |
"data": { | |
"text/plain": "<matplotlib.axes._subplots.AxesSubplot at 0x7f0c9a96b990>" | |
}, | |
"metadata": {} | |
}, | |
{ | |
"output_type": "display_data", | |
"data": { | |
"text/plain": "<Figure size 1080x288 with 1 Axes>", | |
"image/png": "\n" | |
}, | |
"metadata": { | |
"needs_background": "light" | |
} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Authors\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Ramesh Sannareddy\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Other Contributors\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "Rav Ahuja\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Change Log\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n| ----------------- | ------- | ----------------- | ---------------------------------- |\n| 2020-10-17 | 0.1 | Ramesh Sannareddy | Created initial version of the lab |\n" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": " Copyright \u00a9 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBM-DA0321EN-SkillsNetwork-21426264&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ).\n" | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3.7", | |
"language": "python" | |
}, | |
"language_info": { | |
"name": "python", | |
"version": "3.7.10", | |
"mimetype": "text/x-python", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"file_extension": ".py" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |