Multiple Linear Regression using Scikit-Learn
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multiple Linear Regression using Scikit-Learn\n",
"\n",
"\n",
"This project covers Multiple Linear Regression, a supervised machine learning algorithm. I build a multiple linear regression model to estimate the relative CPU performance of the machines in the Computer Hardware dataset.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of contents\n",
"\n",
"This project is organised into the following sections:\n",
"\n",
"\n",
"1.\tIntroduction\n",
"\n",
"2.\tMultiple linear regression intuition\n",
"\n",
"3.\tIndependent and dependent variables\n",
"\n",
"4.\tAssumptions of linear regression\n",
"\n",
"5.\tThe dataset description\n",
"\n",
"6.\tThe problem statement\n",
"\n",
"7.\tImport the Python libraries\n",
"\n",
"8.\tImport the dataset\n",
"\n",
"9.\tExploratory Data Analysis\n",
" - Explore types of variables\n",
" \n",
" - Estimate correlation coefficients\n",
" \n",
" - Correlation heat map\n",
" \n",
"10.\tDetect problems within variables\n",
" - Detect missing values\n",
" \n",
" - Outliers in discrete variables\n",
" \n",
" - Number of labels – cardinality\n",
" \n",
"11.\tLinear Regression modeling\n",
" - Divide the dataset into categorical and numerical variables\n",
" \n",
" - Select the predictor and target variables\n",
" \n",
" - Create separate train and test sets\n",
" \n",
" - Feature Scaling\n",
" \n",
" - Fit the Linear Regression model\n",
"\n",
"12.\tPredicting the results\n",
" - Predicting the test set results\n",
" \n",
" - Predicting estimated relative CPU performance values\n",
"\n",
"13.\tModel slope and intercept terms\n",
"\n",
"14.\tEvaluate model performance\n",
" - RMSE (Root Mean Square Error)\n",
" \n",
" - R2 Score\n",
" \n",
" - Overfitting or Underfitting\n",
" \n",
" - Cross validation\n",
" \n",
" - Residual analysis\n",
" \n",
" - Normality test (Q-Q Plot)\n",
" \n",
"15.\tConclusion\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Introduction\n",
"\n",
"\n",
"In this project, I build a multiple linear regression model to estimate the relative CPU performance of the machines in the Computer Hardware dataset. The dataset describes each machine in terms of machine cycle time, main memory, cache memory, and minimum and maximum channels.\n",
"\n",
"\n",
"I discuss the basics and assumptions of linear regression, along with its advantages, disadvantages and common pitfalls. I present the implementation in Python using Scikit-learn, a popular machine learning library for Python. I also discuss various tools to evaluate the performance of the linear regression model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Multiple linear regression intuition\n",
"\n",
"\n",
"Linear Regression is a machine learning algorithm that models the linear relationship between a dependent variable and one or more independent variables. It applies to supervised regression problems, where we try to predict a continuous variable. Linear Regression is further classified into two types: Simple and Multiple Linear Regression. \n",
"\n",
"I have discussed the linear regression intuition in detail in the readme document.\n",
"\n",
"In this project, I employ the Multiple Linear Regression technique, where there is one dependent variable and more than one independent variable.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Independent and dependent variables\n",
"\n",
"\n",
"In this project, I refer to the independent variable as the feature variable and to the dependent variable as the target variable. These variables also go by other names, as follows:\n",
"\n",
"\n",
"**Independent variable**\n",
"\n",
"The independent variable is also called the input variable and is denoted by X. In practical applications, it is also called the feature variable or predictor variable. We can denote it as:\n",
"\n",
"\n",
"Independent or Input variable (X) = Feature variable = Predictor variable \n",
"\n",
"\n",
"**Dependent variable**\n",
"\n",
"\n",
"The dependent variable is also called the output variable and is denoted by y. It is also called the target variable or response variable. It can be denoted as follows:\n",
"\n",
"\n",
"Dependent or Output variable (y) = Target variable = Response variable\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Assumptions of Linear Regression\n",
"\n",
"The Linear Regression model is based on several assumptions, which are as follows:\n",
"\n",
"\n",
"1. Linear relationship\n",
"\n",
"2. Multivariate normality\n",
"\n",
"3. No or little multi-collinearity\n",
"\n",
"4. No auto-correlation in error terms\n",
"\n",
"5. Homoscedasticity\n",
"\n",
"\n",
"\n",
"I have described these assumptions in more detail in the readme document.\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Dataset description\n",
"\n",
"\n",
"Now, we should get to know more about the dataset. It is a computer hardware dataset. The dataset consists of information about the computer vendors selling computers, model name of computers and various attributes to estimate the relative performance of CPU.\n",
"The dataset can be found at the following url –\n",
"\n",
"https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
"\n",
"\n",
"The dataset description will help us to know more about the data.\n",
"\n",
"\n",
"**Dataset description** is given as follows:-\n",
"\n",
"\n",
"1. vendor name: 30 \n",
" (adviser, amdahl, apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec, \n",
" dg, formation, four-phase, gould, harris, honeywell, hp, ibm, ipl, magnuson, \n",
" microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry, \n",
" sratus, wang)\n",
" \n",
"2. Model Name: many unique symbols\n",
"\n",
"3. MYCT: machine cycle time in nanoseconds (integer)\n",
"\n",
"4. MMIN: minimum main memory in kilobytes (integer)\n",
"\n",
"5. MMAX: maximum main memory in kilobytes (integer)\n",
"\n",
"6. CACH: cache memory in kilobytes (integer)\n",
"\n",
"7. CHMIN: minimum channels in units (integer)\n",
"\n",
"8. CHMAX: maximum channels in units (integer)\n",
"\n",
"9. PRP: published relative performance (integer)\n",
"\n",
"10. ERP: estimated relative performance from the original article (integer)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. The problem statement\n",
"\n",
"\n",
"A machine learning model is built with the aim of solving a problem. So, first of all I have to define the problem to be solved in this project.\n",
"\n",
"As described earlier, the problem is to estimate the relative CPU performance of the machines in the Computer Hardware dataset. The dataset describes each machine in terms of machine cycle time, main memory, cache memory, and minimum and maximum channels.\n",
"\n",
"So, let's get started. I will start by importing the required Python libraries."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Import the Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# import required Python libraries\n",
"\n",
"# to handle datasets\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# for plotting\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Import the dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# import the dataset\n",
"\n",
"filename = \"c:/datasets/machine.data.csv\"\n",
"\n",
"df = pd.read_csv(filename, header = None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Exploratory Data Analysis\n",
"\n",
"\n",
"Now, I will perform Exploratory Data Analysis. It provides useful insights into the dataset which is important for further analysis.\n",
"\n",
"First of all, we should check the dimensions of the dataframe as follows:-"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of dataframe df: (209, 10)\n"
]
}
],
"source": [
"# view the dimensions of dataframe df\n",
"\n",
"print(\"Shape of dataframe df: {}\".format(df.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that there are 209 rows and 10 columns in the dataset. Next, we should get an insight about the dataset.\n",
"\n",
"The **df.head()** method displays the first 5 rows of the dataset."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>adviser</td>\n",
" <td>32/60</td>\n",
" <td>125</td>\n",
" <td>256</td>\n",
" <td>6000</td>\n",
" <td>256</td>\n",
" <td>16</td>\n",
" <td>128</td>\n",
" <td>198</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>269</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7a</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>220</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7b</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>172</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7c</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>16000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>16</td>\n",
" <td>132</td>\n",
" <td>132</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9\n",
"0 adviser 32/60 125 256 6000 256 16 128 198 199\n",
"1 amdahl 470v/7 29 8000 32000 32 8 32 269 253\n",
"2 amdahl 470v/7a 29 8000 32000 32 8 32 220 253\n",
"3 amdahl 470v/7b 29 8000 32000 32 8 32 172 253\n",
"4 amdahl 470v/7c 29 8000 16000 32 8 16 132 132"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# view the top five rows of dataframe df\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the column names are from 0 to 9. They should be descriptive. So, we should rename them as follows:-"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# rename columns of dataframe df\n",
"\n",
"col_names = ['Vendor Name','Model Name', 'MYCT', 'MMIN', 'MMAX', 'CACH','CHMIN', 'CHMAX', 'PRP', 'ERP' ]\n",
"\n",
"df.columns = col_names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should now check that the columns have appropriate names."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Vendor Name</th>\n",
" <th>Model Name</th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" <th>PRP</th>\n",
" <th>ERP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>adviser</td>\n",
" <td>32/60</td>\n",
" <td>125</td>\n",
" <td>256</td>\n",
" <td>6000</td>\n",
" <td>256</td>\n",
" <td>16</td>\n",
" <td>128</td>\n",
" <td>198</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>269</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7a</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>220</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7b</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>172</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7c</td>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>16000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>16</td>\n",
" <td>132</td>\n",
" <td>132</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Vendor Name Model Name MYCT MMIN MMAX CACH CHMIN CHMAX PRP ERP\n",
"0 adviser 32/60 125 256 6000 256 16 128 198 199\n",
"1 amdahl 470v/7 29 8000 32000 32 8 32 269 253\n",
"2 amdahl 470v/7a 29 8000 32000 32 8 32 220 253\n",
"3 amdahl 470v/7b 29 8000 32000 32 8 32 172 253\n",
"4 amdahl 470v/7c 29 8000 16000 32 8 16 132 132"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# view the top five rows of dataframe with column names renamed\n",
"\n",
"df.head()"
]
},
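{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged aside, the load and rename steps can also be combined by passing the column names to `pd.read_csv` through its `names` parameter. The sketch below is self-contained: it reads the five rows shown by `df.head()` from an in-memory string instead of the CSV file. The names `sample_csv` and `sample_df` are illustrative assumptions and are not part of the original notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# first five records, copied from the df.head() output above (illustrative sample only)\n",
"sample_csv = '''adviser,32/60,125,256,6000,256,16,128,198,199\n",
"amdahl,470v/7,29,8000,32000,32,8,32,269,253\n",
"amdahl,470v/7a,29,8000,32000,32,8,32,220,253\n",
"amdahl,470v/7b,29,8000,32000,32,8,32,172,253\n",
"amdahl,470v/7c,29,8000,16000,32,8,16,132,132'''\n",
"\n",
"col_names = ['Vendor Name', 'Model Name', 'MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX', 'PRP', 'ERP']\n",
"\n",
"# header=None says the data has no header row; names= assigns the labels while reading\n",
"sample_df = pd.read_csv(io.StringIO(sample_csv), header=None, names=col_names)\n",
"\n",
"sample_df.head()"
]
},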
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Explore types of variables\n",
"\n",
"\n",
"In this section, I will explore the types of variables in the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's view a concise summary of the dataframe with **df.info()** method."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 209 entries, 0 to 208\n",
"Data columns (total 10 columns):\n",
"Vendor Name 209 non-null object\n",
"Model Name 209 non-null object\n",
"MYCT 209 non-null int64\n",
"MMIN 209 non-null int64\n",
"MMAX 209 non-null int64\n",
"CACH 209 non-null int64\n",
"CHMIN 209 non-null int64\n",
"CHMAX 209 non-null int64\n",
"PRP 209 non-null int64\n",
"ERP 209 non-null int64\n",
"dtypes: int64(8), object(2)\n",
"memory usage: 16.4+ KB\n"
]
}
],
"source": [
"# view dataframe summary\n",
"\n",
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that there are categorical and numerical variables in the dataset. Numerical variables have data types int64 and categorical variables are those of type object.\n",
"\n",
"First, let's explore the categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 2 categorical variables\n"
]
}
],
"source": [
"# find categorical variables\n",
"\n",
"categorical = [col for col in df.columns if df[col].dtype=='O']\n",
"print('There are {} categorical variables'.format(len(categorical)))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Vendor Name', 'Model Name']\n"
]
}
],
"source": [
"# view the categorical variables\n",
"\n",
"print(categorical)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, there are two categorical variables - **Vendor Name** and **Model Name** in the dataset. \n",
"\n",
"Let's explore more about them. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Vendor Name</th>\n",
" <th>Model Name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>adviser</td>\n",
" <td>32/60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>amdahl</td>\n",
" <td>470v/7c</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Vendor Name Model Name\n",
"0 adviser 32/60\n",
"1 amdahl 470v/7\n",
"2 amdahl 470v/7a\n",
"3 amdahl 470v/7b\n",
"4 amdahl 470v/7c"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# view the top five rows of categorical variables\n",
"\n",
"df[categorical].head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ibm 32\n",
"nas 19\n",
"ncr 13\n",
"honeywell 13\n",
"sperry 13\n",
"siemens 12\n",
"amdahl 9\n",
"cdc 9\n",
"burroughs 8\n",
"harris 7\n",
"dg 7\n",
"hp 7\n",
"ipl 6\n",
"c.r.d 6\n",
"dec 6\n",
"magnuson 6\n",
"formation 5\n",
"cambex 5\n",
"prime 5\n",
"nixdorf 3\n",
"gould 3\n",
"perkin-elmer 3\n",
"apollo 2\n",
"wang 2\n",
"bti 2\n",
"basf 2\n",
"microdata 1\n",
"adviser 1\n",
"four-phase 1\n",
"sratus 1\n",
"Name: Vendor Name, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# exploring the categories in Vendor Name column\n",
"\n",
"df['Vendor Name'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ibm is the most frequent category in the **Vendor Name** column.\n",
"\n",
"Next, let's explore the **Model Name** column."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of unique Model Names: 209\n",
"Number of instances of models: 209\n"
]
}
],
"source": [
"print('Number of unique Model Names: ', len(df['Model Name'].unique()))\n",
"print('Number of instances of models: ', len(df))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that **Model Name** is a unique identifier for each of the computer models. Thus this is not a variable that we can use to predict the estimated relative performance of computer models. So, we should not use this column for model building.\n"
]
},
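{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same uniqueness check can be sketched with the pandas `Series.nunique()` method and the `Series.is_unique` attribute. This is a minimal, self-contained illustration on the five model names shown earlier, not on the full column. The names `model_names`, `n_unique` and `is_identifier` are assumptions introduced for this example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# five model names copied from the df.head() output (illustrative sample only)\n",
"model_names = pd.Series(['32/60', '470v/7', '470v/7a', '470v/7b', '470v/7c'], name='Model Name')\n",
"\n",
"# number of distinct values in the column\n",
"n_unique = model_names.nunique()\n",
"\n",
"# True when every value occurs exactly once, i.e. the column behaves like an identifier\n",
"is_identifier = model_names.is_unique\n",
"\n",
"print('Unique values:', n_unique, 'of', len(model_names))\n",
"print('Acts as a unique identifier:', is_identifier)"
]
},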
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's explore the numerical variables."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 8 numerical variables\n"
]
}
],
"source": [
"# find numerical variables\n",
"\n",
"numerical = [col for col in df.columns if df[col].dtype!='O']\n",
"print('There are {} numerical variables'.format(len(numerical)))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX', 'PRP', 'ERP']\n"
]
}
],
"source": [
"# view numerical variables\n",
"\n",
"print(numerical)"
]
},
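{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative to the list comprehensions above is the pandas `DataFrame.select_dtypes()` method. The sketch below is a self-contained illustration on a small hand-built frame that uses values from the first rows of the dataset. It splits the columns by whether they are numeric, rather than by testing for the object dtype. The names `sample_df`, `categorical_cols` and `numerical_cols` are assumptions introduced for this example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# small frame with two text columns and two integer columns (illustrative sample only)\n",
"sample_df = pd.DataFrame({\n",
"    'Vendor Name': ['adviser', 'amdahl', 'amdahl'],\n",
"    'Model Name': ['32/60', '470v/7', '470v/7a'],\n",
"    'MYCT': [125, 29, 29],\n",
"    'ERP': [199, 253, 253],\n",
"})\n",
"\n",
"# numeric columns are selected directly; everything else is treated as categorical\n",
"numerical_cols = sample_df.select_dtypes(include='number').columns.tolist()\n",
"categorical_cols = sample_df.select_dtypes(exclude='number').columns.tolist()\n",
"\n",
"print('Categorical:', categorical_cols)\n",
"print('Numerical:', numerical_cols)"
]
},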
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, there are eight numerical variables in the dataset. Let's explore more about them."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" <th>PRP</th>\n",
" <th>ERP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>125</td>\n",
" <td>256</td>\n",
" <td>6000</td>\n",
" <td>256</td>\n",
" <td>16</td>\n",
" <td>128</td>\n",
" <td>198</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>269</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>220</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>172</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>16000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>16</td>\n",
" <td>132</td>\n",
" <td>132</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX PRP ERP\n",
"0 125 256 6000 256 16 128 198 199\n",
"1 29 8000 32000 32 8 32 269 253\n",
"2 29 8000 32000 32 8 32 220 253\n",
"3 29 8000 32000 32 8 32 172 253\n",
"4 29 8000 16000 32 8 16 132 132"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# view the top 5 rows of numerical variables\n",
"\n",
"df[numerical].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that we have eight numerical variables in the dataset. All the eight numerical variables are of discrete type.\n",
"\n",
"On closer inspection, we find that **PRP** is a redundant column in the dataframe. It denotes **published relative performance**. Our target is to predict **estimated relative performance**. So, we should delete **PRP** from the dataframe."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Summary : types of variables**\n",
"\n",
"\n",
"- There are 2 categorical variables and 8 numerical variables.\n",
"\n",
"- The 2 categorical variables, **Vendor Name** and **Model Name** are 2 non-predictive attributes as given in the dataset description. So, I do not use them for model building.\n",
"\n",
"- All of the 8 numerical variables are of discrete type.\n",
"\n",
"- Out of the 8 numerical variables, **PRP** is the published relative performance. It is a redundant column here, so I do not use it for model building.\n",
"\n",
"- **ERP** (estimated relative performance) is the goal field. It is the target variable."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estimate correlation coefficients\n",
"\n",
"\n",
"Our dataset is very small. So, we can compute the standard correlation coefficient (also called Pearson's r) between every pair of attributes. \n",
"\n",
"We can compute it using the `df.corr()` method as follows:-"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" <th>PRP</th>\n",
" <th>ERP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>MYCT</th>\n",
" <td>1.0000</td>\n",
" <td>-0.3356</td>\n",
" <td>-0.3786</td>\n",
" <td>-0.3210</td>\n",
" <td>-0.3011</td>\n",
" <td>-0.2505</td>\n",
" <td>-0.3071</td>\n",
" <td>-0.2884</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MMIN</th>\n",
" <td>-0.3356</td>\n",
" <td>1.0000</td>\n",
" <td>0.7582</td>\n",
" <td>0.5347</td>\n",
" <td>0.5172</td>\n",
" <td>0.2669</td>\n",
" <td>0.7949</td>\n",
" <td>0.8193</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MMAX</th>\n",
" <td>-0.3786</td>\n",
" <td>0.7582</td>\n",
" <td>1.0000</td>\n",
" <td>0.5380</td>\n",
" <td>0.5605</td>\n",
" <td>0.5272</td>\n",
" <td>0.8630</td>\n",
" <td>0.9012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CACH</th>\n",
" <td>-0.3210</td>\n",
" <td>0.5347</td>\n",
" <td>0.5380</td>\n",
" <td>1.0000</td>\n",
" <td>0.5822</td>\n",
" <td>0.4878</td>\n",
" <td>0.6626</td>\n",
" <td>0.6486</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CHMIN</th>\n",
" <td>-0.3011</td>\n",
" <td>0.5172</td>\n",
" <td>0.5605</td>\n",
" <td>0.5822</td>\n",
" <td>1.0000</td>\n",
" <td>0.5483</td>\n",
" <td>0.6089</td>\n",
" <td>0.6106</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CHMAX</th>\n",
" <td>-0.2505</td>\n",
" <td>0.2669</td>\n",
" <td>0.5272</td>\n",
" <td>0.4878</td>\n",
" <td>0.5483</td>\n",
" <td>1.0000</td>\n",
" <td>0.6052</td>\n",
" <td>0.5922</td>\n",
" </tr>\n",
" <tr>\n",
" <th>PRP</th>\n",
" <td>-0.3071</td>\n",
" <td>0.7949</td>\n",
" <td>0.8630</td>\n",
" <td>0.6626</td>\n",
" <td>0.6089</td>\n",
" <td>0.6052</td>\n",
" <td>1.0000</td>\n",
" <td>0.9665</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ERP</th>\n",
" <td>-0.2884</td>\n",
" <td>0.8193</td>\n",
" <td>0.9012</td>\n",
" <td>0.6486</td>\n",
" <td>0.6106</td>\n",
" <td>0.5922</td>\n",
" <td>0.9665</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX PRP ERP\n",
"MYCT 1.0000 -0.3356 -0.3786 -0.3210 -0.3011 -0.2505 -0.3071 -0.2884\n",
"MMIN -0.3356 1.0000 0.7582 0.5347 0.5172 0.2669 0.7949 0.8193\n",
"MMAX -0.3786 0.7582 1.0000 0.5380 0.5605 0.5272 0.8630 0.9012\n",
"CACH -0.3210 0.5347 0.5380 1.0000 0.5822 0.4878 0.6626 0.6486\n",
"CHMIN -0.3011 0.5172 0.5605 0.5822 1.0000 0.5483 0.6089 0.6106\n",
"CHMAX -0.2505 0.2669 0.5272 0.4878 0.5483 1.0000 0.6052 0.5922\n",
"PRP -0.3071 0.7949 0.8630 0.6626 0.6089 0.6052 1.0000 0.9665\n",
"ERP -0.2884 0.8193 0.9012 0.6486 0.6106 0.5922 0.9665 1.0000"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# estimate correlation coefficients\n",
"\n",
"pd.options.display.float_format = '{:,.4f}'.format\n",
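"# use only the numerical columns, since recent pandas versions no longer drop string columns silently\n",
"corr_matrix = df[numerical].corr()\n",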
"corr_matrix"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ERP 1.0000\n",
"PRP 0.9665\n",
"MMAX 0.9012\n",
"MMIN 0.8193\n",
"CACH 0.6486\n",
"CHMIN 0.6106\n",
"CHMAX 0.5922\n",
"MYCT -0.2884\n",
"Name: ERP, dtype: float64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corr_matrix['ERP'].sort_values(ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Interpretation of correlation coefficient**\n",
"\n",
"The correlation coefficient ranges from -1 to +1. \n",
"\n",
"When it is close to +1, there is a strong positive correlation. For example, the coefficient between `ERP` and `MMAX` is 0.9012, which indicates a strong positive correlation. \n",
"\n",
"\n",
"When it is close to -1, there is a strong negative correlation. The coefficient between `ERP` and `MYCT` is -0.2884, so the negative correlation between them is weak.\n"
]
},
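{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the coefficient less of a black box, here is a minimal sketch that computes Pearson's r directly from its definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and compares it with `np.corrcoef`. It uses only the `MMAX` and `ERP` values from the five rows shown earlier, so the result will differ from the full-data value of 0.9012. The function `pearson_r` and the variable names are assumptions introduced for this example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# MMAX and ERP values of the first five rows (illustrative sample only)\n",
"mmax = np.array([6000, 32000, 32000, 32000, 16000], dtype=float)\n",
"erp = np.array([199, 253, 253, 253, 132], dtype=float)\n",
"\n",
"def pearson_r(x, y):\n",
"    # deviations of each observation from its own mean\n",
"    x_dev = x - x.mean()\n",
"    y_dev = y - y.mean()\n",
"    # sum of cross-products scaled by the product of the two spreads\n",
"    return (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())\n",
"\n",
"r_manual = pearson_r(mmax, erp)\n",
"\n",
"# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r\n",
"r_numpy = np.corrcoef(mmax, erp)[0, 1]\n",
"\n",
"print(r_manual, r_numpy)"
]
},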
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Correlation heat map"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(16,10))\n",
"plt.title('Correlation of Attributes with ERP')\n",
"a = sns.heatmap(corr_matrix, square=True, annot=True, fmt='.2f', linecolor='white')\n",
"a.set_xticklabels(a.get_xticklabels(), rotation=90)\n",
"a.set_yticklabels(a.get_yticklabels(), rotation=30) \n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that **ERP** is positively correlated with **MMIN**, **MMAX**, **CACH**, **CHMIN** and **CHMAX**, and weakly negatively correlated with **MYCT**.\n",
"\n",
"The strongest positive correlations are between **ERP** and **MMAX** (0.90) and between **ERP** and **MMIN** (0.82)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Detect problems within variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Detect missing values"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Vendor Name 0\n",
"Model Name 0\n",
"MYCT 0\n",
"MMIN 0\n",
"MMAX 0\n",
"CACH 0\n",
"CHMIN 0\n",
"CHMAX 0\n",
"PRP 0\n",
"ERP 0\n",
"dtype: int64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# count the missing values in each column\n",
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can confirm that there are no missing values in the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Outliers in discrete variables"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" <th>PRP</th>\n",
" <th>ERP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" <td>209.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>203.8230</td>\n",
" <td>2,867.9809</td>\n",
" <td>11,796.1531</td>\n",
" <td>25.2057</td>\n",
" <td>4.6986</td>\n",
" <td>18.2679</td>\n",
" <td>105.6220</td>\n",
" <td>99.3301</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>260.2629</td>\n",
" <td>3,878.7428</td>\n",
" <td>11,726.5644</td>\n",
" <td>40.6287</td>\n",
" <td>6.8163</td>\n",
" <td>25.9973</td>\n",
" <td>160.8307</td>\n",
" <td>154.7571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>17.0000</td>\n",
" <td>64.0000</td>\n",
" <td>64.0000</td>\n",
" <td>0.0000</td>\n",
" <td>0.0000</td>\n",
" <td>0.0000</td>\n",
" <td>6.0000</td>\n",
" <td>15.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>50.0000</td>\n",
" <td>768.0000</td>\n",
" <td>4,000.0000</td>\n",
" <td>0.0000</td>\n",
" <td>1.0000</td>\n",
" <td>5.0000</td>\n",
" <td>27.0000</td>\n",
" <td>28.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>110.0000</td>\n",
" <td>2,000.0000</td>\n",
" <td>8,000.0000</td>\n",
" <td>8.0000</td>\n",
" <td>2.0000</td>\n",
" <td>8.0000</td>\n",
" <td>50.0000</td>\n",
" <td>45.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>225.0000</td>\n",
" <td>4,000.0000</td>\n",
" <td>16,000.0000</td>\n",
" <td>32.0000</td>\n",
" <td>6.0000</td>\n",
" <td>24.0000</td>\n",
" <td>113.0000</td>\n",
" <td>101.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1,500.0000</td>\n",
" <td>32,000.0000</td>\n",
" <td>64,000.0000</td>\n",
" <td>256.0000</td>\n",
" <td>52.0000</td>\n",
" <td>176.0000</td>\n",
" <td>1,150.0000</td>\n",
" <td>1,238.0000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX \\\n",
"count 209.0000 209.0000 209.0000 209.0000 209.0000 209.0000 \n",
"mean 203.8230 2,867.9809 11,796.1531 25.2057 4.6986 18.2679 \n",
"std 260.2629 3,878.7428 11,726.5644 40.6287 6.8163 25.9973 \n",
"min 17.0000 64.0000 64.0000 0.0000 0.0000 0.0000 \n",
"25% 50.0000 768.0000 4,000.0000 0.0000 1.0000 5.0000 \n",
"50% 110.0000 2,000.0000 8,000.0000 8.0000 2.0000 8.0000 \n",
"75% 225.0000 4,000.0000 16,000.0000 32.0000 6.0000 24.0000 \n",
"max 1,500.0000 32,000.0000 64,000.0000 256.0000 52.0000 176.0000 \n",
"\n",
" PRP ERP \n",
"count 209.0000 209.0000 \n",
"mean 105.6220 99.3301 \n",
"std 160.8307 154.7571 \n",
"min 6.0000 15.0000 \n",
"25% 27.0000 28.0000 \n",
"50% 50.0000 45.0000 \n",
"75% 113.0000 101.0000 \n",
"max 1,150.0000 1,238.0000 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's view the summary statistics of the dataset\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"50 0.1196\n",
"140 0.0431\n",
"300 0.0383\n",
"26 0.0383\n",
"38 0.0335\n",
"320 0.0335\n",
"56 0.0335\n",
"180 0.0335\n",
"800 0.0287\n",
"75 0.0287\n",
"105 0.0287\n",
"200 0.0287\n",
"143 0.0239\n",
"900 0.0239\n",
"160 0.0239\n",
"400 0.0191\n",
"60 0.0191\n",
"29 0.0191\n",
"25 0.0191\n",
"23 0.0191\n",
"110 0.0191\n",
"92 0.0144\n",
"100 0.0144\n",
"250 0.0144\n",
"115 0.0144\n",
"125 0.0144\n",
"30 0.0144\n",
"480 0.0144\n",
"225 0.0144\n",
"330 0.0144\n",
"810 0.0096\n",
"1500 0.0096\n",
"72 0.0096\n",
"40 0.0096\n",
"57 0.0096\n",
"59 0.0096\n",
"17 0.0096\n",
"133 0.0096\n",
"1100 0.0096\n",
"240 0.0096\n",
"700 0.0096\n",
"64 0.0048\n",
"220 0.0048\n",
"203 0.0048\n",
"185 0.0048\n",
"175 0.0048\n",
"167 0.0048\n",
"35 0.0048\n",
"150 0.0048\n",
"116 0.0048\n",
"124 0.0048\n",
"70 0.0048\n",
"48 0.0048\n",
"112 0.0048\n",
"52 0.0048\n",
"98 0.0048\n",
"350 0.0048\n",
"600 0.0048\n",
"84 0.0048\n",
"90 0.0048\n",
"Name: MYCT, dtype: float64\n",
"\n",
"2000 0.2584\n",
"1000 0.1818\n",
"4000 0.1053\n",
"512 0.1053\n",
"8000 0.0957\n",
"256 0.0622\n",
"768 0.0478\n",
"16000 0.0335\n",
"262 0.0096\n",
"3100 0.0096\n",
"5240 0.0096\n",
"1310 0.0096\n",
"2620 0.0096\n",
"384 0.0096\n",
"32000 0.0048\n",
"1500 0.0048\n",
"524 0.0048\n",
"64 0.0048\n",
"192 0.0048\n",
"96 0.0048\n",
"3000 0.0048\n",
"500 0.0048\n",
"5000 0.0048\n",
"128 0.0048\n",
"2300 0.0048\n",
"Name: MMIN, dtype: float64\n",
"\n",
"8000 0.2057\n",
"16000 0.1675\n",
"4000 0.1579\n",
"32000 0.1100\n",
"2000 0.0813\n",
"12000 0.0478\n",
"1000 0.0335\n",
"6000 0.0287\n",
"3000 0.0239\n",
"5000 0.0239\n",
"24000 0.0191\n",
"64000 0.0191\n",
"6200 0.0144\n",
"2620 0.0096\n",
"10480 0.0096\n",
"512 0.0096\n",
"20970 0.0096\n",
"768 0.0048\n",
"64 0.0048\n",
"6300 0.0048\n",
"3500 0.0048\n",
"1500 0.0048\n",
"4500 0.0048\n",
"Name: MMAX, dtype: float64\n",
"\n",
"0 0.3301\n",
"8 0.1483\n",
"32 0.1100\n",
"64 0.0957\n",
"16 0.0670\n",
"4 0.0383\n",
"24 0.0335\n",
"128 0.0287\n",
"6 0.0239\n",
"2 0.0191\n",
"30 0.0191\n",
"1 0.0096\n",
"9 0.0096\n",
"256 0.0096\n",
"48 0.0096\n",
"65 0.0096\n",
"112 0.0096\n",
"131 0.0096\n",
"12 0.0048\n",
"142 0.0048\n",
"96 0.0048\n",
"160 0.0048\n",
"Name: CACH, dtype: float64\n",
"\n",
"1 0.4498\n",
"3 0.1340\n",
"8 0.0861\n",
"6 0.0766\n",
"12 0.0526\n",
"16 0.0478\n",
"4 0.0383\n",
"5 0.0335\n",
"2 0.0287\n",
"0 0.0239\n",
"52 0.0096\n",
"32 0.0048\n",
"26 0.0048\n",
"24 0.0048\n",
"7 0.0048\n",
"Name: CHMIN, dtype: float64\n",
"\n",
"6 0.1435\n",
"24 0.1148\n",
"8 0.0957\n",
"32 0.0718\n",
"5 0.0622\n",
"16 0.0574\n",
"2 0.0526\n",
"4 0.0526\n",
"1 0.0478\n",
"3 0.0431\n",
"12 0.0383\n",
"20 0.0239\n",
"0 0.0239\n",
"64 0.0239\n",
"10 0.0191\n",
"14 0.0191\n",
"38 0.0144\n",
"54 0.0144\n",
"7 0.0096\n",
"176 0.0096\n",
"128 0.0096\n",
"104 0.0096\n",
"13 0.0048\n",
"15 0.0048\n",
"26 0.0048\n",
"28 0.0048\n",
"31 0.0048\n",
"48 0.0048\n",
"52 0.0048\n",
"112 0.0048\n",
"19 0.0048\n",
"Name: CHMAX, dtype: float64\n",
"\n"
]
}
],
"source": [
"# outliers in discrete variables: share of observations per distinct value\n",
"\n",
"for var in ['MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX']:\n",
"    print(df[var].value_counts() / len(df))\n",
"    print()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1152x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# detect outliers in discrete variables\n",
"\n",
"for var in ['MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX']:\n",
"    plt.figure(figsize=(16,10))\n",
"    (df.groupby(var)[var].count() / len(df)).plot.bar()\n",
"    plt.ylabel('Proportion of observations per label')\n",
"    plt.title(var)\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the above plots, we can see that each discrete variable has some values that are shared by only a tiny proportion of the observations\n",
"in the dataset. For linear regression modeling, this does not cause any problem."
]
},
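{
"cell_type": "markdown",
"metadata": {},
"source": [
"The frequency plots show how often each value occurs, but they do not single out which values are extreme. As a complementary check, here is a minimal sketch that flags values lying outside the conventional 1.5 × IQR fences. This rule is my assumption for illustration and is not part of the analysis above; the helper name `iqr_outlier_mask` and the toy series `demo` are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def iqr_outlier_mask(series, k=1.5):\n",
"    # True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR].\n",
"    # The multiplier k=1.5 is a common convention, not a fixed rule.\n",
"    q1 = series.quantile(0.25)\n",
"    q3 = series.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    return (series < q1 - k * iqr) | (series > q3 + k * iqr)\n",
"\n",
"\n",
"# Hypothetical toy data: seven typical values and one extreme value.\n",
"demo = pd.Series([50, 56, 75, 110, 140, 180, 225, 1500], name='MYCT')\n",
"mask = iqr_outlier_mask(demo)\n",
"\n",
"# Show only the values flagged as outliers.\n",
"print(demo[mask].tolist())\n",
"```\n",
"\n",
"On this dataset the same helper could be applied as `iqr_outlier_mask(df['MYCT'])`. Whether flagged rows should be dropped, capped, or kept is a modeling decision that this sketch does not make."
]
},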
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Number of labels: cardinality\n",
"\n",
"Now, I will examine the categorical variable **Vendor Name**. First, I will determine whether it shows high cardinality, that is, a high number of distinct labels."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# plot the categorical variable\n",
"\n",
"plt.figure(figsize=(12,8))\n",
"(df['Vendor Name'].value_counts()).plot.bar()\n",
"plt.title('Number of categories in Vendor Name variable')\n",
"plt.xlabel('Vendor Name')\n",
"plt.ylabel('Number of different categories')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the **Vendor Name** variable contains only a few labels. So, we do not have to deal with high cardinality."
]
},
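{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cardinality can also be checked numerically rather than read off a bar chart. The sketch below counts the distinct labels and the share of rows per label; the series `vendors` is a hypothetical stand-in for `df['Vendor Name']`, and its values are made up for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the Vendor Name column of df.\n",
"vendors = pd.Series(['ibm', 'ibm', 'nas', 'honeywell', 'nas', 'ibm'],\n",
"                    name='Vendor Name')\n",
"\n",
"# Number of distinct labels in the column.\n",
"n_labels = vendors.nunique()\n",
"\n",
"# Fraction of rows that carry each label.\n",
"share = vendors.value_counts(normalize=True)\n",
"\n",
"print(n_labels)\n",
"print(share)\n",
"```\n",
"\n",
"On the real data, `df['Vendor Name'].nunique()` gives the label count directly, which is easier to compare against the number of rows than a plot."
]
},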
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11. Linear Regression Modeling\n",
"\n",
"\n",
"Now, I discuss the most important part of this project, which is building the Linear Regression model.\n",
"\n",
"First of all, I will divide the dataset into categorical and numerical variables as follows:-"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Divide the dataset into categorical and numerical variables\n",
"\n"
]
},
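{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# the first two columns hold the categorical variables\n",
"df_cat = df.iloc[:,:2]\n",
"\n",
"# the remaining columns hold the numerical variables\n",
"df_num = df.iloc[:, 2:]"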
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" <th>PRP</th>\n",
" <th>ERP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>125</td>\n",
" <td>256</td>\n",
" <td>6000</td>\n",
" <td>256</td>\n",
" <td>16</td>\n",
" <td>128</td>\n",
" <td>198</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>269</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>220</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>32000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>172</td>\n",
" <td>253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>29</td>\n",
" <td>8000</td>\n",
" <td>16000</td>\n",
" <td>32</td>\n",
" <td>8</td>\n",
" <td>16</td>\n",
" <td>132</td>\n",
" <td>132</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX PRP ERP\n",
"0 125 256 6000 256 16 128 198 199\n",
"1 29 8000 32000 32 8 32 269 253\n",
"2 29 8000 32000 32 8 32 220 253\n",
"3 29 8000 32000 32 8 32 172 253\n",
"4 29 8000 16000 32 8 16 132 132"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_num.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Select the predictor and target variables\n"
]
},
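{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# predictors: the six hardware attributes MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX\n",
"X = df_num.iloc[:,0:6]\n",
"\n",
"# target: the last column, ERP\n",
"y = df_num.iloc[:,-1]"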
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create separate train and test sets"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.3,random_state = 0)"
]
},
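{
"cell_type": "markdown",
"metadata": {},
"source": [
"With 209 rows and `test_size = 0.3`, it helps to know in advance how many rows land in each set. The arithmetic below is a sketch that assumes scikit-learn rounds the test share up to a whole row and gives the remainder to the training set; I believe this matches its behaviour for a fractional `test_size`, but treat it as an assumption rather than a specification.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"n_samples = 209   # rows in the dataset, as reported by df.describe()\n",
"test_size = 0.3   # same fraction passed to train_test_split\n",
"\n",
"# 0.3 * 209 = 62.7, which is rounded up to a whole number of test rows.\n",
"n_test = math.ceil(test_size * n_samples)\n",
"\n",
"# Every remaining row goes to the training set.\n",
"n_train = n_samples - n_test\n",
"\n",
"print(n_train, n_test)\n",
"```\n",
"\n",
"These counts can be compared with the shapes of `X_train` and `X_test` as a quick sanity check on the split."
]
},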
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View the dimensions of X_train, X_test, y_train, y_test"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((146, 6), (146,))"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.shape, y_train.shape"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((63, 6), (63,))"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_test.shape, y_test.shape"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>800</td>\n",
" <td>256</td>\n",
" <td>8000</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>320</td>\n",
" <td>128</td>\n",
" <td>6000</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>25</td>\n",
" <td>1310</td>\n",
" <td>2620</td>\n",
" <td>131</td>\n",
" <td>12</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>800</td>\n",
" <td>256</td>\n",
" <td>8000</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>220</td>\n",
" <td>1000</td>\n",
" <td>8000</td>\n",
" <td>16</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX\n",
"61 800 256 8000 0 1 4\n",
"24 320 128 6000 0 1 12\n",
"30 25 1310 2620 131 12 24\n",
"60 800 256 8000 0 1 4\n",
"56 220 1000 8000 16 1 2"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# let's inspect the training dataframe\n",
"\n",
"X_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MYCT</th>\n",
" <th>MMIN</th>\n",
" <th>MMAX</th>\n",
" <th>CACH</th>\n",
" <th>CHMIN</th>\n",
" <th>CHMAX</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>146.0000</td>\n",
" <td>146.0000</td>\n",
" <td>146.0000</td>\n",
" <td>146.0000</td>\n",
" <td>146.0000</td>\n",
" <td>146.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>205.8082</td>\n",
" <td>2,799.9726</td>\n",
" <td>11,741.2055</td>\n",
" <td>25.5685</td>\n",
" <td>4.5479</td>\n",
" <td>19.2397</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>249.6152</td>\n",
" <td>3,865.5077</td>\n",
" <td>11,879.6456</td>\n",
" <td>41.6903</td>\n",
" <td>6.5770</td>\n",
" <td>28.8810</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>17.0000</td>\n",
" <td>64.0000</td>\n",
" <td>64.0000</td>\n",
" <td>0.0000</td>\n",
" <td>0.0000</td>\n",
" <td>0.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>50.0000</td>\n",
" <td>512.0000</td>\n",
" <td>4,000.0000</td>\n",
" <td>0.0000</td>\n",
" <td>1.0000</td>\n",
" <td>5.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>115.5000</td>\n",
" <td>2,000.0000</td>\n",
" <td>8,000.0000</td>\n",
" <td>8.0000</td>\n",
" <td>1.5000</td>\n",
" <td>8.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>240.0000</td>\n",
" <td>4,000.0000</td>\n",
" <td>16,000.0000</td>\n",
" <td>32.0000</td>\n",
" <td>6.0000</td>\n",
" <td>24.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1,500.0000</td>\n",
" <td>32,000.0000</td>\n",
" <td>64,000.0000</td>\n",
" <td>256.0000</td>\n",
" <td>52.0000</td>\n",
" <td>176.0000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MYCT MMIN MMAX CACH CHMIN CHMAX\n",
"count 146.0000 146.0000 146.0000 146.0000 146.0000 146.0000\n",
"mean 205.8082 2,799.9726 11,741.2055 25.5685 4.5479 19.2397\n",
"std 249.6152 3,865.5077 11,879.6456 41.6903 6.5770 28.8810\n",
"min 17.0000 64.0000 64.0000 0.0000 0.0000 0.0000\n",
"25% 50.0000 512.0000 4,000.0000 0.0000 1.0000 5.0000\n",
"50% 115.5000 2,000.0000 8,000.0000 8.0000 1.5000 8.0000\n",
"75% 240.0000 4,000.0000 16,000.0000 32.0000 6.0000 24.0000\n",
"max 1,500.0000 32,000.0000 64,000.0000 256.0000 52.0000 176.0000"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature Scaling\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# Feature Scaling - I use the StandardScaler from sklearn\n",
"\n",
"# import the StandardScaler class from preprocessing library\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# instantiate an object scaler\n",
"scaler = StandardScaler()\n",
"\n",
"# fit the scaler to the training set and then transform it\n",
"X_train = scaler.fit_transform(X_train)\n",
"\n",
"# transform the test set\n",
"X_test = scaler.transform(X_test)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scaler is now ready, we can use it in a machine learning algorithm when required."
]
},
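{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a brief sketch of what the scaler computes (the symbols $x_{ij}$, $z_{ij}$, $\\mu_j$ and $\\sigma_j$ are notation introduced here, not names from the notebook), StandardScaler replaces each value $x_{ij}$ of feature $j$ with\n",
"\n",
"$$z_{ij}=\\frac{x_{ij}-\\mu_j}{\\sigma_j}$$\n",
"\n",
"where $\\mu_j$ and $\\sigma_j$ are the mean and standard deviation of feature $j$ in the training set. The same $\\mu_j$ and $\\sigma_j$ are reused when the test set is transformed."
]
},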
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fit the Linear Regression model"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# fit the linear regression model\n",
"\n",
"# import the LinearRegression class from linear_model library\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# instantiate an object lr\n",
"lr = LinearRegression()\n",
"\n",
"\n",
"# Train the model using the training sets\n",
"lr.fit(X_train, y_train)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12. Predicting the results\n",
"\n",
"\n",
"I have built the linear regression model. Now it is time to predict the results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predicting the test set results"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Predict on the test data set\n",
"y_pred = lr.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predicting estimated relative CPU performance values"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 53.25899879, -7.30914167, 85.61134478, 333.46353054,\n",
" 88.17105392])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#print(\"Predicted ERP - estimated relative performance for the first five values\")\n",
"\n",
"lr.predict(X_test)[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 13. Model slope and intercept terms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The slope parameters(w) are also called weights or coefficients. They are stored in the **coef_** attribute.\n",
"\n",
"The offset or intercept(b) is stored in the **intercept_** attribute.\n",
"\n",
"So, the model slope is given by **lr.coef_** and model intercept term is given by **lr.intercept_**."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of coefficients: 6\n",
"Estimated coefficients: [17.70202595 59.11241774 78.35042681 16.53981449 -0.35410978 38.97256261]\n",
"Estimated intercept: 100.0\n"
]
}
],
"source": [
"print(\"Number of coefficients:\", len(lr.coef_))\n",
"\n",
"print(\"Estimated coefficients: {}\".format(lr.coef_))\n",
"\n",
"print(\"Estimated intercept: {}\".format(lr.intercept_))"
]
},
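{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting the two attributes together, a sketch of the fitted model in the notation above (with $z_j$ denoting the $j$-th standardized feature, a symbol introduced here for illustration) is\n",
"\n",
"$$\\hat{y}=b+\\sum_{j=1}^{6}w_jz_j$$\n",
"\n",
"Each coefficient $w_j$ is therefore the change in the predicted value for a one standard deviation increase in feature $j$, holding the other features fixed. Because the standardized training features have zero mean, the least-squares intercept $b$ equals the mean of y_train."
]
},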
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I constructed a dataframe that contains features and estimated coefficients. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Estimated Coefficients</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Features</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>17.7020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>59.1124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>78.3504</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>16.5398</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.3541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>38.9726</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Estimated Coefficients\n",
"Features \n",
"0 17.7020\n",
"1 59.1124\n",
"2 78.3504\n",
"3 16.5398\n",
"4 -0.3541\n",
"5 38.9726"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset = list(zip(pd.DataFrame(X_train).columns, lr.coef_))\n",
"\n",
"pd.DataFrame(data = dataset, columns = ['Features', 'Estimated Coefficients']).set_index('Features')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 14. Evaluate model performance\n",
"\n",
"\n",
"I have built the linear regression model and use it to predict the results. Now, it is the time to evaluate the model performance. We want to understand the outcome of our model and we want to know whether the performance is acceptable or not. \n",
"For regression problems, there are several ways to evaluate the model performance. These are listed below:-\n",
"\n",
"\n",
"•\tRMSE (Root Mean Square Error)\n",
"\n",
"•\tR2 Score\n",
"\n",
"•\tOverfitting Vs Underfitting\n",
"\n",
"•\tCross validation\n",
"\n",
"•\tResidual analysis\n",
"\n",
"•\tNormality test\n",
"\n",
"I have described these measures in following sections:-\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### i.\tRMSE\n",
"\n",
"RMSE stands for **Root Mean Square Error**. RMSE is the standard deviation of the residuals. RMSE gives us the standard deviation of the unexplained variance by the model. It can be calculated by taking square root of Mean Squared Error.\n",
"\n",
"\n",
"RMSE is an absolute measure of fit. It gives us how spread the residuals are, given by the standard deviation of the residuals. The more concentrated the data is around the regression line, the lower the residuals and hence lower the standard deviation of residuals. It results in lower values of RMSE. So, lower values of RMSE indicate better fit of data. \n",
"\n",
"RMSE can be calculated as follows:-"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RMSE value : 37.99\n"
]
}
],
"source": [
"# RMSE(Root Mean Square Error)\n",
"\n",
"from sklearn.metrics import mean_squared_error\n",
"mse = mean_squared_error(y_test, y_pred)\n",
"rmse = np.sqrt(mse)\n",
"print(\"RMSE value : {:.2f}\".format(rmse))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"The RMSE value has been found to be 37.99. It means the standard deviation for our prediction is 37.99. So, sometimes we expect the predictions to be off by more than 37.99 and other times we expect less than 37.99. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ii. R2 Score\n",
"\n",
"R2 Score is another metric to evaluate performance of a regression model. It is also called **Coefficient of Determination**. It gives us an idea of goodness of fit for the linear regression models. It indicates the percentage of variance that is explained by the model. \n",
"\n",
"\n",
"**R2 Score = Explained Variation/Total Variation**\n",
"\n",
"\n",
"\n",
"Mathematically, we have\n",
"\n",
"\n",
"\n",
"$$R^2=1-\\frac{SS_{res}}{SS_{tot}}$$\n",
"\n",
"\n",
"The total sum of squares, $SS_{tot}=\\sum_i(y_i-\\bar{y})^2$\n",
"\n",
"The regression sum of squares (explained sum of squares), $SS_{reg}=\\sum_i(f_i-\\bar{y})^2$\n",
"\n",
"The sum of squares of residuals (residual sum of squares), $SS_{res}=\\sum_i(y_i-f_i)^2 = \\sum_ie^2_i$\n",
"\n",
"\n",
"\n",
"\n",
"In general, the higher the R2 Score value, the better the model fits the data. Usually, its value ranges from 0 to 1. So, we want its value to be as close to 1. Its value can become negative if our model is wrong.\n",
"\n",
"\n",
"\n",
"R2 score value can be found as follows:-\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R2 Score value: 0.92\n"
]
}
],
"source": [
"# R2 Score\n",
"\n",
"from sklearn.metrics import r2_score\n",
"print(\"R2 Score value: {:.2f}\".format(r2_score(y_test, y_pred)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"\n",
"In business decisions, the benchmark for the R2 score value is 0.7. It means if R2 score value >= 0.7, then the model is good enough to deploy on unseen data whereas if R2 score value < 0.7, then the model is not good enough to deploy. \n",
"\n",
"Our R2 score value has been found to be 0.92. It means that this model explains 92% of the variance in our dependent variable. So, the R2 score value confirms that the model is good enough to deploy because it provides good fit to the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### iii. Overfitting Vs Underfitting"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set score: 0.91\n"
]
}
],
"source": [
"# Evaluating training set performance\n",
"\n",
"print(\"Training set score: {:.2f}\".format(lr.score(X_train, y_train)))"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test set score: 0.92\n"
]
}
],
"source": [
"# Evaluating test set performance\n",
"\n",
"print(\"Test set score: {:.2f}\".format(lr.score(X_test, y_test)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation \n",
"\n",
"Training set and test set performances are comparable. An R Square value of 0.92 is very good."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### iv. Cross validation\n",
"\n",
"\n",
"Cross-validation is a vital step in evaluating a model. It maximizes the amount of data that is used to train the model. \n",
"\n",
"In cross-validation, we split the training data into several subgroups. Then we use each of them in turn to evaluate the model \n",
"fitted on the remaining portion of the data.\n",
"\n",
"It helps us to obtain reliable estimates of the model's generalization performance. So, it helps us to understand how well\n",
"the model performs on unseen data.\n",
"\n",
"We can perform cross validation as follows:-"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"# import the library\n",
"from sklearn.model_selection import cross_val_score\n",
"\n",
"# Compute 5-fold cross-validation scores: cv_scores\n",
"cv_scores = cross_val_score(lr, X, y, cv=5)\n"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0.8484 -0.864 0.7149 0.8755 0.7707]\n"
]
}
],
"source": [
"# print the 5-fold cross-validation scores\n",
"print(cv_scores.round(4))"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average 5-Fold CV Score: 0.4691\n"
]
}
],
"source": [
"# print the avarage 5-fold cross-validation scores\n",
"print(\"Average 5-Fold CV Score: {}\".format(np.mean(cv_scores).round(4)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Interpretation**\n",
"\n",
"There is a large fluctuation in the cross validation scores of the model. \n",
"\n",
"The average 5-fold cross validation score is very poor and hence the linear regression model is not a great fit to the data."
]
},
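{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible reason for the unstable folds, offered as an assumption rather than a finding: with an integer **cv**, cross_val_score splits a regression dataset into consecutive blocks without shuffling, so a fold can contain rows unlike the rest if the file is ordered. The sketch below repeats the check with shuffled folds and with the scaler fitted inside each fold. It assumes the X and y objects and the np import used above. The names pipe, kfold and shuffled_scores and the choice random_state=0 are illustrative. No output is shown because the cell has not been run here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: 5-fold cross-validation with shuffled folds (assumes X, y and np from above)\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import KFold, cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# the pipeline refits the scaler on the training part of every fold\n",
"pipe = make_pipeline(StandardScaler(), LinearRegression())\n",
"\n",
"# shuffle the rows before splitting; random_state=0 is an arbitrary illustrative seed\n",
"kfold = KFold(n_splits=5, shuffle=True, random_state=0)\n",
"\n",
"shuffled_scores = cross_val_score(pipe, X, y, cv=kfold)\n",
"\n",
"print(shuffled_scores.round(4))\n",
"print(\"Average shuffled 5-Fold CV Score: {}\".format(np.mean(shuffled_scores).round(4)))"
]
},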
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### v. Residual analysis\n",
"\n",
"\n",
"A linear regression model may not represent the data appropriately. The model may be a poor fit to the data. So, we should validate our model by defining and examining residual plots. The difference between the observed value of the dependent variable (y) and the predicted value (ŷi) is called the **residual** and is denoted by e. The scatter-plot of these residuals is called **residual plot**.\n",
"\n",
"\n",
"If the data points in a residual plot are randomly dispersed around horizontal axis and an approximate zero residual mean, a linear regression model may be appropriate for the data. Otherwise a non-linear model may be more appropriate.\n",
"\n",
"\n",
"Now, I will plot the residual errors. "
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmIAAAHiCAYAAABLDqCjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzs3Xt8nGWd///3lZlMDp2UHmho01BaaYvlWNoAIiyKHFoaKi5iBRWQRvGwlN181a/4db9bgXXBn7ubhaKLSIKIKPYnB6mpnFaEdRFoCgXl1ANtIaQlPdA206QzmZnr+8c100zSmRyaSe6Z5PV8POZxZ+77npkrQZk31+FzGWutAAAAMPwKvG4AAADAaEUQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQzAsDHGvGaM+XiGax83xjRn6XP+aIz5UjbeK9cYY6wxZqbX7QCQHQQxAIcwxmwxxnQYY0LGmO3GmJ8ZY4KDfV9r7QnW2j9moYnogzHmi8aYP3ndDgC9I4gByGSxtTYoaa6kUyV9x+P2DDtjjL8/5wb6HgCQRBAD0Ctr7XZJj8sFMkmSMabIGPOvxph3jDHvG2PuNMaUJK4daYz5nTFmjzFmtzHmv40xBYlrW4wx5yd+Lkn0tH1gjHld0mmpn9tzCC5x7z8nfh6f+Iwdidf/zhhT2Z/fxxhTYIy5wRizyRizyxiz0hgzIXFteuJza4wx70j6Q7pziXs/mRhq3ZMYCp2T8hlbjDHfNsa8Kmm/McafeP6eMabNGPOWMea8DO37WeLv+WTi3meMMcdkuPcIY8zPE3+HrcaYf0z8fnMk3SnpzESv5p7+/G0ADD+CGIBeJQLORZI2ppz+gaTZcuFspqSpkv4pce0bkpolTZJ0lKT/IyndXmrLJR2beCyQdPUAmlUg6R5Jx0iaJqlD0h39fO31kj4l6WOSKiR9IOlHPe75mKQ5iXYdcs4YM1vSryT9g9zvuVrSKmNMIOX+KyRVSxon9zteJ+k0a21Z4n239NLGz0u6WdKRktZJuj/DfSskHSHpQ4n2XSXpGmvtG5K+KunP1tqgtXZcL58FwEMEMQCZPGKMaZP0rqRWueAkY4yR9GVJtdba3dbaNkn/IunyxOs6JU2RdIy1ttNa+982/aa2SyR9P/Ee70q6vb8Ns9bustY+aK1tT3z+9+WCSH98RdJ3rbXN1tqwpO9JuqzHEOL3rLX7rbUdGc59VlKjtfZJa22npH+VVCLpoyn3326tfTdxf0xSkaTjjTGF1tot1tpNvbSx0Vr7bKJ935Xr2To69QZjjC/Rju9Ya9ustVsk/ZukK/v5dwCQAwhiADL5VKL35uOSPizXOyO5HqBSSWsTw3J7JD2WOC9JP5TrPXvCGPO2MeaGDO9fIRfykrb2t2HGmFJjzE8Sw3H7JD0raVwinPTlGEkPp7T9DbmgdFTKPe+meV3quYrU9lpr44nrU9Pdb63dKNd79j1JrcaYB4wxFb20MfW1IUm7E5+Z6khJAXX/u23t0QYAOY4gBqBX1tpnJP1MrtdHknbKDQWeYK0dl3gckZjYr0TvzDestR+StFjS/8owH2qbpNRenmk9rrfLBb6kySk/f0PScZLOsNaOlXRO4rzpx6/0rqSLUto+zlpbbK19L/XXTvO61HMtcoHOfajrJTxaUsb3sNb+0lp7duJ1Vm54N5ODf5fEatUJic9MtVOu9zF1/ti0lDak+x0A5BiCGID++A9JFxhj5iZ6f34qqc4YUy5JxpipxpgFiZ8vNsbMTISTfXK9TbE077lS0ncSE+8rJS3rcX2dpM8ZY3zGmIXqPvRYJhcG9yQm2i8fwO9yp6TvJyfAG2MmGWMuGcDrk22vNsacZ4wplAuGYUnPpbvZGHOcMeYTxpgiSQcSbU/3N0laZIw5OzHn7GZJLySGbw+y1sYS7fi+MaYs8fv8L0m/SNzyvqTKHvPWAO
QYghiAPllrd0j6uaT/mzj1bbnhx+cTQ4NPyfVQSdKsxPOQpD9L+nGG2mE3yg2lbZb0hKT7elz/e7ketT1yk9cfSbn2H3JzsnZKel5uaLS/bpP0qNzQaVvi9WcM4PWy1r4l6Qtyk+V3Jtq52FobyfCSIkm3Ju7dLqlcbhFDJr+UC5e7Jc2X+/3TWSZpv6S3Jf0p8bqGxLU/SHpN0nZjzM5+/WIAhp1JP4cWAOAFY8zPJDVba//R67YAGHr0iAEAAHiEIAYAAOARhiYBAAA8Qo8YAACARwhiAAAAHvH3fYv3jjzySDt9+nSvmwEAANCntWvX7rTWTur7zjwJYtOnT1dTU5PXzQAAAOiTMabfW7YxNAkAAOARghgAAIBHCGIAAAAeyYs5YgAA4PB1dnaqublZBw4c8LopI0pxcbEqKytVWFh42O9BEAMAYIRrbm5WWVmZpk+fLmOM180ZEay12rVrl5qbmzVjxozDfh+GJgEAGOEOHDigiRMnEsKyyBijiRMnDrqXkSAGAMAoQAjLvmz8TQliAABgSO3atUtz587V3LlzNXnyZE2dOvXg80gk0q/3uOaaa/TWW28ddhsqKyu1Z8+ejNfj8bhuvfXWw37/w8UcMQAAcIhIRGprk8rKpEBgcO81ceJErVu3TpL0ve99T8FgUN/85je73WOtlbVWBQXp+4juueeewTWiD8kgdsMNNwzp5/REjxgAADgoHpdWrZKuv1761rfccdUqdz7bNm7cqBNPPFFf/epXNW/ePG3btk3XXnutqqqqdMIJJ+imm246eO/ZZ5+tdevWKRqNaty4cbrhhht0yimn6Mwzz1Rra+sh771jxw5dcMEFmjdvnr72ta/JWnvw2uLFizV//nydcMIJuvvuuyVJN9xwg9ra2jR37lxdddVVGe/LNoIYAAA4qLFRWrlSGj9emjbNHVeudOeHwuuvv66amhq9/PLLmjp1qm699VY1NTXplVde0ZNPPqnXX3/9kNfs3btXH/vYx/TKK6/ozDPPVENDwyH3LF++XOeee65eeuklLVy4UC0tLQev3XvvvVq7dq3WrFmjf//3f9cHH3ygW2+9VWVlZVq3bp1+/vOfZ7wv2whiAABAkhuObGyUKiuloiJ3rqjIPV+92l3PtmOPPVannXbawee/+tWvNG/ePM2bN09vvPFG2iBWUlKiiy66SJI0f/58bdmy5ZB7nn32WX3hC1+QJF1yySUqKys7eK2uru5gb1pzc7M2bdqUtm39vW8wmCMGAAAkuTlhkUhXCEsqKpLCYXd94sTsfuaYMWMO/rxhwwbddtttevHFFzVu3Dh94QtfSFseIpAyac3n8ykajaZ973SrGp966ik9++yzev7551VSUqKzzz477Wf0977BokcMAABI6pqYHw53Px8OuzCW0qk0JPbt26eysjKNHTtW27Zt0+OPP37Y73XOOefo/vvvlyStWrVKbW1tktyw5oQJE1RSUqLXXntNa9askST5/a5vKhnqMt2XbQQxYAhFItKuXUPTnQ8A2RYISNXVUnNzVxgLh93zRYsGv3qyL/PmzdPxxx+vE088UV/+8pd11llnHfZ73XjjjXrqqac0b948/fGPf9TUqVMlSdXV1Wpvb9cpp5yim266SWecccbB19TU1Ojkk0/WVVdd1et92WRSVxHkqqqqKtvU1OR1M4B+i8fdPIvGRhfCkv9yq66WMqzMBoAh88Ybb2jOnDn9ujf576/Vq7t6whYt4t9fmaT72xpj1lprq/rzeuaIAUMgueooOeE1HHbPJWnxYm/bBgC9KShw/55asCB7dcSQGdkWyDIvVh0BQLYFAm5iPiFsaBHEgCzrz6ojAAAkghiQdV6vOgIA5A+CGJBlXq86AgDkDybrA0OgutodU1cdLVnSdR4AAIkgBgwJVh0BQJddu3bpvPPOkyRt375dPp9PkyZNkiS9+OKL3Srl96ahoUGLFi3S5MmTe71v48aNuuyyy7Ru3bqM97z99tt68c
UXdfnll/fztxgaDE0CQ4hVRwDyVigkrV/vjoM0ceJErVu3TuvWrdNXv/pV1dbWHnze3xAmuSC2ffv2QbdHckHsgQceyMp7DQZBDAAAdIlGpWXLpPJyaf58d1y2zJ0fAvfee69OP/10zZ07V1//+tcVj8cVjUZ15ZVX6qSTTtKJJ56o22+/Xb/+9a+1bt06ffazn9XcuXMV6VELaM2aNTr55JN15pln6s477zx4ftOmTfqbv/kbnXrqqZo/f75eeOEFSdINN9ygp59+WnPnztXtt9+e8b6hxtAkAADoUlsrNTRIHR1d5xoa3HHFiqx+1F//+lc9/PDDeu655+T3+3XttdfqgQce0LHHHqudO3fqL3/5iyRpz549GjdunFasWKE77rhDc+fOPeS9vvjFL+quu+7SWWedpdra2oPnp0yZoieffFLFxcV68803dfXVV+uFF17QrbfeqjvuuEOPPPKIJKm9vT3tfUONIAYAAJxQSKqv7x7CJKm93Z2/5RYpGMzaxz311FNas2aNqqrcbkAdHR06+uijtWDBAr311lv6+7//ey1atEgXXnhhr++zc+dOdXR0HNyb8sorr9TTTz8tSQqHw7ruuuv0yiuvyO/3a9OmTWnfo7/3ZRtBDAAAOC0tks+X/prP567Pnp21j7PWaunSpbr55psPufbqq6/q97//vW6//XY9+OCDuuuuu3p9L2NM2vP/9m//pqOPPlq/+MUv1NnZqWCGINnf+7KNOWIAAMCpqJBisfTXYjF3PYvOP/98rVy5Ujt37pTkVle+88472rFjh6y1+sxnPqMbb7xRL730kiSprKxMbWm2JznyyCNVXFysP//5z5Kk+++//+C1vXv3asqUKTLG6N5775W1Nu17ZbpvqBHEAACAEwxKNTVSaWn386Wl7nyWe4lOOukkLV++XOeff75OPvlkXXjhhXr//ff17rvv6pxzztHcuXP15S9/Wf/yL/8iSbrmmmv0pS99Ke1k/XvuuUdf+cpXdOaZZ3brzbruuut099136yMf+Yi2bt2qosT+c6eeeqpisZhOOeUU3X777RnvG2pmuBLfYFRVVdmmpiavmwEAQF564403NGfOnP7dHI26Cfv19W44MhZzIayuTvIzo6mndH9bY8xaa21Vf17PXxQAAHTx+93qyFtucXPCKiqy3hOGLgQxAABwqGAwqxPzkR5zxAAAADxCEAMAYBTIhznh+SYbf1OCGAAAI1xxcbF27dpFGMsia6127dql4uLiQb0Pc8QAABjhKisr1dzcrB07dnjdlBGluLhYlZWVg3oPghgAACNcYWGhZsyY4XUzkAZDkwAAAB4hiAEAAHiEIAYAAOARghgAAIBHCGIAAAAeIYgBAAB4hCAGAADgEYIYAACARwhiAAAAHiGIAQAAeIQgBgAA4BGCGAAAgEcIYgAAAB4hiAEAAHiEIIYRJRKRdu1yRwAAcp3f6wYA2RCPS42N7hGJSIGAVF3tHgX85wYAIEcRxDAiNDZKK1dKlZVSUZEUDrvnkrR4sbdtAwAgE/oKkPciERfEkiFMcsfKSmn1aoYpAQC5iyCGvNfW5sJWMoQlJXvG2tq8aRcAAH0hiCHvlZW5OWHhcPfz4bALY2Vl3rQLAIC+EMSQ95IT85ubu8JYOOyeL1rkrgMAkIuYrI8RobraHVev7uoJW7Kk6zwAALmIIIYRoaDArY5csMDNCUsOVwIAkMsIYhhRAgFp4kSvWwEAQP8wRwwAAMAjBDEAAACPEMQAAAA8kpUgZoxpMMa0GmP+mnJugjHmSWPMhsRxfOK8McbcbozZaIx51RgzLxttAAAAyDfZ6hH7maSFPc7dIOm/rLWzJP1X4rkkXSRpVuJxraT/zFIbAAAA8kpWgpi19llJu3ucvkTSvYmf75X0qZTzP7fO85LGGWOmZKMdAAAA+WQo54gdZa3dJkmJY3ni/FRJ76bc15w4BwAAMKp4MVnfpDlnD7nJmGuNMU3GmKYdO3YMQ7MAjA
qhkLR+vTsCgMeGMoi9nxxyTBxbE+ebJR2dcl+lpJaeL7bW3mWtrbLWVk2aNGkImwlgVIhGpWXLpPJyaf58d1y2zJ0HAI8MZRB7VNLViZ+vlvTblPNXJVZPfkTS3uQQJgAMmdpaqaFB6uhwvWEdHe55ba3XLQMwimWrfMWvJP1Z0nHGmGZjTI2kWyVdYIzZIOmCxHNJWi3pbUkbJf1U0tez0QYAyCgUkurrpfb27ufb2915hikBeCQre01aa6/IcOm8NPdaSX+Xjc8FgH5paZF8vvTXfD53ffbs4W0TAIjK+gBGg4oKKRZLfy0Wc9cBwAMEMQAjXzAo1dRIpaXdz5eWuvPBoDftAjDqZWVoEgByXl2dO9bXu+HIWExaurTrPAB4wLgpW7mtqqrKNjU1ed0MACNBKOTmhFVU0BMGYEgYY9Zaa6v6cy9Dk0CeiOwO6YMX1iuymxV+gxIMuon5hDAAOYAgBuS4eCSqzRe7QqTFZ7tCpJsvXqZ4hEKkAJDvCGJAjtt6aa0qHmtQINahkmhIgViHKh5r0NZLKUR6OCIRadcudwQArzFHDMhhkd0hqbxcgVjHodd8JVJrqwITGGLrj3hcamx0j0hECgSk6mr3KOA/SQFkEXPEgBFi/4YWxUz6QqQx49P+DYds04oMGhullSul8eOladPcceVKdx4AvEIQA3LYmFkV8tn0hUh9NqYxsyhE2h+RiAtclZVSUZE7V1Tknq9ezTAlAO8QxIAcFpgQ1HsLaxT2dS9EGvaV6r2FNQxL9lNbmwtbyRCWVFQkhcPuOgB4gSAG5LhjHqpTy8KlivhK1OEPKuIrUcvCpTrmIQqR9ldZmZsTFg53Px8OuzBWVuZNuwCAIAbkuIKAXzN+t0JqbdWBP62VWls143crVBBgY4z+Sk7Mb27uCmPhsHu+aJG7DgBe4N/kQJ4ITAgqcMbsAb8uEnFDb8leodGqutodV6/u6glbsqTrPAB4gSAGjFCUa+iuoEBavFhasIBgCiB3EMSAESpZriG5UjAcds8lF0hGq0BAmjix+7nI7pD2b2jRmFkVLIAAMKxG4X8XAyMf5Rr6h+2jAHiNIAaMQJRr6B+2jwLgNYIYMAJRrqFvkd0hTX2sXkWx9m7ni2LtmvpYvdteCgCGGEEMGIEo19A3to8CkAuYrA9kWygktbRIFRVS0LuJ35Rr6N2YWRUS20cB8BhBDOhDv+twRaNSba1UXy/5fFIsJtXUSHV1kn/4/69GuYbeBSYEtXlhjSoea+g2PBn2lapl4VLNYPUkgGFAEAMSegauAdfhqq2VGhqkjo6ucw0N7rhixbD8DumkK9eQye7d0saN0syZ0oQJQ9uuXHDMQ3Xaeqk09bF6xYxPPhtj+ygAw8pYa71uQ5+qqqpsU1OT183AEPOqAnymwBWPS7/5Tfc6XM3NbnjvkDpcoZBUXt49hCWVlEitrZ4OU/YlEpGuvFJ64gnXkefzSRdeKN133+joRaOOGIBsMsastdZW9edeesTgOa8rwKcrfPrAA9LOndKcOZIxUnt79zpcCxb0CCgtLS69pOPzueuzB7490XC58kr3dxg71v1eyTpkV14p/frXXrdu6B3u9lEAMFgEMXjOywrw6QqfBgLSvn3SCy9I69e7zq6yMmn8eJeliotdz1234b6KCteVlE4s5q7nqN27XU9YMoRJ7jh2rPTkk+76aBimBAAvUL4CnvK6Any6wqfr10tbtrj8tG+fu7Z/vxt1fPVVafv2NHW4gkE3Mb+0tPv50lJ3vrdhyVCoK/F5YONG97v2HIIMBNz6g40bPWkWAIwKBDF4yusK8D0Ln8Zi0oYN0pgxbkjSGDc8WlTkeoZ6VVcnLV3q5oQFg+64dKk7n040Ki1z2+tovtteR8uWufPDaOZMN3raM/RGIm6x58yZw9ocABhVCGLwTiiksm3rVWZCnlWA71n4NBzuWjQwcaI0ebIUiIRU2b5eRZ0hHXecO5c2IPr9bnVka6u0dq07rl
iRuXRF6irLUMgdGxrc+WE0YYKbmL9vX1cYi0Tc8wsuYFgSAIYSQQzDL6UnKHDmfP37L8p17kPL1NnheoKGuwJ8dbVbCblnj8tOsZh03HHSpPFRfW/3Mr22o1xPfjBfb31QrtrNy1QaiPYeEINBN5msr+HI+nq3CiBVe7s7P8zDlPfd5/4OBw64nr8DB9zz++4b1mYAwKjDZH0Mvx71tnySPrG1QYWrpftOWzHsFeB7Fj595hnp4Yel7+2r1bnbG1RkO5QcOZ3/SoP8fsnvH2RdsBxbZRkIuNWRo62OGAB4jTpiGF691NuyJSXa/UaryqYEPa1dFY9Lj/0mpPM/V65A7NB2hn0l+q9ftmrRkkHUm8rzumMAgMwGUkeMoUkMr156gozPp4nhFs8LiBYUSIvmtqiwJHOP1QsPtwxuRedgVlkOsUhE2rVr6FesAgAYmsRwy5d6WxUVUjRzO9/3VRxaS2ygkqspU/emTKyy9GKXgWRh3VWrXLmOMWPckO1wFdbNJ17tAgFg5CGIYXgle4IaGrpPVC8tdSEkV4bjgkHFr6lR508aVBzvamdHQakeOmKptu4KDn5FZ3KV5S23uJ7CigrFS4Oe7TLwu9+5bLhnT9e5DRska6VPfnJoPztfeL0LBICRh391YPgNtN6WR2L/WqcnKpeqw5Rof0FQBwpK9LvypfrBUVluZ8oqy+QuA+PHS9OmuePKle6LfyhFItJ//qebmjZmjHTEEe7Y2urOM0zpePXPB8DIRRDD8BtovS2PtHX49dC5K3TJR1r1iSPWavYRrfp6dIWOmOjXUUdlv9isl7sM7Nolbd7sgkVBgdTZ6Y7jx7vzu3YN3WfnC693gQAwMuXWNx9GjH7NoUn2BOWosjK3nVHLvqDGfni2jjZumO6DD6T3389+sdn+7DIwqDlpfbDWBa69e90QXEGB6xnLg4XVw8Lrfz4ARiaCGLLqcOfQ5OrkZ5MIXwUFrsMuGh26YJK63VLql/1w7DIwcaLLxZs2uWNyn8lt26RjjyVgSN7+8wEwchHEkFXJOTTJ4Ztw2D2X3Aq8nnJ58nNbm9vOaNw4N2k9FnOLG08+WSouzn4PSPJ37/n3a252BW6HOqAeeaS0Y4cbloxGXQgdO1aaNGloPzdfeP3PB8DIRBBD1vQ1h2bBgkO/rAYa3IZTWZlr0+TJrtJ8sucjGnUrC4eiByS5m8Dq1V2fNxy7DLS1uYodEyZIb73lwlhhodvqaShCZ77y6p8PgJGLIIasGegcmsMJbsOpZw9IaenQ94D03G5puIZqvQid+cirfz4ARi5WTSJrUufQpMo0h6ZtW0jjd6xXmem+wXVqcPNa6obg77zjjsPRAxIIuNA6XF/yydDZ3OzCV2mpOw7n5uv5ZLj/+QAYuegRQ9b0ew5NNCrV1mrC3fVaHvPJ92hML51ao8cX1ile4M+pyc/dekC2hVTW1qLA9AqpoPfCs7m6+KA3DLsBwPAjiCGr+vVlXlsrNTTIHOhQceLUqS83SJIeOW9F7k1+jkYV+EatJqZuRVRT4wrQ9qh9lsuLD/rCsBsADD9j86BIUFVVlW1qavK6GRiAjD1CoZBUXi51dBzymrCvRN/+YqvOuySYW8Fl2bLMWzKtWNHt1lWrMvcIer34YFBCoYPbMOXMNlQAkKOMMWuttVX9uTdXvuowwmScQ9PS4nqV0igs9un/+4cWLV6cQyEsFHKbcqeGMMk9r6931xNGZOX1aNQF0fJyaf58d1y2zJ0HAAxarnzdYbSoqHBDe2kUxGNu/lUu6SU4yudz1xP6s2o07ySGkdXR4UJnR4d7XlvrdcsAYEQgiGFAIhG3Dc5h9+4Eg25+VWlp9/Olpe58rg179RIcFYu56wkDXTWa8wbQGwgAODwEMfRLPO7mP11/vfStb7njqlXu/IDV1bn5VSUlLniVlLjndXVZb/egDSA4ppaASIax5ByxvCwBMYDeQADA4WHVJPolqxXw/X43yf2WW/JjAngyIK
aumswQHEdUCYgB9AYCAA4PqybRp0jE9YCNH3/oZsd79ki33ZaHvT2Ho58rB5PDt9IIKPo5gBWjAABnIKsm6RFDnwa6ddGIFQxKs2dnvJyXNcT6CpcD6A0EAAxcrn49IIeUmZAq29fLtnWfnD2oSeihkLR+/Yia8J0cvh0/Xpo2zR1XrnTnB2PQCyTS6W9ZiuQwcmurtHatO65YcUghWwDA4eHfpsgssRVRoL5e/9f6ZDtjappbo6cW1amj0394FfAT76k+qtQf7hZBXm0tNBQbmA9pD1tqWYqkBre7Qdohxz56AwEAh4cghsxSvqwLE6fmv9Kg9nbpwY+tOLxJ6H0EgMMNH14PCw7F8G1WF0ikSpal6Lm7QbIsxS235PbiCQAYQRiaRHoZakgVRdt13uZ63fb90MAr4PdRlyqyO6Rf/Up64IGBD+8N1bBgf2W7htiQVumnLAUA5AyCGNLr5cva+H0K7DyML+te3rNTPt301RbdfLP0+uvSli2Stf0LH7mwtVC2a4gNaZV+ylIAQM4giCG9Xr6s7eF+WffynvFITJEjK1RS4qojvPqq9Oab7lpf4SNXthaqrnZz5vbskd55xx0Pt4bYkFbpz7fdDQBgBGOOGNJLfln3qCEV9pXquQ8tVejp4MDnX2V6T3+p1p6yVIEJQbW1Sdu2ud6wP/zBXZ8xIxE+TEhaf2iphdTQ0rPO2XBuLVRQ4OZuLVgw+AUDyR62nnPEDmuBRDqUpQCAnEAQQ2aJL+XYXfWKxH3yKaZ1py7VU+fW6Z2USeMDWqnYIwDYaEz/PWOpnltUp00b3KLKWMztehQOS6+8Iu3cHtXdZbUKVKZfaZkaWqZM6fqobduyFFpS9Od3DQSyU1dtSKv05+LuBv0smAsAIwmV9dGrSET61tdCmuZvUXhihSIB9wUZDksffCBdeKH0xBOHsVIx8aUbObJC1/+foMaOdT1gRUXS3r2ublZnp+sNuumDZVq8s0Gml+ru0ah0443Sgw+6n/1+6dOflpYvz07JKy9XZXpVkmPY9LOkCQDkCyrrI2va2qQ2G1TblO41pIqKpE2b3ArHY489jPIKibpUAbkwc999LnAEg9K4ce6W2bOlkz8U0sL/qJeJ9V5q4fe/dxOL7C5uAAAgAElEQVT8L76465YtW6Tf/36QpR4ShqyURD9kq4ctZw20phkAjCBM1kevMk0ab293o0jTpw9+pWJ1tXTFFa4jZPdu6cAB6cQT3ee+3NiiznjvpRZSV02WlnY9srVqMhdWZY5YfZQ0GUk7LwBAOgQx9CpTWYYtW9x8rNSFd76OkMbvWK/Y3tCAVioWFEh/+7fSzTdLH/6w9IlPSMZIf/2r1KIKFRb0XmphqFdN5sqqzBGJmmYARjmGJtGndJPGr7hCeuwx97zYH9UZv67VxzfWK2Z88tmY3ttTo/GP1Kkg0P//iX3qU1JhobRqlfTyy27C/syTgnqprEbz1jUo0JlmjlgwqLLA0K6azJVVmSMSNc0AjHIEMfQpU1kGv9/Nk7riz7X62NsNKrYdUmLtR+UTDdr8KWncfSv6Pck8+TlVVdI3vyl96EOuU+SJWXUykua9XK+ofCr2x2RSSi0MdamHIS8lMZplKGmSGrQBYCRj1SQOWzwurfpVSAuuLHchrIcOU6K/v7xVJSXSJae16OOfq1DBWPfFGtkd0p7XWxSfXKEJ04LdwkwkIl1/vdumKLUHyra5lZbf/VGFAhO6f0EnVzWm9totWpS9VY1D/f6jGqsmAYwwA1k1SRDDoLT+ab3Kzp2vkuihk6r3FwT15nGX6OQNDykqnwIFMRVc+Xk1b2hX+Z8eUtT65VNMjx9dI1NXp4s/5T8YalatytwD1dsqxaEu9TDiS0l4iTpiAEYIghiGXLKH6ImHQvrhvel7xKLyK+4rVCCl9ETyf20m5b4OU6rVU5YqcOeKgyGrPz1QoZC0dav7zp4yhWAEAMgNAwliDKrgsCTragUnB/XHmTXqMN33Le
wwJTKy3UKY5AKYUXcltl3V2+r1xEOhg6UgkvPFbrtN+uEP3XHxYnc+GpX+8R+lOXOkj35UOuUU6cwzpUcecQEOAIB8QRDDgPWsq/XCZ+v0zMylOmBKFDJBdZgSvVj5aUULi/v9njHjU+GOlkNKQSSLmab2dt14o/TTn7r6n0cc4XrENm2S/umfXLvgrUjE7YxAfTUA6BszYdGnnvOietbVsj6/nv/cCq3puEV7Xm/RvIsr9PTT0kcffLDfn+GzMXVOqjhYCiLTXKxQSPrNb9zPpaWuh6ygwIWx996Tfvtbt7qzv8OUzPnKHi+3gQKAfEUQQ0aZvlgvuKB7Xa1YLFnsNahQxWx9/GKpZJL0p9dq9NE3GlQUa+/1czpMiVZPqdGFlwbl97uJ+pm+zLdtc+eN6f7lnlxct2ePC1Z9bQlEaMg+L7eBAoB8RRAbxXrrDXrnHenHP5bWr5eOO+7QL9bqaunXv3bbEW3Z4t5r505p8mQ3f6uoSFq4vE6dj0QUuP+uQ+aFSW7ifkw+PXl0jQrr6lRd3feXeXJSfijkwlQyNEWj7jhuXP8KrBIasquvbaAG0ksJAKMJ/+0/CsXjrtfp+uulb33LHVevDCm+9mV1/PllzT8upOnTpR/8QHr4YTcfq7Oz+xfrBRdIM2ZIr77qvoTb212vVCTiwtn48dKDv/XrD3O/obA/cymC0IWXauGGFfrkpX5Fo33v6RgMSpdd5q61t7vfJRp1wWzqVOmSS/r+wmfvyOxjGygAODwEsVEo2Rs0frw0vTKqq9b8nRZ8dpxM1TwVf3SeXlg/Tivs36nQRGWMm3j9k5+41ya/WD/4QNqxw21LdMEF0tix0rRprkdqwwYXyo6ZGNKLf9yvgnhn2nYYSUf88VEFIq4GWX+/zJcvl778ZTdHbO9eF8KOPVa66aau7Zh6Q2jIvkybw7MNFAD0jqHJUaZnb9CC1bWqevku+dS1359fMV2ruxS3BbpeKw6Gsd273VBgYWHXe5WWup4pa1OKoEej+sgva3XeZrclUYHtlNWhZSskyUTC0vz50iuvqKysuF97Ovr90j//s3TDDYdXR4y9I7OPbaAA4PDQIzbKpPYGBSIhzXvpbvls9JD7ChVVjeo1RiEla/4++qgb0ty6Vfrzn10gS4YXn88NEUaj0vWba/WJzW7vyaANya8+inutXy+dfvrBL/Pm5q6eleSX+YUXdrU9KRiUTjhBOuaYgX3R9/Y5ixYRGg5XdbULXXv2uDmGe/a45/3ppQSA0YoesVGmqMjNq2pvlyZ2tMgWFCilM6ybuIwq1KINmi1JGjNGOvlkF3wefliaPt1N1K+slGbNcvPFSuMhLWmrP6TSvpEy9opJkv7yF2n7dlVXT5bUVVE/EHCf89hjLgRma3VjMhykVu4nNAxOps3hAQCZEcRGidRyDc3N0nPPSafNqdC1scy9VQWyalGFJFc49dJLXc+X5MLXjh3u3BNPSMXFbo5Y+Z4WReVL+35R+eRXLHMYe+UVFSyY3O3L/JlnXOjL9upGQsPQSRbhBQD0jSA2SqSWazjrLDep/unngvpF8Zd0VfudKlT34clO+VWvGu1XUD6fmxzvS8lXRUVumPCcc1wvUjLMRHZXKFAZS9vLZo1fshm63yS3V1FCIODe74knhrYkwkgLDRSoBYD8QhAbBdKVa5gzR9q4UVoxoU5ziuI64+WfqCARkmLy67kTr9UDpXU65n037+upp6QPf1iaPdsVU02d2J4MM9Go9IMfBfWh4hot2d+gMeoq5Br2lapl4VLNeOcZNwzZ00knuSJkKfqzunEkhajBoEAtAOQngtgokC7QhMMuUKnQr0fO+5H+6/wfaMy2Ddq+TXpq6yyVTQhqxzapqspNyn/11a78NH16+tVwN94o/eIXUvlxdQpsli79oF5RFSigqN77xBc0/aE6KR6VTj+9exg76STpxRcPaTerG/uPArUAkJ8IYqNAukCTGmy2bpU2bQoqFjtVknTkMd
JRR0lnnOHKUyRXTb75pvTyy+79ek5s37dP+tnP3Gds7fDr24E6xUs79Zn2n6lThar8w33aemlAxzxUp4JXX5W2b5deecUNR/boCUuiJEL/UNUeAPIXQWwUSBdoolE3AX/3bumvf3XFXSVXcqCjQ3r/fWnmTHfOGLfN0cyZ0ttvu56vKVO6f8Yvf+leO368Gwr77vZaXRq+T8UKSwpLManisQZtvVSa8bsVLnxlCGCpcmF1Y67Pu2IIFwDyF0FslEgXaJYtc9sXtbZK+/e7yfgnneTCWmOjK3FRWpq6qberoN/zSz0Skdas6SqNMcaG9IVwvUrUvYRFUaxdUx+rV2T3LQpMyLztUSovVzfmy7wrhnABIH95FsSMMQsl3SbJJ+lua+2tXrVlNEgXaNraXMg47bTuhVklqaLC1QiLx7s29T5wQPrsZ1Mq6Ce0tbnhyzlz3NSvYwozl7CIGZ8ObGhR4IzZA2q/F6sb82XeFUO4AJC/PPnvemOMT9KPJF0k6XhJVxhjjveiLSNeKOQq14fcfo7JQJMsDxEIuGHK5Jd3svdr5kz3SG7qHQi4Yq6bN7uAkir5Ph/9qOtRa45XyJ+hSqzPxjRmVsVQ/9aDlm8bg1PVHgDyk1c9YqdL2mitfVuSjDEPSLpE0usetWfkiUal2lqpvt51c8ViUk2NVFd3sEsrEJAuusid2ru366Vjx0pf/KLbxmjJopDGhloUnlihWElQ4XCPCeChkAItLfrkJyr0q1VBnX22ZP4mqCfur9EF7zao1KYpYdFzWDIUklpaXDdcsH9DlkMt3+ZdUaAWAPKUtXbYH5IukxuOTD6/UtIdme6fP3++HQ7Lly+3cjvx8ODBgwcPHjxG4GP58uVDnickNdl+ZiKvphyn2+XGdrvBmGuNMU3GmKYdO3YMU7MAAACGj1dBrFnS0SnPKyW1pN5grb3LWltlra2aNGnSsDYOAABgOBhrbd93ZftDjfFLWi/pPEnvSVoj6XPW2tfS3V9VVWWbmpqGsYV5LhSSystdQbBe2JIS/cMVrSqaGFRHh/Tss64O2JS29bp73XyNiYcOeU1nSVC+B3+jgiWXHVwA0E0wKK1d6/ZC6s369dL8+YN7j2GS63XEAAC5xRiz1lpb1Z97PekRs9ZGJV0n6XFJb0hamSmEYYCSE9+vusoVAeuFjUR1nH1T61au1wv/FVJLi7Rtm7Spo0KFJv2qx0LFVHDqKW7yfzqxmJt035eKisG/xzBJXWkKAEA2eVaW0lq72lo721p7rLX2+161Y8SIRl2F1vJy19N0773SsccqVlisTH2eJtapr9xzhn773nw9+2a5fnhgmVpboiocH9RL82oU8XcPcra0VB2fr1FkwmS3ArNn0Cstdef7s/IxGBz8ewAAkOeorD9S1NZKDQ3dhiPjGzaq6ehLVOyL6uQNDx6yQsJI8imukqgbHvzcgQZNmib97z0r9JN5dbqqQzpnfb0KAj7FIzE9O2Op7u+sk/966eKFdaq2kmlIKY+xdKmrhdFfyXvrB/EeAADkMU/miA0Uc8T60MucMCupU4XyKSpfxr6xLp3+El23pFXf/F5Qx0wMKbB1g555Rrrv+VmaNCN4SNX2xedmoQZYDtYRAwDgcOX8HDFkWUtL195EPRhJAXWqoB8hTJKi8mlK5zv6UN0yBSrLZc85R2d+8yxd9cZ3FCiISupRYT4QdJPqBxOggll4DwAA8hBBbCSoqHBzxHphpP5FsVhMizbdLt+9bpjThEIKxDp02l8aNOentXrrLcna7hXmAQDA4SGIjQTBoHTZZX0GrU4VqkNFalNQnfKps8cUwXYV65mKyzX3lZ9L7e3drpXYdn1qV73efjWk9eu7NgkvK8vy7wIAwChCEBspVqzIODyZFC/w64KZW3TeEWt1/LjtWu+b023fhyJFdG7LL2Vi6XvXYvLpQ8UtevNNt7H0okWUdAAAYDAIYiPFuHEyX/uaooGStJfbTal+5qvRe3uCisWk7+
z5tmbENshIBx8+xVVkw/KrM+17+GxMWzsr1NHhQlh19ZD9NgAAjAqUrxgBDlZ+/0Gdoh1StOFuFdiYCtWpThUqJr/u839R0c64Xt85ST5FVaho2g0/pa75ZKnXI/4SvTy3RqefFVR7u/SZz0gFxHgAAAaFIJbH4nGpsdE9IhEpEPDrwkUr9IfoLZpwoEWvvzdWOzbu05ZIhb699zu6Rg0q0YF+v3/qnDNfLCIbj+uDHVFddrmfIUkAALKAOmJ5bNUqaeVKV0qiqEg6cED6n/+R3npL2rPHLaT0+6Xy0pA27J00oBAmHdorFvaXqmXBUh3z6Ap6wwAAyIA6YqNAJOJ6wpIhzFoXwtaskXbscPcY43rNxuxtkU8Z9nXsRc+hy6Jou2b8oV4F7Wk26gYAAANGEMtTbW0ujBUVueevvSa9+qrU2el6wZLnfT6po3CsCjNMwB8wn88VkAUAAIPGHLE8VVbmAtcHH7hc9Mc/uqFJa10vWDzufo5EpGLtU6cKFcgQxmxinNHE431/cCzmCsgCAIBBI4jloXhcevxxqfXtkFrXtWhrZ4UiNqgjfCGNi7SoxVZof3vXdkEtqlBMPilNELOSDny+RiVlhdI993Rtvj1zprRxY/f9K0tL3abcbEUEAEBWMDSZh1Y/GlXRN5dp5TPlemL3fG3cN0n/3XaytnRM0lrNV6vKdbuWySdXmHW/grpbX9J+lXZ7HyupZdwJ8t35Y+lHP5JaW6W1a93xpZekmhqppMQFr5ISF8Lq6jz4jZFPIhFp1y53BAD0jlWTeSYSkf5n3jKd9VaDAtGubYh6rnDcr1I1aKmu1woZI/kVVZOdp5P0l273RQMl8l9b4yrzpxMKubHPigp6wtCrQ8upuKK/1dXUnAMwugxk1SRBLM/s2hpS2bHlCsQ6+ry3XSUqV6v2K6gxCqlV5SpVmteVlLheMIIWBqFnOZVwWGpulpYskRYv9rp1ADB8KF8xQsXj0ku/a1Gn7X1PyaSofKqQW+FYoZbEPLE0WAmJQepZTkVyx8pKafVqhikBIBOCWB5pbJR+/d8V/a4J5ldMLXIrHFtUIX+m17ESEoPUs5xKUrJnrK3Nm3YBQK4jiOWJZI/D2IqgHhhTo3YV93p/u4rVYGrUboJurv2RQT1aXqOOgu4T9lVa6iblMyyJQSgrc3PCwuHu58NhF8bKyrxpFwDkOoJYnti711WTePhh6ZvtN+u3ukSZZvdZSb/0XaVV59ZpwgT3JVhYKN05u06rJy/VgYIS7S8IyrISElmSnJjf3NwVxpJzxBYtEnuTAkAGBLE88ac/Sc1bovo/25fp3ViFLlajovKps8e8rw4Vq15L9fgJ39A5px/QccdJ+/a5MGZ9fv3w6BWqOrpVP/3KWpnWVrda0k85OQxedbWbmL9nj/TOO+64ZIk7DwBIj1WTeSASka6/Xqp+bJk+sbVBY9RVtiIiv4xcAPMrqrcLZmmm3SDr88tnY/rTh2v00zl1evV1vzo7Xc/YpZdKy5eTvzA0IhE3Jyw5XAkAo81AVk3yVZwH2toksz+khc13q1AHul0LKKp2Fencgmf1dd9P9PnYfSq2B5So5apzNzfoE+dKoXtWaNs2acoUpoNhaAUC0sSJXrcCAPIDQ5N5oKwkqquf/5r8sQNpr5corP8d/4E+H71XxfH2btdMe7tUX6+gQpo1ixAGAEAuIYjlAf+3anXq5ge7VcRPZSR9Uo9kLmtBnTAAAHISQ5O5bvt2xX7yUxXFwr3eVqKwMk73o04YAAA5iR6xXBWNSsuWyR4zXf4+QliS8ful4h71xagTBgBAziKI5araWqmhQSYSzjgkeYh4XLr6ard3ZDDojtQJAwAgZ1G+IheFQlJ5udTR98be3fj90gcfuJ9bWtxwJD1hAAAMKzb9znctLbK+9Bt028QjreJiF8CCQWn2bEIYAAA5jiCWY+JxafW6CnW2R9Ne73WYkkn5AADkFYJYjmlslH61Kqj/qfh0xp6vqC
lUrJBJ+QAA5DuCWA6JRKRVq6Q1a6Ql2+9QTOmHJ43fp4JrmJQPAEC+I4jlkLY26fnnpc2bpUggqNd1/CG9YhF/qf4w40v63cV3Sq2t0tq17sjm3QAA5B2CWI6IRKTdu6WtWyVjpO+31+pYbew2J8xKeq/0WD3zqTqtXu3CGpPyAQDIX3SheCwed/PCGhulHTskX0dIH+7coBrdrZIeG3wbSUfve0Ol8ZC2hceprY3NlQEAyGf0iHmssVFauVKaeERUS9ct03ud5fqTzlax0m/w7VNUF666TkVFUlnZMDcWAABkFUHMQ5GIC2JTp0pnP1Src99uUIk6FFR7rxt8n7zhQS0+N6RAYDhbCwAAso0g5qG2NhfGtm8M6dxN9SpVe79eVxDwa+HJLUPcOicUkjZscEcAAJBdzBHzUFmZW+i446/vqMCmL+Cajl8xmcqhLdwajUo33ig9+KD72e+XPv1paflyFmcCAJAtfKV6KBCQPvYxyTxwuwrVmfYeq+7V9MP+UvmvWSrfEK+UvPFG6f773ZaXxcXSgQPuuSTdfPOQfjQAAKMGQ5Meisel0nhIf9v287Rzwqyk13W8OlSiA4VBhX0lalmwVL7bh7ZwayjkesKSIUxyx/Jy6aGHGKYEACBb6BHzUGOj9NxvWrTQ75PSjEx2qlBXl/7/KvzQNF14YovO+NsKLbwsOOTxeds2NxxZ3GMXpeJiV+ts2zZp1qyhbQMAAKMBQcwjoZArW9ERrpCJxdLeEzN+Xfy1afrat4IaP372sK2SnDLFzQM7cKB7GDtwQCosdNcBAMDgMTQ5zOJx6dFHpUsuccN/q54O6t7CGnWY0m73xYpL5bu2Rt/716COOkrDWqoiGHQT81tbXfiS3LG1Vbr0Ugr5AwCQLfSIDbPGRrc39/bt7nkgIH0zWicVSld11itufCqwMfmuXqrAHd5t4r18uTs+9JAbjiwslD7/+a7zAABg8Iy1PbeVzj1VVVW2qanJ62YMWiQiXXedtGaNC2Dr17tzPp/rKZtUEtIJ41tkplboZ78J5sT2RaGQmxM2ZQo9YQAA9IcxZq21tqo/99IjNoza2qT9+6WSWEjTYy3aGajQXl9Q4bAUi0l7Y0GZ42Zr6tTc2b4oGGRiPgAAQ4UgNozKSqKqWVerj77RNQT5i+Ia3TSuTuGYX0cdJY0dK1VXD++cMAAA4A2C2DAKfLtW52xqkN92uCJhkj5/oEHxXdI/jV+hqVOlyy93QQwAAIx8BLHhEgpJ9fXyhzu6nS6x7bqqs17Bm27RpVcFD6ndBQAARi7KVwyXlhY3Kz8N6/NpcryFEAYAwChDEBsuFRWy0fSFW32K6dGmCkUiw9wmAADgKYLYcAkGdeALNQr7uhdujRSW6uVTa7QvHlRbm0dtAwAAniCIDSPfbXV67sNLFfGXKBwIqtNfopfnLtWj59apqCh3SlYAAIDhwWT9YRQo9St0ywp9/f5bNOeIFoUnVqjNBtXcLC1ZQskKAABGG4LYMHOlKYJavXq2wtuloiIXwihZAQDA6EMQG2YFBdLixdKCBa7SflkZPWEAAIxWBDGPBALKib0kAQCAd5isDwAA4BGCGAAAgEcIYgAAAB4hiAEAAHiEIAYAAOARghgAAIBHCGIAAAAeIYgBAAB4hCAGAADgEYIYAACARwhiAOClUEhav94dRzv+FhiFCGIA4IVoVFq2TCovl+bPd8dly9z50Ya/BUYxNv0GAC/U1koNDVJHR9e5hgZ3XLHCmzZ5hb8FRjFjrfW6DX2qqqqyTU1NXjcDALIjFHK9PqnBI6mkRGptlYLB4W+XF/hbYAQyxqy11lb1516GJgFguLW0SD5f+ms+n7s+WvC3wChHEAOA4VZRIcVi6a/FYu76aMHfAqMcQQwAhlswKNXUSKWl3c+Xlrrzo2kojr8FRjkm6wOAF+rq3LG+3g3BxWLS0qVd50cT/hYYxZisDwBeCoXcPKiKCnp/+FtghBjIZH16xADAS8GgNH
u2163IDfwtMAoxRwwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8Miggpgx5jPGmNeMMXFjTFWPa98xxmw0xrxljFmQcn5h4txGY8wNg/l8AACAfDbYHrG/SrpU0rOpJ40xx0u6XNIJkhZK+rExxmeM8Un6kaSLJB0v6YrEvQAAAKPOoLY4sta+IUnGmJ6XLpH0gLU2LGmzMWajpNMT1zZaa99OvO6BxL2vD6YdAAAA+Wio5ohNlfRuyvPmxLlM5wEAAEadPnvEjDFPSZqc5tJ3rbW/zfSyNOes0gc/m+Fzr5V0rSRNmzatr2YCAADknT6DmLX2/MN432ZJR6c8r5TUkvg50/men3uXpLskqaqqKm1YAwAAyGdDNTT5qKTLjTFFxpgZkmZJelHSGkmzjDEzjDEBuQn9jw5RGwAAAHLaoCbrG2P+VtIKSZMkNRpj1llrF1hrXzPGrJSbhB+V9HfW2ljiNddJelyST1KDtfa1Qf0GAAAAecpYm/ujflVVVbapqcnrZgAAAPTJGLPWWlvV951U1gcAAPAMQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAD6EIlIu3a5I5BNfq8bAABArorHpcZG94hEpEBAqq52jwK6MpAFBDEAADJobJRWrpQqK6WiIikcds8lafFib9uGkYE8DwBAGpGIC2LJECa5Y2WltHo1w5TIDoIYAABptLW5sJUMYUnJnrG2Nm/ahZGFIAYAQBplZW5OWDjc/Xw47MJYWZk37cLIQhADACCN5MT85uauMBYOu+eLFrnrwGAxWR8AgAyqq91x9equnrAlS7rOA4NFEAMAIIOCArc6csECNycsOVwJZAtBDACAPgQC0sSJXrcCIxFzxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCODCmLGmB8aY940xrxqjHnYGDMu5dp3jDEbjTFvGWMWpJxfmDi30Rhzw2A+HwAAIJ8NtkfsSUknWmtPlrRe0nckyRhzvKTLJZ0gaaGkHxtjfMYYn6QfSbpI0vGSrkjcCwAAMOoMKohZa5+w1kYTT5+XVJn4+RJJD1hrw9bazZI2Sjo98dhorX3bWhuR9EDiXgAAgFEnm3PElkr6feLnqZLeTbnWnDiX6TwAAMCo4+/rBmPMU5Imp7n0XWvtbxP3fFdSVNL9yZelud8qffCzGT73WknXStK0adP6aiYAAEDe6TOIWWvP7+26MeZqSRdLOs9amwxVzZKOTrmtUlJL4udM53t+7l2S7pKkqqqqtGENAAAgnw121eRCSd+W9ElrbXvKpUclXW6MKTLGzJA0S9KLktZImmWMmWGMCchN6H90MG0AAADIV332iPXhDklFkp40xkjS89bar1prXzPGrJT0utyQ5d9Za2OSZIy5TtLjknySGqy1rw
2yDQAAAHnJdI0m5q6qqirb1NTkdTMAAAD6ZIxZa62t6s+9VNYHAADwCEEMAADAIwQxAAAAj/q2yUwAAAf8SURBVBDEAAA4TJGItGuXOwKHY7CrJgEAGHXicamx0T0iESkQkKqr3aOALg4MAEEMAIABamyUVq6UKiuloiIpHHbPJWnxYm/bhvxCbgcAYAAiERfEkiFMcsfKSmn1aoYpMTAEMQAABqCtzYWtZAhLSvaMtbV50y7kJ4IYAAADUFbm5oSFw93Ph8MujJWVedMu5CeCGAAAA5CcmN/c3BXGwmH3fNEidx3oLybrAwAwQNXV7rh6dVdP2JIlXeeB/iKIAQAwQAUFbnXkggVuTlhyuBIYKIIYAACHKRCQJk70uhXIZ8wRAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADAIwQxAAAAjxDEAAAAPEIQAwAA8AhBDAAAwCMEMQAAAI8QxAAAADxCEAMAAPAIQQwAAMAjBDEAAACPEMQAAAA8QhADAADwCEEMAADkvUhE2rXLHfOJ3+sGAAAAHK54XGpsdI9IRAoEpOpq9yjIg+4mghgAAMhbjY3SypVSZaVUVCSFw+65JC1e7G3b+iMPsiIAAMChIhEXxJIhTHLHykpp9er8GKYkiOH/tXO/oXrWdRzH3x+3lmmruVlRm6XSqCQoRWz9IUTD2SatB2ZG0TCjJ0EWRVhPpAc+CCIrCiH8k0VYsaREpBgm1JNG2qCsFY5Vemq5xeYaBcekbw+u3+2Oc81zs3Pu3+6z9wvGfa7f9Xvw2/f6cu9zrut3TZKkqXT48BC2RiFsZHRn7PDhPusah0FMkiRNpZUrhz1hs7PPHp+dHcLYypV91jUOg5gkSZpKo435MzNHwtjs7HC8adNw/mTnZn1JkjS1Nm8ePu+//8idsGuuOTJ+sjOISZKkqXXaacPbkRs3DnvCRo8rp4VBTJIkTb0VK2DNmt6rGJ97xCRJkjoxiEmSJHViEJMkSerEICZJktSJQUySJKkTg5gkSVInBjFJkqRODGKSJEmdGMQkSZI6MYhJkiR1YhCTJEnqxCAmSZLUiUFMkiSpE4OYJElSJ6mq3mt4Xkn2A3/pvY6T2NnAP3ov4hRivSfLek+eNZ8s6z1Zk6j3a6rqZfOZOBVBTMeX5KGqurj3Ok4V1nuyrPfkWfPJst6TdbLV20eTkiRJnRjEJEmSOjGILQ3f7L2AU4z1nizrPXnWfLKs92SdVPV2j5gkSVIn3hGTJEnqxCA2xZJcmeSPSXYnubH3epaCJOckeTDJriS/S3JDG1+dZHuSR9vnWW08Sb7WrsFvklzU928wnZIsS7IzyX3t+LwkO1q9v59kRRt/YTve3c6f23Pd0yrJqiTbkvyh9fpb7fHFk+RT7fvkkSR3JzndHl9YSe5Isi/JI3PGxu7pJFvb/EeTbJ3E2g1iUyrJMuAbwLuBC4APJLmg76qWhKeBT1fVG4ANwMdbXW8EHqiq9cAD7RiG+q9vfz4G3Dr5JS8JNwC75hx/Ebil1fsgcH0bvx44WFWvBW5p8zS+rwI/qarXA29iqL09vgiSrAU+AVxcVW8ElgHXYo8vtG8BVx41NlZPJ1kN3AS8BbgEuGkU3haTQWx6XQLsrqo9VfUU8D1gS+c1Tb2q2ltVv24/H2b4B2otQ23vatPuAt7bft4CfLsGvwRWJXnlhJc91ZKsAzYDt7XjAJcB29qUo+s9ug7bgMvbfM1TkpcA7wRuB6iqp6rqSezxxbQceFGS5cAZwF7s8QVVVT8HDhw1PG5PbwS2V9WBqjoIbOe54W7BGcSm11rg8TnHM21MC6Q9ErgQ2AG8oqr2whDWgJe3aV6HE/cV4LPAf9
vxGuDJqnq6Hc+t6TP1bucPtfmav/OB/cCd7XHwbUnOxB5fFFX1V+BLwGMMAewQ8DD2+CSM29Ndet0gNr2O9RuSr8AukCQvBn4IfLKq/nm8qccY8zrMU5KrgH1V9fDc4WNMrXmc0/wsBy4Cbq2qC4F/ceSRzbFY8xPQHm1tAc4DXgWcyfBo7Gj2+OT8vxp3qb1BbHrNAOfMOV4H/K3TWpaUJC9gCGHfrap72vATo8cx7XNfG/c6nJi3A+9J8meGx+uXMdwhW9Ue48Cza/pMvdv5l/LcxxE6vhlgpqp2tONtDMHMHl8c7wL+VFX7q+o/wD3A27DHJ2Hcnu7S6wax6fUrYH1782YFw+bPezuvaeq1vRi3A7uq6stzTt0LjN6g2Qr8eM74h9tbOBuAQ6Nb4Xp+VfW5qlpXVecy9PDPquqDwIPA1W3a0fUeXYer23zvFoyhqv4OPJ7kdW3ocuD32OOL5TFgQ5Iz2vfLqN72+OIbt6d/ClyR5Kx2J/OKNrao/A9dp1iSTQx3D5YBd1TVzZ2XNPWSvAP4BfBbjuxZ+jzDPrEfAK9m+GJ9X1UdaF+sX2fY0Plv4LqqemjiC18CklwKfKaqrkpyPsMdstXATuBDVTWb5HTgOwx79w4A11bVnl5rnlZJ3szwcsQKYA9wHcMv5vb4IkjyBeD9DG9l7wQ+yrD3yB5fIEnuBi4FzgaeYHj78UeM2dNJPsLwnQ9wc1XduehrN4hJkiT14aNJSZKkTgxikiRJnRjEJEmSOjGISZIkdWIQkyRJ6sQgJkmS1IlBTJIkqRODmCRJUif/A+nZ10AgETqIAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 720x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Plot for residual error\n",
"\n",
"# adjust the figure size\n",
"plt.figure(figsize=(10,8))\n",
"\n",
"# plotting residual errors in training data\n",
"plt.scatter(lr.predict(X_train), lr.predict(X_train) - y_train, c = 'b', s = 40, label = 'Train data', alpha = 0.5)\n",
"\n",
"# plotting residual errors in test data\n",
"plt.scatter(lr.predict(X_test), lr.predict(X_test) - y_test, c = 'r', s = 40, label = 'Test data')\n",
"\n",
"# plotting line for zero residual error\n",
"plt.hlines(y = 0, xmin = -100, xmax = 1000, linewidth = 3)\n",
"\n",
"# plotting legend\n",
"plt.legend(loc = 'upper right')\n",
"\n",
"# plot title\n",
"plt.title(\"Residual errors plot\")\n",
"\n",
"# function to show plot\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation of residual plots\n",
"\n",
"\n",
"A regression model that has nicely fit the data will have its residuals display randomness (i.e., lack of any pattern). This comes from the **homoscedasticity** assumption of regression modeling. Typically scatter plots between residuals and predictors are used to confirm the assumption. Any pattern in the scatter-plot, results in a violation of this property and points towards a poor fitting model.\n",
"\n",
"Residual errors plot show that the data is randomly scattered around line zero. The plot does not display any pattern in the residuals. Hence, we can conclude that the Linear Regression model is a good fit to the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### vi. Normality test (Q-Q Plot)\n",
"\n",
"This is a visual or graphical test to check for normality of the data. This test helps us identify outliers and skewness. The test is performed by plotting the data verses theoretical quartiles. The same data is also plotted on a histogram to confirm normality.\n",
"\n",
"Any deviation from the straight line in normal plot or skewness/multi-modality in histogram shows that the data does not pass the normality test.\n",
"\n",
"\n",
"Now, I will plot the Q-Q Plot as follows:-\n"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# plotting the Q-Q plot\n",
"\n",
"import pylab \n",
"import scipy.stats as stats\n",
"\n",
"\n",
"for var in ['MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX']:\n",
" \n",
" plt.figure(figsize=(15,6))\n",
"\n",
" plt.subplot(1, 2, 1)\n",
" df[var].hist()\n",
" plt.title('Distribution of '+ var)\n",
"\n",
" plt.subplot(1, 2, 2)\n",
" stats.probplot(df[var], dist=\"norm\", plot=pylab)\n",
" plt.title('Q-Q plot of '+ var)\n",
"\n",
" plt.show() \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Interpretation**\n",
"\n",
"\n",
"From the distribution plots, we can see that all the above variables are positively skewed. The Q-Q plot of all the variables confirm that the variables are not normally distributed.\n",
"\n",
"Hence, the variables do not pass the normality test."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 15. Conclusion\n",
"\n",
"\n",
"\n",
"I carry out residual analysis to check for homoscedasticity assumption. Residual errors plot show that the data is randomly scattered around line zero. The plot does not display any pattern in the residuals. Hence, we can conclude that the Linear Regression model is a good fit to the data.\n",
"\n",
"\n",
"The r-squared or the coefficient of determination is 0.4691 on an average for 5-fold cross validation. It means that the predictor is only able to explain 46.91% of the variance in the target variable. This indicates that the model is not a good fit to the data.\n",
"\n",
"\n",
"I carry out normality test to check for distribution of the variables. We can see that the variables do not follow the normal distribution. The Q-Q plots confirm the same.\n",
"\n",
"\n",
"So, we can conclude that the linear regression model is unable to model the data to generate decent results. It should be noted that the model is performing equally on both training and testing datasets. It seems like a case where we would need to model this data using methods that can model non-linear relationships. Also variables need to be transformed to satisfy the normality assumption.\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
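The conclusion above suggests transforming the skewed variables. As a rough, illustrative sketch of that idea (not the notebook's own code or data), the standard-library-only Python below generates synthetic, positively skewed data, fits a one-predictor least-squares line on the raw scale and on the log scale, and compares how close the residuals are to normal using the correlation of their Q-Q points. The helper names `qq_points`, `qq_correlation`, `fit_line` and `residuals` are hypothetical, the data are made up for illustration, and the plotting positions `(i - 0.5) / n` are one common convention among several. Residuals are checked here because the normality assumption of ordinary least squares is usually stated for the errors.

```python
import math
import random
from statistics import NormalDist, mean


def qq_points(values):
    """Return (theoretical normal quantiles, sorted sample values)."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n; an assumption, other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def qq_correlation(values):
    """Correlation of the Q-Q points; values near 1 suggest normality."""
    return pearson(*qq_points(values))


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Synthetic, positively skewed data (NOT the notebook's dataset).
rng = random.Random(0)
x = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
# Multiplicative relationship, so it is linear only after taking logs.
y = [2.0 * xi ** 1.5 * math.exp(rng.gauss(0.0, 0.3)) for xi in x]

raw_resid = residuals(x, y)
log_resid = residuals([math.log(v) for v in x], [math.log(v) for v in y])

raw_r = qq_correlation(raw_resid)
log_r = qq_correlation(log_resid)
print(f"Q-Q correlation of residuals, raw scale: {raw_r:.3f}")
print(f"Q-Q correlation of residuals, log scale: {log_r:.3f}")
```

A Q-Q correlation close to 1 means the points lie near a straight line. On this synthetic example the log-scale residuals should score higher than the raw-scale ones; whether the same holds for the computer hardware dataset would need to be checked on the actual data.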