Created
November 9, 2019 01:20
Explore if multicollinearity has an impact on machine learning
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"toc": true | |
}, | |
"source": [ | |
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n", | |
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Explore-if-multicollinearity-impacts-machine-learning\" data-toc-modified-id=\"Explore-if-multicollinearity-impacts-machine-learning-1\">Explore if multicollinearity impacts machine learning</a></span></li><li><span><a href=\"#Regression\" data-toc-modified-id=\"Regression-2\">Regression</a></span></li><li><span><a href=\"#Summary\" data-toc-modified-id=\"Summary-3\">Summary</a></span></li></ul></div>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Explore if multicollinearity impacts machine learning\n", | |
"-----\n", | |
"\n", | |
"Given that the goal of machine learning is prediction, check whether duplicated (perfectly collinear) features change a model's test accuracy." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%reset -fs" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Use Iris data for simplicity\n", | |
"from sklearn.datasets import load_iris\n", | |
"\n", | |
"X, y = load_iris(return_X_y=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.model_selection import train_test_split\n", | |
"\n", | |
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Repeat each feature so every column has an exact (perfectly collinear) duplicate\n", | |
"import numpy as np\n", | |
"\n", | |
"X_train_double = np.hstack((X_train, X_train))\n", | |
"X_test_double = np.hstack((X_test, X_test))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Regression\n", | |
"-----" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.linear_model import LogisticRegression" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/Users/brian/anaconda3/envs/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.\n", | |
" \"of iterations.\", ConvergenceWarning)\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0.9777777777777777" | |
] | |
}, | |
"execution_count": 59, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"clf_baseline = LogisticRegression(multi_class='multinomial',\n", | |
"                                  solver='lbfgs',\n", | |
"                                  max_iter=10)\n", | |
"clf_baseline.fit(X_train, y_train)\n", | |
"clf_baseline.score(X_test, y_test)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 60, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/Users/brian/anaconda3/envs/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.\n", | |
" \"of iterations.\", ConvergenceWarning)\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0.9777777777777777" | |
] | |
}, | |
"execution_count": 60, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"clf_multicollinearity = LogisticRegression(multi_class='multinomial',\n", | |
"                                           solver='lbfgs',\n", | |
"                                           max_iter=10)\n", | |
"clf_multicollinearity.fit(X_train_double, y_train)\n", | |
"clf_multicollinearity.score(X_test_double, y_test)" | |
] | |
}, | |
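{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Summary\n", | |
"-----\n", | |
"\n", | |
"Multicollinearity has __no__ impact on machine learning predictions in this experiment: the model trained on duplicated features reaches the same test accuracy (about 0.978) as the baseline. Both fits stopped at `max_iter=10` with a `ConvergenceWarning`, so the comparison is between unconverged models." | |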
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" " | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.5" | |
}, | |
"toc": { | |
"base_numbering": 1, | |
"nav_menu": {}, | |
"number_sections": false, | |
"sideBar": false, | |
"skip_h1_title": false, | |
"title_cell": "Table of Contents", | |
"title_sidebar": "Contents", | |
"toc_cell": true, | |
"toc_position": {}, | |
"toc_section_display": true, | |
"toc_window_display": false | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
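
The notebook above compares a single train/test split with unconverged models. A possible follow-up check, sketched here under assumptions that are not part of the notebook, confirms that the doubled matrix really is perfectly collinear and repeats the accuracy comparison over several seeded splits. The helper `mean_test_accuracy`, the `n_repeats=20` setting, the seeds, and `max_iter=1000` are all illustrative choices, not the author's method.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_double = np.hstack((X, X))  # every column appears twice, as in the notebook

# Perfect collinearity: the doubled matrix has 8 columns,
# but no more linearly independent columns than the original.
rank_single = np.linalg.matrix_rank(X)
rank_double = np.linalg.matrix_rank(X_double)


def mean_test_accuracy(features, n_repeats=20):
    """Average test accuracy over several seeded 70/30 splits.

    Hypothetical helper: the repeat count and seeds are assumptions.
    """
    scores = []
    for seed in range(n_repeats):
        # The same seed yields the same row split for both feature matrices.
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, y, test_size=0.30, random_state=seed)
        # Assumption: a larger max_iter so that lbfgs can converge.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return float(np.mean(scores))


acc_single = mean_test_accuracy(X)
acc_double = mean_test_accuracy(X_double)

print("rank:", rank_single, "vs", rank_double)
print("mean accuracy:", acc_single, "vs", acc_double)
```

Averaging over seeded splits makes the comparison less dependent on one lucky partition. With the default L2 penalty, duplicating columns also changes the effective regularization strength, so the two accuracies need not match exactly.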