Explore if multicollinearity has an impact on machine learning
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Explore-if-multicollinearity-impact-machine-learning\" data-toc-modified-id=\"Explore-if-multicollinearity-impact-machine-learning-1\">Explore if multicollinearity impact machine learning</a></span></li><li><span><a href=\"#Regression\" data-toc-modified-id=\"Regression-2\">Regression</a></span></li><li><span><a href=\"#Summary\" data-toc-modified-id=\"Summary-3\">Summary</a></span></li></ul></div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explore if multicollinearity impact machine learning\n",
"-----\n",
"\n",
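"Given that the goal of machine learning is prediction, this notebook tests whether perfectly collinear (duplicated) features change a classifier's test accuracy."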
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"reset -fs"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"# Use Iris data for simplicity\n",
"from sklearn.datasets import load_iris\n",
"\n",
"X, y = load_iris(return_X_y=True)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"# Repeat each feature so every column has an exact copy (perfect multicollinearity)\n",
"import numpy as np\n",
"\n",
"X_train_double = np.hstack((X_train, X_train))\n",
"X_test_double = np.hstack((X_test, X_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Regression\n",
"-----"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/brian/anaconda3/envs/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.\n",
" \"of iterations.\", ConvergenceWarning)\n"
]
},
{
"data": {
"text/plain": [
"0.9777777777777777"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
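"# Baseline: fit on the original four features.\n",
"# max_iter=10 stops lbfgs before it converges, hence the ConvergenceWarning.\n",
"clf_baseline = LogisticRegression(multi_class='multinomial',\n",
"                                  solver='lbfgs',\n",
"                                  max_iter=10)\n",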
"clf_baseline.fit(X_train, y_train)\n",
"clf_baseline.score(X_test, y_test)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/brian/anaconda3/envs/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.\n",
" \"of iterations.\", ConvergenceWarning)\n"
]
},
{
"data": {
"text/plain": [
"0.9777777777777777"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Same settings, fit on the duplicated (perfectly collinear) features.\n",
"clf_multicolinearity = LogisticRegression(multi_class='multinomial',\n",
"                                          solver='lbfgs',\n",
"                                          max_iter=10)\n",
"clf_multicolinearity.fit(X_train_double, y_train)\n",
"clf_multicolinearity.score(X_test_double, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summary\n",
"-----\n",
"\n",
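"Multicollinearity has __no__ impact on machine learning when the goal is prediction: the model trained on the duplicated features reached the same test accuracy (0.978) as the baseline."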
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
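
The comparison in the notebook can be condensed into one script. What follows is a minimal sketch rather than the notebook's exact code. It assumes a fixed `random_state=0` for the split and `max_iter=1000` so that lbfgs converges; neither setting is in the original. It drops the `multi_class` argument and leaves scikit-learn's default L2 penalty in place. The names `baseline`, `doubled`, `coef_original` and `coef_copy` are illustrative. With the penalty active, the two accuracies should be close but are not guaranteed to be identical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: a fixed seed so the split is reproducible (not in the original notebook).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Duplicate every column to create perfect multicollinearity.
X_train_double = np.hstack((X_train, X_train))
X_test_double = np.hstack((X_test, X_test))

# Assumption: max_iter=1000 so lbfgs converges (the notebook uses max_iter=10).
baseline = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X_train, y_train)
doubled = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X_train_double, y_train)

acc_baseline = baseline.score(X_test, y_test)
acc_doubled = doubled.score(X_test_double, y_test)
print(f"baseline accuracy: {acc_baseline:.3f}")
print(f"doubled accuracy:  {acc_doubled:.3f}")

# Split the weights into those on the original columns and those on the copies.
coef_original = doubled.coef_[:, :4]
coef_copy = doubled.coef_[:, 4:]
print("largest weight gap between a column and its copy:",
      np.abs(coef_original - coef_copy).max())
```

Printing the weight gap shows where multicollinearity does leave a trace: the weight for each feature is shared between the column and its copy, so individual coefficients are harder to interpret even when accuracy is unaffected.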