kshirsagarsiddharth · December 5, 2019 11:53
diff --git a/using_pipeline.ipynb b/using_pipeline.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1) A pipeline chains multiple estimators into one, which automates the machine learning workflow.\n",
    "\n",
    "2) This is useful because data processing usually follows a fixed sequence of steps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3) Some components transform features, for example by normalizing numeric values, turning text into vectors, or filling in missing data; these are called transformers. Other components predict a target variable by fitting an algorithm such as a random forest or an SVM; these are called estimators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So in a pipeline we first sequentially apply a list of transformers (data preprocessing) and then a final estimator (the ML model)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each transformer step must implement fit() and transform().\n",
    "\n",
    "The final estimator must implement fit(); it also needs predict() if the pipeline is used for prediction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In short, a pipeline exposes the same fit/transform/predict interface as its steps, so we can fit the whole pipeline on the\n",
    "training data and apply it to the test data without having to run each step individually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "# MinMaxScaler rescales each feature so that its values lie in a given [min, max] range;\n",
    "# by default this range is [0, 1]\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.pipeline import Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(120,)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Next, load the iris data and split it into training and test sets,\n",
    "# putting 80% of the data into training and 20% into testing\n",
    "\n",
    "iris = load_iris()\n",
    "X_train, X_test, Y_train, Y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Next, create the pipeline from a list of (name, step) pairs: the name is a string that\n",
    "# identifies the step, and the step is a transformer or estimator instance (not a function name).\n",
    "pipe_lr = Pipeline([('minmax', MinMaxScaler()), ('lr', LogisticRegression())])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
      "  FutureWarning)\n",
      "/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.\n",
      "  \"this warning.\", FutureWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Pipeline(memory=None,\n",
       "     steps=[('minmax', MinMaxScaler(copy=True, feature_range=(0, 1))), ('lr', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "          intercept_scaling=1, max_iter=100, multi_class='warn',\n",
       "          n_jobs=None, penalty='l2', random_state=None, solver='warn',\n",
       "          tol=0.0001, verbose=0, warm_start=False))])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pipe_lr.fit(X_train,Y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Logistic regression pipeline test accuracy =  0.9\n"
     ]
    }
   ],
   "source": [
    "score = pipe_lr.score(X_test,Y_test)\n",
    "print(\"Logistic regression pipeline test accuracy = \",score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python",
   "language": "python",
   "name": "conda-env-python-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
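
A minimal sketch of what the notebook's pipeline does for us, assuming scikit-learn is installed. It runs the same two steps by hand (fit the scaler on the training data only, reuse it on the test data, then fit the classifier) and compares the result with the `Pipeline` version. The names `manual_lr`, `manual_score`, and `pipe_score`, and the `max_iter=1000` setting, are assumptions added for illustration and are not part of the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Manual version: every step is fitted and applied separately.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max on test data
manual_lr = LogisticRegression(max_iter=1000)   # max_iter is an illustrative assumption
manual_lr.fit(X_train_scaled, Y_train)
manual_score = manual_lr.score(X_test_scaled, Y_test)

# Pipeline version: fit() and score() run the same steps in order.
pipe_lr = Pipeline([
    ("minmax", MinMaxScaler()),
    ("lr", LogisticRegression(max_iter=1000)),
])
pipe_lr.fit(X_train, Y_train)
pipe_score = pipe_lr.score(X_test, Y_test)

print("manual accuracy   =", manual_score)
print("pipeline accuracy =", pipe_score)
```

Both versions should report the same accuracy, since the pipeline only bundles the steps; the practical gain is that the scaler can never be fitted on the test data by accident.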