@sachith-gunasekara
Last active March 9, 2023 15:51
Introduction to Linear Regression.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNrOqxYYvVJhf99a0T5t32X",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/sachith-gunasekara/17ab7d1712a2885ce20dfd3fc5bbceb4/01-com-towardsdatascience-intro-to-lr.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Introduction to Linear Regression in Python\n",
"\n",
"## Overview\n",
"\n",
"This notebook is based on the article [\"Introduction to Linear Regression in Python\"](https://towardsdatascience.com/introduction-to-linear-regression-in-python-c12a072bedf0) by [Lorraine Li](https://medium.com/@lorrli), which provides an introduction to linear regression analysis in Python using the Scikit-learn library.\n",
"\n",
"The purpose of this notebook is to provide a step-by-step guide on implementing linear regression in Python, as well as to demonstrate some of the key concepts and techniques involved in the analysis.\n",
"\n",
"## About the Author\n",
"\n",
"[Lorraine Li](https://medium.com/@lorrli) is a data scientist and writer with several years of experience in the field.\n",
"\n",
"In addition to the article on linear regression, the author has written several other articles on data science and machine learning, and has contributed to a number of open-source projects.\n",
"\n",
"## About the Article\n",
"\n",
"The article \"Introduction to Linear Regression in Python\" provides a comprehensive overview of linear regression analysis in Python, covering topics such as:\n",
"\n",
"- Understanding linear regression\n",
"- Preparing data for analysis\n",
"- Building a linear regression model\n",
"- Evaluating model performance\n",
"\n",
"The article also includes code examples and visualizations to help readers understand the concepts and techniques involved in linear regression analysis.\n",
"\n",
"## Notebook Structure\n",
"\n",
"This notebook follows the structure of the article, with each section corresponding to a step in the linear regression analysis process. The code examples and visualizations in this notebook are based on the code provided in the article, with some additional explanations and commentary added for clarity.\n",
"\n",
"## Conclusion\n",
"\n",
"Linear regression is a fundamental technique in data science and machine learning, and is used to model the relationship between a dependent variable and one or more independent variables. By following the steps outlined in this notebook and the accompanying article, readers should be able to gain a solid understanding of linear regression analysis in Python, and be able to apply this technique to their own data analysis projects.\n"
],
"metadata": {
"id": "2khR6OUnnevD"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "h-g3LVQorjOZ"
},
"outputs": [],
"source": [
"# import required libraries\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"source": [
"# set random seed for reproducible data\n",
"\n",
"seed = 0\n",
"np.random.seed(seed)\n",
"\n",
"# Generate random data\n",
"\n",
"X = 2.5 * np.random.randn(100) + 1.5\n",
"res = 0.5 * np.random.randn(100)\n",
"\n",
"alpha, beta = 2, 0.3\n",
"y = alpha + beta * X + res"
],
"metadata": {
"id": "ruZExLDUsii-"
},
"execution_count": null,
"outputs": []
},
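{
"cell_type": "markdown",
"source": [
"The cell above draws synthetic data from a known linear model, so the estimates computed later can be compared with the true parameters. In the notation of the code, with `res` playing the role of the noise term $\\varepsilon$:\n",
"\n",
"$$y_i = \\alpha + \\beta x_i + \\varepsilon_i, \\qquad x_i \\sim \\mathcal{N}(1.5,\\ 2.5^2), \\qquad \\varepsilon_i \\sim \\mathcal{N}(0,\\ 0.5^2)$$\n",
"\n",
"with $\\alpha = 2$, $\\beta = 0.3$ and $i = 1, \\dots, 100$. Since `np.random.randn` samples the standard normal distribution, `2.5 * np.random.randn(100) + 1.5` has mean 1.5 and standard deviation 2.5, and `0.5 * np.random.randn(100)` has mean 0 and standard deviation 0.5."
],
"metadata": {}
},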
{
"cell_type": "code",
"source": [
"# Create a pandas dataframe for the generated data\n",
"\n",
"df = pd.DataFrame({\n",
" 'X': X,\n",
" 'y': y\n",
"})"
],
"metadata": {
"id": "paEbsCSHt9iH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# List the top rows of the dataframe\n",
"\n",
"df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "d9xrjONfuIpB",
"outputId": "eb3b63ea-4fcc-4517-e23e-c286793ee418"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" X y\n",
"0 5.910131 4.714615\n",
"1 2.500393 2.076238\n",
"2 3.946845 2.548811\n",
"3 7.102233 4.615368\n",
"4 6.168895 3.264107"
]
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"source": [
"##Calculating the alpha and beta values manually without using any libraries"
],
"metadata": {
"id": "LCKNp-lrvDre"
}
},
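{
"cell_type": "markdown",
"source": [
"For a single predictor, ordinary least squares has a closed-form solution. The next cells compute these quantities directly: the means, the two sums in the slope formula, and then the estimates.\n",
"\n",
"$$\\hat{\\beta} = \\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})(y_i - \\bar{y})}{\\sum_{i=1}^{n} (x_i - \\bar{x})^2}, \\qquad \\hat{\\alpha} = \\bar{y} - \\hat{\\beta}\\,\\bar{x}$$\n",
"\n",
"The intercept formula means the fitted line always passes through the point of means $(\\bar{x}, \\bar{y})$."
],
"metadata": {}
},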
{
"cell_type": "code",
"source": [
"# Calculate the mean of X and y\n",
"\n",
"X_mean = X.mean()\n",
"y_mean = y.mean()"
],
"metadata": {
"id": "wO8tQB7PuPvS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Calculate the covariance and variance\n",
"\n",
"X_y_cov = ((df['X'] - X_mean) * (df['y'] - y_mean)).sum()\n",
"\n",
"X_var = ((df['X'] - X_mean) ** 2).sum()"
],
"metadata": {
"id": "1oqXUslsvr0N"
},
"execution_count": null,
"outputs": []
},
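{
"cell_type": "markdown",
"source": [
"The two sums in the cell above are the numerator and denominator of $\\hat{\\beta}$. Dividing both by $n - 1$ gives the sample covariance of $x$ and $y$ and the sample variance of $x$, and that factor cancels in the ratio. A minimal sketch confirming this with NumPy's normalized estimators (`np.cov` and `np.var` with `ddof=1`). It regenerates the same seed-0 data with a local `np.random.RandomState`, which produces the same stream as `np.random.seed(0)` without changing the global generator; the `_demo` names are illustrative, chosen so the notebook's own `X` and `y` are not overwritten."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"# Recreate the notebook's synthetic data (same seed, same draw order: X first, then the noise)\n",
"rng = np.random.RandomState(0)\n",
"X_demo = 2.5 * rng.randn(100) + 1.5\n",
"y_demo = 2 + 0.3 * X_demo + 0.5 * rng.randn(100)\n",
"\n",
"# Raw sums, as in the cell above\n",
"sxy = ((X_demo - X_demo.mean()) * (y_demo - y_demo.mean())).sum()\n",
"sxx = ((X_demo - X_demo.mean()) ** 2).sum()\n",
"\n",
"# Sample covariance (np.cov divides by n - 1 by default) and sample variance with the same divisor\n",
"cov_xy = np.cov(X_demo, y_demo)[0, 1]\n",
"var_x = np.var(X_demo, ddof=1)\n",
"\n",
"# Both ratios give the same slope estimate\n",
"print(sxy / sxx, cov_xy / var_x)\n",
"print(np.isclose(sxy / sxx, cov_xy / var_x))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},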
{
"cell_type": "code",
"source": [
"# Calculate beta and alpha\n",
"\n",
"beta_est = X_y_cov / X_var\n",
"alpha_est = y_mean - beta * X_mean\n",
"\n",
"print(f\"alpha: {alpha_est}\")\n",
"print(f\"beta: {beta_est}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "u9mDIl7LxMvn",
"outputId": "6b8c0585-196a-4d2e-d9e0-f06617e75eaf"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"alpha: 2.0410064853739187\n",
"beta: 0.3229396867092763\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"y_pred = alpha_est + beta_est * X\n",
"\n",
"y_pred"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZuERJoSfyhI8",
"outputId": "447f6fd2-91f5-4ad8-9306-db1f0c1e2c02"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([3.9496223 , 2.84848262, 3.31559936, 4.33459938, 4.0331875 ,\n",
" 1.73641148, 3.29246916, 2.40321789, 2.44208236, 2.85691239,\n",
" 2.64170948, 3.69952259, 3.13983923, 2.62365024, 2.88376865,\n",
" 2.79480772, 3.73165958, 2.35978165, 2.77817098, 1.83586249,\n",
" 0.46426169, 3.05311448, 3.2233179 , 1.92622967, 4.35790063,\n",
" 1.35123503, 2.56235912, 2.37429328, 3.76290411, 3.71170167,\n",
" 2.6505127 , 2.83072523, 1.80866289, 0.92622154, 2.24452941,\n",
" 2.65164423, 3.51869023, 3.49615644, 2.21270801, 2.28135213,\n",
" 1.6788676 , 1.37896565, 1.14786011, 4.100373 , 2.11394873,\n",
" 2.17173707, 1.51397266, 3.15312225, 1.22243685, 2.35366032,\n",
" 1.80246179, 2.83778144, 2.11301789, 1.57223355, 2.50266312,\n",
" 2.87122942, 2.57911864, 2.76961647, 2.01329657, 2.23255722,\n",
" 1.9825056 , 2.23513105, 1.868923 , 1.13170311, 2.66866087,\n",
" 2.20103849, 1.20927666, 2.89904291, 1.79290939, 2.56735409,\n",
" 3.11404671, 2.62955027, 3.44531027, 1.52848036, 2.85024622,\n",
" 1.97253512, 1.82237862, 2.05808219, 2.27388432, 2.57076106,\n",
" 1.5847332 , 3.25269757, 2.90136822, 1.28513088, 3.72695526,\n",
" 4.05606066, 3.47710278, 2.38015384, 1.66094473, 3.37672679,\n",
" 2.19991142, 3.51235609, 2.69356666, 3.31390478, 2.81312815,\n",
" 3.09586731, 2.5338932 , 3.96723716, 2.62787839, 2.84996181])"
]
},
"metadata": {},
"execution_count": 26
}
]
},
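{
"cell_type": "markdown",
"source": [
"As an independent check on the hand-computed line, `np.polyfit` fits a polynomial by least squares; with degree 1 it returns the coefficients highest power first, i.e. `[slope, intercept]`. A sketch comparing it with the closed-form formulas, again on locally regenerated seed-0 data (illustrative `_demo` names). Both fits should agree with each other, and both should be close to the true values $\\alpha = 2$ and $\\beta = 0.3$ used to generate the data, though not equal to them because of the noise."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"# Same synthetic data as the notebook (seed 0), without touching the global random state\n",
"rng = np.random.RandomState(0)\n",
"X_demo = 2.5 * rng.randn(100) + 1.5\n",
"y_demo = 2 + 0.3 * X_demo + 0.5 * rng.randn(100)\n",
"\n",
"# Closed-form least-squares estimates\n",
"dx = X_demo - X_demo.mean()\n",
"beta_hat = (dx * (y_demo - y_demo.mean())).sum() / (dx ** 2).sum()\n",
"alpha_hat = y_demo.mean() - beta_hat * X_demo.mean()\n",
"\n",
"# Degree-1 least-squares polynomial fit: coefficients come back as [slope, intercept]\n",
"slope, intercept = np.polyfit(X_demo, y_demo, 1)\n",
"\n",
"print(f'closed form: alpha={alpha_hat:.6f}, beta={beta_hat:.6f}')\n",
"print(f'np.polyfit:  alpha={intercept:.6f}, beta={slope:.6f}')\n",
"print(np.allclose([alpha_hat, beta_hat], [intercept, slope]))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},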
{
"cell_type": "code",
"source": [
"# Plot regression vs actual data\n",
"\n",
"plt.figure(figsize=(12, 6))\n",
"\n",
"plt.plot(X, y_pred)\n",
"plt.plot(X, y, \"ro\")\n",
"\n",
"plt.title(\"Actual vs Predicted\")\n",
"plt.xlabel(\"X\")\n",
"plt.ylabel(\"y\")\n",
"\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 404
},
"id": "X_f3ExgiyqN-",
"outputId": "a16f1bec-1b2e-4d44-b41b-6b48a540e68f"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
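{
"cell_type": "markdown",
"source": [
"The plot judges the fit by eye. A common numeric summary is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$, the fraction of the variance in $y$ that the line explains. A minimal sketch for the synthetic data, regenerated locally with seed 0 (illustrative `_demo` names). For simple linear regression with an intercept, $R^2$ also equals the squared Pearson correlation between $x$ and $y$, which the last line checks."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"# Same synthetic data as the notebook (seed 0)\n",
"rng = np.random.RandomState(0)\n",
"X_demo = 2.5 * rng.randn(100) + 1.5\n",
"y_demo = 2 + 0.3 * X_demo + 0.5 * rng.randn(100)\n",
"\n",
"# Closed-form fit and fitted values\n",
"dx = X_demo - X_demo.mean()\n",
"beta_hat = (dx * (y_demo - y_demo.mean())).sum() / (dx ** 2).sum()\n",
"alpha_hat = y_demo.mean() - beta_hat * X_demo.mean()\n",
"y_hat = alpha_hat + beta_hat * X_demo\n",
"\n",
"# Residual and total sums of squares\n",
"ss_res = ((y_demo - y_hat) ** 2).sum()\n",
"ss_tot = ((y_demo - y_demo.mean()) ** 2).sum()\n",
"r_squared = 1 - ss_res / ss_tot\n",
"print(f'R^2: {r_squared:.4f}')\n",
"\n",
"# Simple regression with an intercept: R^2 equals the squared correlation coefficient\n",
"print(np.isclose(r_squared, np.corrcoef(X_demo, y_demo)[0, 1] ** 2))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},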
{
"cell_type": "markdown",
"source": [
"#Linear Regression with Statsmodels"
],
"metadata": {
"id": "sPBS42JzMCkm"
}
},
{
"cell_type": "code",
"source": [
"import os\n",
"from urllib.request import urlopen\n",
"\n",
"parentDir = \"/content\"\n",
"dataDir = os.path.join(parentDir, \"data\")\n",
"\n",
"if not os.path.exists(dataDir):\n",
" os.mkdir(dataDir)\n",
"\n",
"!wget https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/master/data/Advertising.csv -O data/Advertising.csv"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gk-dwkkwAk8j",
"outputId": "09723462-6767-460e-98cb-61682df6f8ea"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2023-02-22 08:08:03-- https://raw.githubusercontent.com/justmarkham/scikit-learn-videos/master/data/Advertising.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 4555 (4.4K) [text/plain]\n",
"Saving to: ‘data/Advertising.csv’\n",
"\n",
"\rdata/Advertising.cs 0%[ ] 0 --.-KB/s \rdata/Advertising.cs 100%[===================>] 4.45K --.-KB/s in 0s \n",
"\n",
"2023-02-22 08:08:03 (57.8 MB/s) - ‘data/Advertising.csv’ saved [4555/4555]\n",
"\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"data = pd.read_csv(os.path.join(dataDir, \"Advertising.csv\"))\n",
"\n",
"data.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "VZglfEUaI7Nb",
"outputId": "d4eec6fc-c65e-45e8-edcb-ed2a3ff8c319"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Unnamed: 0 TV Radio Newspaper Sales\n",
"0 1 230.1 37.8 69.2 22.1\n",
"1 2 44.5 39.3 45.1 10.4\n",
"2 3 17.2 45.9 69.3 9.3\n",
"3 4 151.5 41.3 58.5 18.5\n",
"4 5 180.8 10.8 58.4 12.9"
]
},
"metadata": {},
"execution_count": 29
}
]
},
{
"cell_type": "code",
"source": [
"import statsmodels.formula.api as smf\n",
"\n",
"model = smf.ols('Sales ~ TV', data=data)\n",
"model = model.fit()\n",
"\n",
"model.params"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bME4v3i2JNDk",
"outputId": "c3fab14e-266d-4983-b3e9-1f3117336faf"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Intercept 7.032594\n",
"TV 0.047537\n",
"dtype: float64"
]
},
"metadata": {},
"execution_count": 35
}
]
},
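{
"cell_type": "markdown",
"source": [
"`fit()` returns a results object that carries more than the coefficients. A sketch of a few commonly used attributes (`rsquared`, `conf_int()`, `pvalues`), followed by a check that the fitted slope and intercept match the closed-form formulas from the manual section. It assumes the download cell above saved the file to `/content/data/Advertising.csv` (the Colab layout used in this notebook); `adv` and `res_tv` are illustrative names."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import os\n",
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"\n",
"# Assumes the dataset was downloaded to /content/data by the download cell above\n",
"adv = pd.read_csv(os.path.join('/content', 'data', 'Advertising.csv'))\n",
"res_tv = smf.ols('Sales ~ TV', data=adv).fit()\n",
"\n",
"print(f'R^2: {res_tv.rsquared:.4f}')\n",
"print(res_tv.conf_int(alpha=0.05))  # 95% confidence intervals for Intercept and TV\n",
"print(res_tv.pvalues)\n",
"\n",
"# The same closed-form estimates used for the synthetic data\n",
"tv, sales = adv['TV'], adv['Sales']\n",
"beta_tv = ((tv - tv.mean()) * (sales - sales.mean())).sum() / ((tv - tv.mean()) ** 2).sum()\n",
"alpha_tv = sales.mean() - beta_tv * tv.mean()\n",
"print(abs(beta_tv - res_tv.params['TV']) < 1e-9, abs(alpha_tv - res_tv.params['Intercept']) < 1e-9)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},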
{
"cell_type": "code",
"source": [
"sales_pred = model.predict()\n",
"\n",
"plt.figure(figsize=(12, 6))\n",
"plt.plot(data['TV'], data['Sales'], 'o')\n",
"plt.plot(data['TV'], sales_pred, 'r')\n",
"plt.title(\"TV vs Sales\")\n",
"plt.xlabel(\"TV\")\n",
"plt.ylabel(\"Sales\")\n",
"\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 404
},
"id": "xlN7acS8NOTX",
"outputId": "ea913202-dfea-4f48-f5c2-27793375c0f4"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"source": [
"new_X = 400\n",
"model.predict({\"TV\": new_X})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ThTe-D23N5bP",
"outputId": "a47239ec-82ba-4aca-ef04-b6136b3a50cf"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 26.04725\n",
"dtype: float64"
]
},
"metadata": {},
"execution_count": 38
}
]
},
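{
"cell_type": "markdown",
"source": [
"`predict` returns a single point estimate. A sketch using statsmodels' `get_prediction(...).summary_frame()` to add uncertainty: the `mean_ci_*` columns give a 95% confidence interval for the expected sales at that budget, and the wider `obs_ci_*` columns give a 95% prediction interval for one new observation. The last line prints the observed TV range; if 400 lies outside it, the prediction is an extrapolation that relies on the linear trend continuing. It assumes the download cell above saved the file to `/content/data/Advertising.csv`; `adv`, `res_tv` and `new_data` are illustrative names."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import os\n",
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"\n",
"# Assumes the dataset was downloaded to /content/data by the download cell above\n",
"adv = pd.read_csv(os.path.join('/content', 'data', 'Advertising.csv'))\n",
"res_tv = smf.ols('Sales ~ TV', data=adv).fit()\n",
"\n",
"# New data needs the same column name as the formula's predictor\n",
"new_data = pd.DataFrame({'TV': [400]})\n",
"pred = res_tv.get_prediction(new_data)\n",
"\n",
"# mean and mean_ci_lower/upper (expected Sales), obs_ci_lower/upper (a single new observation)\n",
"print(pred.summary_frame(alpha=0.05))\n",
"\n",
"# Observed TV range, to judge whether TV = 400 is an extrapolation\n",
"print(adv['TV'].min(), adv['TV'].max())"
],
"metadata": {},
"execution_count": null,
"outputs": []
},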
{
"cell_type": "markdown",
"source": [
"#Linear Regression with Scikit-Learn"
],
"metadata": {
"id": "a93mG4uZOKx0"
}
},
{
"cell_type": "code",
"source": [
"from sklearn.linear_model import LinearRegression\n",
"\n",
"predictors = [\"TV\", \"Radio\"]\n",
"\n",
"X = data[predictors]\n",
"y = data[\"Sales\"]\n",
"\n",
"lm = LinearRegression()\n",
"model = lm.fit(X, y)"
],
"metadata": {
"id": "OAmTTDuSOPcO"
},
"execution_count": null,
"outputs": []
},
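{
"cell_type": "markdown",
"source": [
"With two predictors the model becomes $\\text{Sales} = \\alpha + \\beta_1\\,\\text{TV} + \\beta_2\\,\\text{Radio} + \\varepsilon$, and `LinearRegression.fit` solves the same least-squares problem as before. As a sketch, NumPy's `np.linalg.lstsq` on a design matrix whose first column is all ones (for the intercept) should return the same three numbers. It assumes the file location used by the download cell above; `adv`, `design`, `coefs` and `sk` are illustrative names."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Assumes the dataset was downloaded to /content/data by the download cell above\n",
"adv = pd.read_csv(os.path.join('/content', 'data', 'Advertising.csv'))\n",
"features = adv[['TV', 'Radio']].to_numpy()\n",
"target = adv['Sales'].to_numpy()\n",
"\n",
"# Design matrix: a column of ones for the intercept, then TV and Radio\n",
"design = np.column_stack([np.ones(len(features)), features])\n",
"\n",
"# Least-squares solution; coefs = [alpha, beta_TV, beta_Radio]\n",
"coefs, *_ = np.linalg.lstsq(design, target, rcond=None)\n",
"\n",
"sk = LinearRegression().fit(features, target)\n",
"print(coefs)\n",
"print(np.allclose(coefs, np.r_[sk.intercept_, sk.coef_]))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},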
{
"cell_type": "code",
"source": [
"print(f\"alpha: {model.intercept_}\")\n",
"print(f\"beta: {model.coef_}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YOsNTPAXPHb9",
"outputId": "8d5865aa-623a-418f-d6d8-efa29c38a524"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"alpha: 2.9210999124051362\n",
"beta: [0.04575482 0.18799423]\n"
]
}
]
}
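,
{
"cell_type": "markdown",
"source": [
"The coefficients above are fitted and reported on the full dataset, so they do not directly measure how well the model predicts unseen data. A common approach is to hold out a test set. A minimal sketch with scikit-learn's `train_test_split`, `mean_squared_error` and `r2_score`; the 25% test size and `random_state=0` are arbitrary illustrative choices, not values from the article, and the file location is the one used by the download cell above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Assumes the dataset was downloaded to /content/data by the download cell above\n",
"adv = pd.read_csv(os.path.join('/content', 'data', 'Advertising.csv'))\n",
"features = adv[['TV', 'Radio']]\n",
"target = adv['Sales']\n",
"\n",
"# Hold out 25% of the rows for evaluation (split ratio and seed are arbitrary)\n",
"X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=0)\n",
"\n",
"reg = LinearRegression().fit(X_train, y_train)\n",
"y_test_pred = reg.predict(X_test)\n",
"\n",
"# RMSE is in the same units as Sales; R^2 is unitless\n",
"rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))\n",
"print(f'test RMSE: {rmse:.3f}')\n",
"print(f'test R^2:  {r2_score(y_test, y_test_pred):.3f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
}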
]
}