@borgeslt
Created July 29, 2020 12:43
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "IGTI_Desafio_1.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyP9SdUa8SQ6eHGLV9M9Y7jv"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Ni_8aduuXqBH",
"colab_type": "text"
},
"source": [
"# IGTI *Bootcamp* - *Machine Learning* Analyst: Module 1 Challenge -- Fundamentals\n",
"\n",
"Applying the *Machine Learning* analysis and modeling concepts learned in Module 1 of the Bootcamp.\n",
"\n",
"**Objectives:**\n",
"* *Exploratory Data Analysis* (EDA)\n",
"* Data preparation\n",
"* Model analysis - Linear Regression / Decision Tree"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aIxxhTdYcqSP",
"colab_type": "text"
},
"source": [
"## **Introduction to the Challenge**\n",
"\n",
"In this challenge we will use a modified version of the *\"Bike Sharing\"* *dataset* available from the [***UCI Machine Learning Repository***](https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset). All the concepts presented in the course **Fundamentos de *Machine Learning* (FAM)** will be covered.\n",
"\n",
"The file contains a set of records about bike sharing, with data on weather conditions and bicycle location, among others.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tVML3nnGfX6i",
"colab_type": "text"
},
"source": [
"#### Variable Dictionary\n",
"\n",
"> ```instant``` - record index\n",
">\n",
"> ```dteday``` - date\n",
">\n",
"> ```season``` - season of the year (1: winter, 2: spring, 3: summer, 4: fall)\n",
">\n",
"> ```yr``` - year (0: 2011, 1: 2012)\n",
">\n",
"> ```mnth``` - month (1 to 12)\n",
">\n",
"> ```hr``` - hour (0 to 23)\n",
">\n",
"> ```holiday``` - whether the day is a holiday or not\n",
">\n",
"> ```weekday``` - day of the week\n",
">\n",
"> ```workingday``` - 1 if the day is neither a weekend nor a holiday, otherwise 0\n",
">\n",
"> ```temp``` - normalized temperature (originally in Celsius)\n",
">\n",
"> ```atemp``` - normalized feels-like temperature (originally in Celsius)\n",
">\n",
"> ```hum``` - normalized humidity. Values are divided by 100 (max)\n",
">\n",
"> ```windspeed``` - normalized wind speed. Values are divided by 67 (max)\n",
">\n",
"> ```casual``` - count of casual users\n",
">\n",
"> ```registered``` - count of registered users\n",
">\n",
"> ```cnt``` - total count of bike rentals, casual plus registered"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LGV_ILiujmew",
"colab_type": "text"
},
"source": [
"## **Hands On**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MhrQfD6Sg4zr",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 72
},
"outputId": "5b051374-9b42-4d1b-f6ae-efcb3a9f9dee"
},
"source": [
"# import the libraries and functions used\n",
"from sklearn import datasets  # scikit-learn is one of the most widely used ML libraries; besides\n",
"                              # the datasets, it offers many other useful tools for data analysis\n",
"                              # (you will rely on it throughout your career)\n",
"import pandas as pd  # pandas is used to work with DataFrames (tables) in a friendlier way\n",
"from sklearn.model_selection import train_test_split, KFold, cross_val_score, cross_val_predict  # tools for splitting the\n",
"                              # data into training and test sets and for cross-validation\n",
"from sklearn.svm import SVC  # support vector classifier (SVM)\n",
"from sklearn import tree  # decision tree algorithms\n",
"from sklearn.linear_model import LogisticRegression  # logistic regression\n",
"from sklearn.metrics import mean_absolute_error  # used to compute the MAE\n",
"from sklearn.metrics import mean_squared_error  # used to compute the MSE\n",
"from sklearn.metrics import r2_score  # used to compute the R2 score\n",
"from sklearn import metrics  # metrics used to compare the models\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn import svm"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n",
" import pandas.util.testing as tm\n"
],
"name": "stderr"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "O2eQVX1RhGzg",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 73
},
"outputId": "34208633-eff3-4605-a43f-df544086dabe"
},
"source": [
"from google.colab import files\n",
"upload = files.upload()"
],
"execution_count": 2,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <input type=\"file\" id=\"files-7e8be7ff-a3db-45f4-b052-028748257913\" name=\"files[]\" multiple disabled\n",
" style=\"border:none\" />\n",
" <output id=\"result-7e8be7ff-a3db-45f4-b052-028748257913\">\n",
" Upload widget is only available when the cell has been executed in the\n",
" current browser session. Please rerun this cell to enable.\n",
" </output>\n",
" <script src=\"/nbextensions/google.colab/files.js\"></script> "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Saving comp_bikes_mod.csv to comp_bikes_mod.csv\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "29K6IyqbhUBZ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 202
},
"outputId": "56f5e193-0af8-48e7-8ed3-8a9b796d6c9f"
},
"source": [
"# load the dataset\n",
"df = pd.read_csv('comp_bikes_mod.csv')\n",
"\n",
"# show the first 5 rows of the dataset\n",
"df.head()"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>instant</th>\n",
" <th>dteday</th>\n",
" <th>season</th>\n",
" <th>yr</th>\n",
" <th>mnth</th>\n",
" <th>hr</th>\n",
" <th>holiday</th>\n",
" <th>weekday</th>\n",
" <th>workingday</th>\n",
" <th>weathersit</th>\n",
" <th>temp</th>\n",
" <th>atemp</th>\n",
" <th>hum</th>\n",
" <th>windspeed</th>\n",
" <th>casual</th>\n",
" <th>registered</th>\n",
" <th>cnt</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>NaN</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>NaN</td>\n",
" <td>0.2879</td>\n",
" <td>0.81</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>13.0</td>\n",
" <td>16.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>2011-01-01</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.22</td>\n",
" <td>0.2727</td>\n",
" <td>0.80</td>\n",
" <td>0.0</td>\n",
" <td>8.0</td>\n",
" <td>32.0</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.0</td>\n",
" <td>2011-01-01</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.22</td>\n",
" <td>0.2727</td>\n",
" <td>0.80</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
" <td>27.0</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.0</td>\n",
" <td>2011-01-01</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.24</td>\n",
" <td>0.2879</td>\n",
" <td>0.75</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>10.0</td>\n",
" <td>13.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>2011-01-01</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>4.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>NaN</td>\n",
" <td>1.0</td>\n",
" <td>0.24</td>\n",
" <td>0.2879</td>\n",
" <td>0.75</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" instant dteday season yr ... windspeed casual registered cnt\n",
"0 1.0 NaN 1.0 0.0 ... 0.0 3.0 13.0 16.0\n",
"1 2.0 2011-01-01 1.0 0.0 ... 0.0 8.0 32.0 40.0\n",
"2 3.0 2011-01-01 1.0 0.0 ... 0.0 5.0 27.0 32.0\n",
"3 4.0 2011-01-01 1.0 0.0 ... 0.0 3.0 10.0 13.0\n",
"4 5.0 2011-01-01 1.0 0.0 ... 0.0 0.0 1.0 1.0\n",
"\n",
"[5 rows x 17 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "55WGAGTEhtkC",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 433
},
"outputId": "40e63252-280e-431d-b94f-b06e60f82914"
},
"source": [
"# show summary information about the dataset\n",
"df.info()"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 17379 entries, 0 to 17378\n",
"Data columns (total 17 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 instant 15641 non-null float64\n",
" 1 dteday 15641 non-null object \n",
" 2 season 15641 non-null float64\n",
" 3 yr 15641 non-null float64\n",
" 4 mnth 15641 non-null float64\n",
" 5 hr 15641 non-null float64\n",
" 6 holiday 15641 non-null float64\n",
" 7 weekday 15641 non-null float64\n",
" 8 workingday 15641 non-null float64\n",
" 9 weathersit 15641 non-null float64\n",
" 10 temp 15641 non-null float64\n",
" 11 atemp 15641 non-null float64\n",
" 12 hum 15641 non-null float64\n",
" 13 windspeed 15641 non-null float64\n",
" 14 casual 15641 non-null float64\n",
" 15 registered 15641 non-null float64\n",
" 16 cnt 15641 non-null float64\n",
"dtypes: float64(16), object(1)\n",
"memory usage: 2.3+ MB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D45EQVhJh7Rg",
"colab_type": "text"
},
"source": [
"#### **How many instances and attributes are there, respectively?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zEx6rLKoiHUm",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "678edcf0-66d6-4937-eb22-7320df4a70ce"
},
"source": [
"df.shape"
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(17379, 17)"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Bg_wIGkzjeuN",
"colab_type": "text"
},
"source": [
"#### **How many data types are there in the dataset?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "xQi-EIpqjrxv",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "029ddc6f-e411-4c32-96b9-7aa550d9cacf"
},
"source": [
"print(\"Number of distinct data types: {}\".format(df.dtypes.nunique()))"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"Number of distinct data types: 2\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2UjwhgnBiIr7",
"colab_type": "text"
},
"source": [
"#### **What is the proportion (in %) of null values in the \"temp\" column (normalized ambient temperature)?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "BMnWSD9niYUP",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "11c0d6a6-01d4-4d30-d794-8588a946ad76"
},
"source": [
"print(\"Null values:\\t{}\".format(round(((df.temp.isnull().sum()/df.shape[0]) * 100),2)) + \"%\")"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": [
"Null values:\t10.0%\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0kZ2QHobl25Q",
"colab_type": "text"
},
"source": [
"#### **Drop the rows that contain null values in the \"dteday\" column**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "2x7R4QODijRE",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "00eeeafc-a75d-4b7a-e325-51f412ddf464"
},
"source": [
"# drop the rows where \"dteday\" is null\n",
"df_clean = df.dropna(subset=['dteday'], axis=0)\n",
"\n",
"# compare the shapes before and after\n",
"print(\"Before:\\t{}\".format(df.shape))\n",
"print(\"After:\\t{}\".format(df_clean.shape))"
],
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": [
"Before:\t(17379, 17)\n",
"After:\t(15641, 17)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aWOCqVG7kLLC",
"colab_type": "text"
},
"source": [
"#### **What is the mean of the \"temp\" column after removing the rows that contained null values?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Xo0ro22Kkzye",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "c6f7db9e-1265-47dd-c547-c9fb17f33241"
},
"source": [
"print(\"Mean:\\t{}\".format(round(df_clean.temp.mean(),4)))"
],
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": [
"Mean:\t0.4969\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ONAqU2y5k6ly",
"colab_type": "text"
},
"source": [
"#### **What is the standard deviation of the \"windspeed\" column after removing the rows with null values in the \"dteday\" column?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6K2ig-OXlP43",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "a37cdf43-a53b-4fda-9f0f-08b59e74d633"
},
"source": [
"print(\"Std dev:\\t{}\".format(round(df_clean.windspeed.std(),4)))"
],
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": [
"Std dev:\t0.1223\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LAsOKQnfltwM",
"colab_type": "text"
},
"source": [
"#### **After removing the rows with null values in the \"dteday\" column, convert the \"season\" column to categorical values. How many distinct categories are there?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "L0vBifN19LXq",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 243
},
"outputId": "e940debf-8df8-4650-d934-f7e9610ea8d1"
},
"source": [
"df_clean.loc[:,'season'].astype(\"category\")"
],
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1 1.0\n",
"2 1.0\n",
"3 1.0\n",
"4 1.0\n",
"5 1.0\n",
" ... \n",
"17373 1.0\n",
"17374 NaN\n",
"17375 1.0\n",
"17377 1.0\n",
"17378 NaN\n",
"Name: season, Length: 15641, dtype: category\n",
"Categories (4, float64): [1.0, 2.0, 3.0, 4.0]"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "-XJH5Yhf9128",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 433
},
"outputId": "6b73e2fe-a696-4473-ea09-63bd6e10bdf5"
},
"source": [
"# check whether the change took effect\n",
"df_clean.info()"
],
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 15641 entries, 1 to 17378\n",
"Data columns (total 17 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 instant 14060 non-null float64\n",
" 1 dteday 15641 non-null object \n",
" 2 season 14061 non-null float64\n",
" 3 yr 14076 non-null float64\n",
" 4 mnth 14062 non-null float64\n",
" 5 hr 14068 non-null float64\n",
" 6 holiday 14076 non-null float64\n",
" 7 weekday 14078 non-null float64\n",
" 8 workingday 14097 non-null float64\n",
" 9 weathersit 14078 non-null float64\n",
" 10 temp 14066 non-null float64\n",
" 11 atemp 14076 non-null float64\n",
" 12 hum 14070 non-null float64\n",
" 13 windspeed 14082 non-null float64\n",
" 14 casual 14071 non-null float64\n",
" 15 registered 14090 non-null float64\n",
" 16 cnt 14079 non-null float64\n",
"dtypes: float64(16), object(1)\n",
"memory usage: 2.1+ MB\n"
],
"name": "stdout"
}
]
},
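{
"cell_type": "markdown",
"metadata": {
"id": "tfiBN0w4_Gj1",
"colab_type": "text"
},
"source": [
"The `astype(\"category\")` call returns a new Series with 4 categories (1.0, 2.0, 3.0 and 4.0). Because the result was not assigned back to `df_clean`, the `info()` output above still lists \"season\" as float64."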
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0h4XHz3onqRa",
"colab_type": "text"
},
"source": [
"#### **Convert the \"dteday\" column to the \"datetime\" type, using the dataset cleaned of null values. What is the last date in the dataset (YYYY-MM-DD)?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "tO38Ch2Bo0Lw",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "4a43bda8-7ce9-4790-e4a3-2fc82deffc63"
},
"source": [
"type(df_clean.dteday)  # returns the container type (a pandas Series), not the dtype of the column's values"
],
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "NHbgkNYFraDL",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 121
},
"outputId": "f757ddce-2bcc-43be-ae3b-a717dda4cee6"
},
"source": [
"# convert the column to datetime\n",
"#df_clean['dteday']=pd.to_datetime(df_clean.loc[:,'dteday'],format=\"%Y-%m-%d\")\n",
"\n",
"df_clean['dteday'] = pd.to_datetime(df_clean['dteday'], format=\"%Y-%m-%d\")\n",
"#pd.to_datetime(df_clean['dteday'],format=\"%Y-%m-%d\")"
],
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"text": [
"/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" after removing the cwd from sys.path.\n"
],
"name": "stderr"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "jslxtECx_QmT",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 433
},
"outputId": "55acfe92-b181-4116-e9df-80cfd6c40fc4"
},
"source": [
"# confirm the change\n",
"df_clean.info()"
],
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 15641 entries, 1 to 17378\n",
"Data columns (total 17 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 instant 14060 non-null float64 \n",
" 1 dteday 15641 non-null datetime64[ns]\n",
" 2 season 14061 non-null float64 \n",
" 3 yr 14076 non-null float64 \n",
" 4 mnth 14062 non-null float64 \n",
" 5 hr 14068 non-null float64 \n",
" 6 holiday 14076 non-null float64 \n",
" 7 weekday 14078 non-null float64 \n",
" 8 workingday 14097 non-null float64 \n",
" 9 weathersit 14078 non-null float64 \n",
" 10 temp 14066 non-null float64 \n",
" 11 atemp 14076 non-null float64 \n",
" 12 hum 14070 non-null float64 \n",
" 13 windspeed 14082 non-null float64 \n",
" 14 casual 14071 non-null float64 \n",
" 15 registered 14090 non-null float64 \n",
" 16 cnt 14079 non-null float64 \n",
"dtypes: datetime64[ns](1), float64(16)\n",
"memory usage: 2.1 MB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "lbwVi587kXm9",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "b6aa8a2a-3edb-4fa3-b474-153b64190d85"
},
"source": [
"print(\"Last date:\\t{}\".format(df_clean['dteday'].iloc[-1]))"
],
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": [
"Last date:\t2012-12-31 00:00:00\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mFG_TwV6sgre",
"colab_type": "text"
},
"source": [
"#### **After removing the rows that contained null values in the \"dteday\" column, consider the boxplot of the \"windspeed\" variable**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "fnoTVSqOxnah",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
},
"outputId": "29a4b07a-27a0-4f74-cad2-510f518d28b8"
},
"source": [
"df_clean.boxplot(['windspeed']);"
],
"execution_count": 18,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAASnElEQVR4nO3df2xd5X3H8feXEOyMMIpGFbUEmqykW8zvYcG6UrDXQoBKEKk/RDaqdY3IXIqBZasCpGIdW6DeBC2kJGm8ZGNAYbR/QCiBBA2bQiEdQfxMPGgU2vFjWn9Q6NI1DiHf/eGT9No49jVc5/oe3i8J6TzPeXzPN9Hl4yfPfc65kZlIkhrffvUuQJJUGwa6JJWEgS5JJWGgS1JJGOiSVBL71+vChx56aM6YMaNel5f26le/+hUHHnhgvcuQhvX444//LDPfO9y5ugX6jBkz2LhxY70uL+1Vb28vbW1t9S5DGlZE/Hhv51xykaSSMNAlqSQMdEkqCQNdkkrCQJekkjDQpUJnZyfNzc20t7fT3NxMZ2dnvUuSxqRu2xaliaSzs5MVK1bQ1dVFS0sLmzdvZtGiRQAsXbq0ztVJ1XGGLgHd3d10dXWxcOFCmpubWbhwIV1dXXR3d9e7NKlqBroE9Pf309HRMaivo6OD/v7+OlUkjZ1LLhLQ1NTEggULePLJJ+nr62P27Nkcf/zxNDU11bs0qWrO0CXgtNNO49Zbb+XUU0/lrrvu4tRTT+XWW2/ltNNOq3dpUtWcoUvAyy+/zNy5c1m9ejXLly+nqamJuXPn8sMf/rDepUlVM9AloK+vjyeeeILJkyfveTjXG2+8QXNzc71Lk6rmkosEzJ49m4cffnhQ38MPP8zs2bPrVJE0dga6BCxevJj58+fT09PDzp076enpYf78+SxevLjepUlVc8lFAubNmwcM3GC0e5fLkiVL9vRLjcAZuiSVRFUz9Ig4E7gemAT8U2Z+dcj5I4CbgPcUYy7LzLU1rlUaN7fddhuLFy9m1apVvPnmm0yaNIn58+cDOEtXwxh1hh4Rk4AbgbOAFmBeRLQMGfZl4I7MPAE4D1hW60Kl8bRkyRJWrVpFe3s7+++/P+3t7axatYolS5bUuzSpatUsuZwEbMnMrZm5A7gdOHfImAR+uzg+GHildiVK46+vr49TTjllUN8pp5xCX19fnSqSxq6aJZfDgBcr2i8BJw8Z8xVgfUR0AgcCHx/uhSJiAbAAYNq0afT29o6xXGl8HHHEEXzjG9/ghBNOYNu2bfT29vLEE09wxBFH+D5Vw6jVLpd5wL9k5rUR8WHg5og4OjN3VQ7KzJXASoDW1tb0m9U1UVx99dV71tCbm5vJTJYuXcrVV1+N71M1imoC/WXg8Ir29KKv0nzgTIDMfDQimoFDgZ/UokhpvLltUWVQzRr6Y8CsiJgZEQcw8KHnmiFj/gv4GEBEzAaagZ/WslBJ0shGnaFn5s6IuAhYx8CWxNWZuSkirgI2ZuYa4K+A7oj4SwY+IP1cZuZ4Fi7VktsWVQZRr9xtbW3NjRs31uXa0lBHH300S5cupb29fc/DuXp6eujs7OTZZ5+td3nSHhHxeGa2DnfOO0Ul3LaocjDQJXzaosrBQJfwaYsqB5+2KOG2RZWDM3Sp8Mgjj7BlyxZ27drFli1beOSRR+pdkjQmztAlBmbmK1asoKuri5aWFjZv3syiRYsAWLp0aZ2rk6rjDF0Curu76erqYuHChTQ3N7Nw4UK6urro7u6ud2lS1Qx0Cejv76ejo2NQX0dHB/39/XWqSBo7A10CmpqaWLFixaC+FStW0NTUVKeKpLFzDV0CLrjggj1r5i0tLVx33XUsWrToLbN2aSLz1n+pcOyxx/LMM8/saR9zzDE8/fTTdaxIeitv/ZdGsXv/+bXXXsu9997LtddeS19fH52dnfUuTaqagS7hLheVg4Eu4S4XlYOBLuEuF5WDu1wk3O
WicjDQJX5ze/8VV1xBf38/TU1NdHR0eNu/GopLLlLh+eefZ8eOHQDs2LGD559/vs4VSWNjoEvAnDlzWL9+PR0dHdx99910dHSwfv165syZU+/SpKq55CIB999/P1/4whdYtmwZvb29LFu2DOAtH5RKE5kzdAnITK655ppBfddccw31upNaejsMdAmICC6//PJBfZdffjkRUaeKpLFzyUUCTj/9dJYvXw7A2WefzYUXXsjy5cs544wz6lyZVD0fziUV5syZw/33309mEhGcfvrprFu3rt5lSYP4cC6pCg888MCeNfPM5IEHHqhzRdLYGOgSMHnyZHbu3MkhhxxCd3c3hxxyCDt37mTy5Mn1Lk2qmoEuwZ4wf/XVVznyyCN59dVX94S61CgMdKnw4IMPjtiWJjp3uUiFY489tt4lSO+IM3RpiKH70aVGYaBLQwy9Y1RqFAa6VNiwYQOZSU9PD5nJhg0b6l2SNCYGulRoa2sbsS1NdAa6xMCzXLZv386UKVPYtGkTU6ZMYfv27T7LRQ3FXS4SsGvXLvbbbz+2b9/ORRddBAyE/K5du+pcmVQ9Z+hSYehzjXx0rhqNgS7BoKWVym2LLrmokVQV6BFxZkQ8FxFbIuKyvYz5TERsjohNEfGt2pYp7RuZyRlnnOHsXA1p1ECPiEnAjcBZQAswLyJahoyZBVwOfCQzjwIuHYdapXF1yy23jNiWJrpqZugnAVsyc2tm7gBuB84dMuYC4MbM/AVAZv6ktmVK4+/8888fsS1NdNXscjkMeLGi/RJw8pAxHwKIiO8Dk4CvZOZ9Q18oIhYACwCmTZtGb2/v2yhZGj8RwaWXXkp7e/uePt+nahS12ra4PzALaAOmA9+LiGMy87XKQZm5ElgJA99Y5I0bmih2f0sRwNe//vVB/VKjqGbJ5WXg8Ir29KKv0kvAmsx8IzNfAJ5nIOAlSftINYH+GDArImZGxAHAecCaIWPuZGB2TkQcysASzNYa1imNq8rtiXPnzh22X5roRg30zNwJXASsA/qAOzJzU0RcFRHnFMPWAT+PiM1AD/ClzPz5eBUtjZfM5JJLLnGpRQ2pqn3ombk2Mz+UmR/MzCVF35WZuaY4zsxcmJktmXlMZt4+nkVL4+Hiiy8esS1NdN4pKhVuuOGGEdvSRGegSxUiguuvv961czUkA11i8PbEO++8c9h+aaIz0CWpJAx0icHbEz//+c8P2y9NdAa6VCEz+exnP+tSixqSgS4Vurq6RmxLE52BLhUWLVo0Ylua6Ax0qUJEcPPNN7t2roZkoEsM3p64evXqYfulic5Al6SSMNAlBm9PnDlz5rD90kRnoEsVMpPVq1e71KKGZKBLheOOO27EtjTRGehS4amnnhqxLU10tfpOUakUIoKZM2fywgsv1LsUacycoUsM3p5YGeaupauRGOgSg3ezTJ06ddh+aaIz0KUKmcndd9/tzFwNyUCXCu9///tHbEsTnYEuFV555ZUR29JE5y4XqUJEMHXqVLZt21bvUqQxc4YuMXg3S2WYu5auRmKgS4XMJDPp6enZcyw1EgNdkkrCQJekkvBDUZXevro5yCUa1ZszdJXe7vXwav/7wKLvjvlnDHNNBAa6JJWEgS5JJWGgS1JJGOiSVBIGuiSVhIEuSSVhoEtSSRjoklQSBroklYSBLkklUVWgR8SZEfFcRGyJiMtGGPfJiMiIaK1diZKkaowa6BExCbgROAtoAeZFRMsw4w4CLgF+UOsiJUmjq2aGfhKwJTO3ZuYO4Hbg3GHG/R3QBWyvYX2SpCpV8/jcw4AXK9ovASdXDoiIPwAOz8x7IuJLe3uhiFgALACYNm0avb29Yy5Y2hd8b6oRvePnoUfEfsB1wOdGG5uZK4GVAK2trdnW1vZOLy/V3n334HtTjaiaJZeXgcMr2tOLvt0OAo4GeiPiR8AfAmv8YFSS9q1qAv0xYFZEzIyIA4DzgDW7T2bm65l5aGbOyMwZwAbgnMzcOC4VS5KGNW
qgZ+ZO4CJgHdAH3JGZmyLiqog4Z7wLlCRVp6o19MxcC6wd0nflXsa2vfOyJElj5Z2iklQSBroklYSBLkklYaBLUkkY6JJUEga6JJWEgS5JJWGgS1JJGOiSVBIGuiSVhIEuSSVhoEtSSRjoklQSBroklYSBLkklYaBLUkkY6JJUEga6JJWEgS5JJWGgS1JJGOiSVBIGuiSVhIEuSSVhoEtSSRjoklQSBroklYSBLkklYaBLUknsX+8CpLE47m/X8/qv3xj368y47J5xv8bBUybz1N+cMe7X0buHga6G8vqv3+BHX/3EuF6jt7eXtra2cb0G7JtfGnp3cclFkkrCQJekkjDQJakkDHRJKgkDXZJKwkCXpJIw0CWpJKoK9Ig4MyKei4gtEXHZMOcXRsTmiHg6Iv49Ij5Q+1IlSSMZNdAjYhJwI3AW0ALMi4iWIcOeAFoz81jgO8A/1LpQSdLIqpmhnwRsycytmbkDuB04t3JAZvZk5v8VzQ3A9NqWKUkaTTW3/h8GvFjRfgk4eYTx84F7hzsREQuABQDTpk2jt7e3uiqlCuP9vtm2bds+e2/6/4BqqabPcomI84FW4LThzmfmSmAlQGtra+6L52WoZO67Z9yfs7KvnuWyL/4senepJtBfBg6vaE8v+gaJiI8Di4HTMrO/NuVJkqpVzRr6Y8CsiJgZEQcA5wFrKgdExAnAN4FzMvMntS9TkjSaUQM9M3cCFwHrgD7gjszcFBFXRcQ5xbB/BKYC346IJyNizV5eTpI0TqpaQ8/MtcDaIX1XVhx/vMZ1SZLGyDtFJakkDHRJKgkDXZJKwu8UVUM5aPZlHHPTWx4nVHs3jf8lDpoNML7fj6p3FwNdDeV/+77ql0RLe+GSiySVhIEuSSVhoEtSSRjoklQSBroklYSBLkklYaBLUkkY6JJUEga6JJWEgS5JJWGgS1JJGOiSVBI+nEsNZ5881Oq+8b/GwVMmj/s19O5ioKuhjPeTFmHgF8a+uI5Uay65SFJJGOiSVBIGuiSVhIEuSSVhoEtSSRjoklQSBroklYSBLkklYaBLUkkY6JJUEga6JJWEgS5JJWGgS1JJGOiSVBIGuiSVhIEuSSVhoEtSSRjoklQSBroklURVgR4RZ0bEcxGxJSIuG+Z8U0T8W3H+BxExo9aFSpJGNmqgR8Qk4EbgLKAFmBcRLUOGzQd+kZlHAl8DumpdqCRpZNXM0E8CtmTm1szcAdwOnDtkzLnATcXxd4CPRUTUrkxJ0mj2r2LMYcCLFe2XgJP3NiYzd0bE68DvAD+rHBQRC4AFANOmTaO3t/ftVS2NQXt7+5h/Jt7GvzF7enrG/kNSDVUT6DWTmSuBlQCtra3Z1ta2Ly+vd6nMHNP43t5efG+qEVWz5PIycHhFe3rRN+yYiNgfOBj4eS0KlCRVp5pAfwyYFREzI+IA4DxgzZAxa4A/K44/BTyQY50WSZLekVGXXIo18YuAdcAkYHVmboqIq4CNmbkGWAXcHBFbgFcZCH1J0j5U1Rp6Zq4F1g7pu7LieDvw6dqWJkkaC+8UlaSSMNAlqSQMdEkqCQNdkkoi6rW7MCJ+Cvy4LheXRnYoQ+5yliaQD2Tme4c7UbdAlyaqiNiYma31rkMaK5dcJKkkDHRJKgkDXXqrlfUuQHo7XEOXpJJwhi5JJWGgS1JJGOgqjYhYGxHvGcP4GRHx7HjWNMK1t9Xjuiq3ffqNRdJ4ysyz612DVE/O0NUwIuJLEXFxcfy1iHigOP7jiLg1In4UEYcWM+++iOiOiE0RsT4iphRjT4yIpyLiKeCLFa99VET8R0Q8GRFPR8Ss4nX+s3jtvoj4TkT8VsXrPBgRj0fEuoh4X9H/wYi4r+h/KCJ+v+ifGRGPRsQzEfH3+/ivTu8SBroayUPAR4vjVmBqREwu+r43ZOws4MbMPAp4Dfhk0f/PQGdmHjdkfAdwfWYeX7z2S0X/7wHLMnM28EvgwuKaS4FPZeaJwGpgSTF+ZfH6JwJ/DS
wr+q8HlmfmMcB/v92/AGkkBroayePAiRHx20A/8CgD4ftRBsK+0guZ+WTFz80o1tffk5m7w//mivGPAldExCIGnpXx66L/xcz8fnF8C3AKAyF/NHB/RDwJfBmYHhFTgT8Cvl30fxN4X/GzHwFuG+a6Us24hq6GkZlvRMQLwOeAR4CngXbgSKBvyPD+iuM3gSmjvPa3IuIHwCeAtRHxF8BWYOiNGgkEsCkzP1x5ovhF81oxyx/2MiPVIL1TztDVaB5iYCnje8VxB/BENV9KnpmvAa9FxClF15/uPhcRvwtszcwbgLuAY4tTR0TE7uD+E+Bh4Dngvbv7I2JyRByVmb8EXoiITxf9ERG7l3a+z2++a3fPdaVaMtDVaB5iYBnj0cz8H2A7b11uGcmfAzcWSyJR0f8Z4Nmi/2jgX4v+54AvRkQfcAgD6+A7gE8BXcWHq08ysNQCA2E9v+jfBJxb9F9SvM4zwGFj+QNL1fLWf2kvImIG8N3MPLrOpUhVcYYuSSXhDF2SSsIZuiSVhIEuSSVhoEtSSRjoklQSBroklcT/A/Iiwv08r1P9AAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YQawPmsHyrT7",
"colab_type": "text"
},
"source": [
"#### **Consider the dataset after dropping the rows with null values in the \"dteday\" column. Select the columns \"season\", \"temp\", \"atemp\", \"hum\", and \"windspeed\". Plot the correlation matrix.**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "03W4mXey1jKt",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 367
},
"outputId": "94792de2-fa85-4016-8f4b-4ef96fddb551"
},
"source": [
"# build a new dataframe with the requested columns, plus the target 'cnt'\n",
"df_correlacao = df_clean[['season','temp','atemp','hum','windspeed','cnt']]\n",
"\n",
"matriz_corr = df_correlacao.corr() # compute the correlation matrix\n",
"\n",
"import seaborn as sn\n",
"\n",
"# plot the matrix as an annotated heatmap\n",
"plt.figure(figsize=(6,6))\n",
"sn.heatmap(matriz_corr,annot=True,vmin=-1,vmax=1,center=0,cmap='RdBu',fmt='.2f',square=True,linecolor='white');"
],
"execution_count": 19,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x432 with 2 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LYaTnaYSC7bP",
"colab_type": "text"
},
"source": [
"#### **Fill the null values in the \"hum\", \"cnt\", and \"casual\" columns with the column means. Use \"hum\" and \"casual\" as independent variables and \"cnt\" as the dependent variable. Fit a linear regression. What is the value of R2? Use the training inputs as the test set.**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "S37QP3fjDwuC",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 104
},
"outputId": "b02ef9c3-35c2-44b3-b8e4-c837db32a62d"
},
"source": [
"# replace null values with the column mean\n",
"# work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning\n",
"df_clean = df_clean.copy()\n",
"for coluna in ['hum', 'cnt', 'casual']:\n",
"    df_clean[coluna] = df_clean[coluna].fillna(df_clean[coluna].mean())"
],
"execution_count": 20,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JZkS6g3rIKUw",
"colab_type": "code",
"colab": {}
},
"source": [
"# set up the regression inputs\n",
"x1 = df_clean['hum'].values # independent variable\n",
"x2 = df_clean['casual'].values # independent variable\n",
"y = df_clean['cnt'].values # dependent variable"
],
"execution_count": 21,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2S7jSChDJu-i",
"colab_type": "code",
"colab": {}
},
"source": [
"x1_reshaped = x1.reshape(-1,1) # reshape to a 2D column\n",
"x2_reshaped = x2.reshape(-1,1) # reshape to a 2D column\n",
"\n",
"x = np.column_stack((x1_reshaped,x2_reshaped)) # stack the predictors into an (n_samples, 2) matrix"
],
"execution_count": 22,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WYl0tw-6GM-S",
"colab_type": "code",
"colab": {}
},
"source": [
"# import the linear regression model (used here with two predictors)\n",
"from sklearn.linear_model import LinearRegression"
],
"execution_count": 23,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "W_6cn8EBIp6e",
"colab_type": "code",
"colab": {}
},
"source": [
"# build the regression model\n",
"reg = LinearRegression()\n",
"regressao = reg.fit(x, y) # estimate the coefficients"
],
"execution_count": 24,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ktfukC6uI_9X",
"colab_type": "code",
"colab": {}
},
"source": [
"# predict on the same inputs used for fitting\n",
"previsao = reg.predict(x)"
],
"execution_count": 25,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "upv8JVIpKWI4",
"colab_type": "code",
"colab": {}
},
"source": [
"# model evaluation\n",
"from sklearn.metrics import r2_score"
],
"execution_count": 26,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Isu6PtxBKb7I",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "f5b330a6-79c5-4bb0-a9c1-918f7ffd7ab5"
},
"source": [
"# fitted parameters\n",
"print('Y = {}X {}'.format(reg.coef_, reg.intercept_))\n",
"\n",
"R_2 = r2_score(y, previsao) # compute R2\n",
"\n",
"print(\"Coefficient of Determination (R2):\", (round(R_2,3)))"
],
"execution_count": 27,
"outputs": [
{
"output_type": "stream",
"text": [
"Y = [-99.75012328 2.21512197]X 173.29337505135362\n",
"Coefficient of Determination (R2): 0.406\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lSC2dkkcKur6",
"colab_type": "text"
},
"source": [
"#### **Use the same data as in the previous question (\"hum\" and \"casual\" as independent variables and \"cnt\" as the dependent variable). Fit a decision tree regressor. What is the approximate value of R2? Use the training inputs as the test set and the \"default\" hyperparameters.**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9DSgZLaZusIv",
"colab_type": "code",
"colab": {}
},
"source": [
"from sklearn.preprocessing import MinMaxScaler\n",
"scaler = MinMaxScaler()\n",
"scaled_df = scaler.fit_transform(x)"
],
"execution_count": 28,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "iRkOpQQC64EB",
"colab_type": "code",
"colab": {}
},
"source": [
"from sklearn.tree import DecisionTreeRegressor # import the decision tree regressor"
],
"execution_count": 29,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "PDY6zrJlBtrt",
"colab_type": "code",
"colab": {}
},
"source": [
"entrada_arvore = scaled_df # scaled predictors ('hum', 'casual'); not used below, since trees do not need feature scaling\n",
"saida_arvore = y # regression target ('cnt')"
],
"execution_count": 30,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "bVtnNeIZFHvG",
"colab_type": "code",
"colab": {}
},
"source": [
"# hold-out split, not used below: the exercise asks to evaluate on the training inputs\n",
"x_train, x_test, y_train, y_test = train_test_split(x, y)"
],
"execution_count": 31,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "CaEwRj9ma6ZU",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 121
},
"outputId": "d1f91468-8515-4b75-8a14-a23751391938"
},
"source": [
"arvore_regressora = DecisionTreeRegressor() # create the tree with default hyperparameters\n",
"arvore_regressora.fit(x, y) # fit the regressor"
],
"execution_count": 32,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, presort='deprecated',\n",
" random_state=None, splitter='best')"
]
},
"metadata": {
"tags": []
},
"execution_count": 32
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "iVBhwn4fb1r3",
"colab_type": "code",
"colab": {}
},
"source": [
"# predict on the training inputs\n",
"previsao_arvore = arvore_regressora.predict(x)"
],
"execution_count": 33,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7ALblwSSb-VR",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "2ded5eb4-f91c-4a05-f6c4-071f7b4590ac"
},
"source": [
"print(\"Mean Absolute Error:\", metrics.mean_absolute_error(y, previsao_arvore))\n",
"print(\"Mean Squared Error:\", metrics.mean_squared_error(y, previsao_arvore))"
],
"execution_count": 34,
"outputs": [
{
"output_type": "stream",
"text": [
"Mean Absolute Error: 58.54911405435064\n",
"Mean Squared Error: 8577.336772824097\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "se95etwAcmAY",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "e81dc8c9-5f4a-4806-bcbd-70f216058ee8"
},
"source": [
"R_2_a = r2_score(y, previsao_arvore) # compute R2\n",
"\n",
"print(\"Coefficient of Determination (R2):\", (R_2_a))"
],
"execution_count": 35,
"outputs": [
{
"output_type": "stream",
"text": [
"Coefficient of Determination (R2): 0.7098339715834964\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CDytk1AOjg0F",
"colab_type": "text"
},
"source": [
"#### **What *insight* can we draw from comparing the R2 values obtained with the linear regression and with the decision tree?**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6IGzsSd4kFZu",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "7cb5d7b7-4099-4b8b-9e2d-e00a297e115f"
},
"source": [
"print(\"Coefficient of Determination (R2) - Linear Regression:\", R_2)\n",
"print(\"Coefficient of Determination (R2) - Decision Tree:\", R_2_a)"
],
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"text": [
"Coefficient of Determination (R2) - Linear Regression: 0.4059859251122173\n",
"Coefficient of Determination (R2) - Decision Tree: 0.7098339715834964\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3dy_DHeOsVCb",
"colab_type": "text"
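},
"source": [
"Comparing the two results, the Decision Tree reaches the higher R2 (about 0.71 versus 0.41 for the linear regression), so it is closer to 1 and explains more of the variance in \"cnt\". Both scores were computed on the same data used for fitting, so they measure goodness of fit rather than performance on unseen data."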
]
}
]
}
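
As a reading aid for the R2 comparison that closes the notebook, the coefficient of determination can be written out by hand. The sketch below assumes the standard definition R2 = 1 - SS_res / SS_tot, which is what `r2_score` computes for a single target. The helper name `r2_manual` and the toy numbers are illustrative assumptions and are not part of the original notebook or dataset.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    # residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# toy target values, made up for illustration only
y_toy = [10.0, 20.0, 30.0, 40.0]

perfect = list(y_toy)                              # predictions equal to the targets
baseline = [sum(y_toy) / len(y_toy)] * len(y_toy)  # always predict the mean
rough = [12.0, 18.0, 33.0, 37.0]                   # close, but not exact

r2_perfect = r2_manual(y_toy, perfect)    # no residual error
r2_baseline = r2_manual(y_toy, baseline)  # no better than the mean
r2_rough = r2_manual(y_toy, rough)        # somewhere in between

print(r2_perfect, r2_baseline, r2_rough)
```

Read this way, an R2 near 1 means the predictions leave little residual error relative to always predicting the mean of the target, and an R2 near 0 means the model does no better than that mean.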