csmathguy · February 15, 2019 19:53
diff --git a/Segment_Toronto_Neighborhoods_Step2.ipynb b/Segment_Toronto_Neighborhoods_Step2.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1>Segmenting and Clustering Neighborhoods in Toronto</h1>\n",
    "<p>In this assignment, you will be required to explore, segment, and cluster the neighborhoods in the city of Toronto. However, unlike New York, the neighborhood data is not readily available on the internet. What is interesting about the field of data science is that each project can be challenging in its unique way, so you need to learn to be agile and refine the skill to learn new libraries and tools quickly depending on the project.</p>\n",
    "<p>For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.</p>\n",
    "<p>Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Step 1 - Scrap Wikipedia for data using BeautifulSoup</h2>\n",
    "<p>Use the Notebook to build the code to scrape the following Wikipedia page, <a href='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' target='_blank'>https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M</a>, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:</p>\n",
    "<div> \n",
    "    To create the dataframe:\n",
    "    <ul>\n",
    "        <li>The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.</li>\n",
    "        <li>Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.</li>\n",
    "        <li>More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.</li>\n",
    "        <li>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.</li>\n",
    "        <li>Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.</li>\n",
    "        <li>In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.</li>\n",
    "    </ul>\n",
    "</div>\n",
    "\n",
    "<p> Note: There are different website scraping libraries and packages in Python. One of the most common packages is BeautifulSoup. Here is the package's main documentation page: <a href='http://beautiful-soup-4.readthedocs.io/en/latest/' target='_blank'>http://beautiful-soup-4.readthedocs.io/en/latest/</a></p> "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: BeautifulSoup4 in /home/jupyterlab/conda/lib/python3.6/site-packages (4.7.1)\n",
      "Requirement already satisfied: soupsieve>=1.2 in /home/jupyterlab/conda/lib/python3.6/site-packages (from BeautifulSoup4) (1.7.1)\n",
      "Requirement already satisfied: requests in /home/jupyterlab/conda/lib/python3.6/site-packages (2.21.0)\n",
      "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests) (3.0.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests) (2018.11.29)\n",
      "Requirement already satisfied: urllib3<1.25,>=1.21.1 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests) (1.24.1)\n",
      "Requirement already satisfied: idna<2.9,>=2.5 in /home/jupyterlab/conda/lib/python3.6/site-packages (from requests) (2.8)\n"
     ]
    }
   ],
   "source": [
    "#install Beautiful Soup and requests for Web Scaping\n",
    "!pip install BeautifulSoup4\n",
    "!pip install requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3> Step 1A: Get Data </h3>\n",
    "<ol>\n",
    "    <li>Get HTML from wikipedia</li>\n",
    "    <li> Use BeautifySoup to parse html data </li>\n",
    "    <li> Store parsed data into Pandas DataFrame </li>\n",
    "</ol>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Postcode</th>\n",
       "      <th>Borough</th>\n",
       "      <th>Neighbourhood</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>M1A</td>\n",
       "      <td>Not assigned</td>\n",
       "      <td>Not assigned</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M2A</td>\n",
       "      <td>Not assigned</td>\n",
       "      <td>Not assigned</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>M3A</td>\n",
       "      <td>North York</td>\n",
       "      <td>Parkwoods</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>M4A</td>\n",
       "      <td>North York</td>\n",
       "      <td>Victoria Village</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>M5A</td>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Harbourfront</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Postcode           Borough     Neighbourhood\n",
       "0      M1A      Not assigned      Not assigned\n",
       "1      M2A      Not assigned      Not assigned\n",
       "2      M3A        North York         Parkwoods\n",
       "3      M4A        North York  Victoria Village\n",
       "4      M5A  Downtown Toronto      Harbourfront"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#imports\n",
    "from bs4 import BeautifulSoup\n",
    "import requests\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "#get html from wiki page and create soup object\n",
    "source = requests.get(\"https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M\")\n",
    "soup = BeautifulSoup(source.text, 'lxml')\n",
    "\n",
    "#using soup object, iterate the .wikitable to get the data from the HTML page and store it into a list\n",
    "data = []\n",
    "columns = []\n",
    "table = soup.find(class_='wikitable')\n",
    "for index, tr in enumerate(table.find_all('tr')):\n",
    "    section = []\n",
    "    for td in tr.find_all(['th','td']):\n",
    "        section.append(td.text.rstrip())\n",
    "    \n",
    "    #First row of data is the header\n",
    "    if (index == 0):\n",
    "        columns = section\n",
    "    else:\n",
    "        data.append(section)\n",
    "\n",
    "#convert list into Pandas DataFrame\n",
    "canada_df = pd.DataFrame(data = data,columns = columns)\n",
    "canada_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3>Step 1B: Data Cleanup</h3>\n",
    "<ol>\n",
    "    <li> Remove Boroughs that are 'Not assigned' </li>\n",
    "    <li> More than one neighborhood can exist in one postal code area, combined these into one row with the neighborhoods separated with a comma</li>\n",
    "    <li>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough</li>\n",
    "</ol>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Postcode</th>\n",
       "      <th>Borough</th>\n",
       "      <th>Neighbourhood</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>M3A</td>\n",
       "      <td>North York</td>\n",
       "      <td>Parkwoods</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>M4A</td>\n",
       "      <td>North York</td>\n",
       "      <td>Victoria Village</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>M5A</td>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Harbourfront</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>M5A</td>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Regent Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>M6A</td>\n",
       "      <td>North York</td>\n",
       "      <td>Lawrence Heights</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Postcode           Borough     Neighbourhood\n",
       "2      M3A        North York         Parkwoods\n",
       "3      M4A        North York  Victoria Village\n",
       "4      M5A  Downtown Toronto      Harbourfront\n",
       "5      M5A  Downtown Toronto       Regent Park\n",
       "6      M6A        North York  Lawrence Heights"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Remove Boroughs that are 'Not assigned'\n",
    "canada_df = canada_df[canada_df['Borough'] != 'Not assigned']\n",
    "canada_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Neighbourhood</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Postcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M3A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Parkwoods</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M4A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Victoria Village</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M5A</th>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Harbourfront, Regent Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M6A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Lawrence Heights, Lawrence Manor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M7A</th>\n",
       "      <td>Queen's Park</td>\n",
       "      <td>Not assigned</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Borough                     Neighbourhood\n",
       "Postcode                                                    \n",
       "M3A             North York                         Parkwoods\n",
       "M4A             North York                  Victoria Village\n",
       "M5A       Downtown Toronto         Harbourfront, Regent Park\n",
       "M6A             North York  Lawrence Heights, Lawrence Manor\n",
       "M7A           Queen's Park                      Not assigned"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# More than one neighborhood can exist in one postal code area, combined these into one row with the neighborhoods separated with a comma\n",
    "canada_df[\"Neighbourhood\"] = canada_df.groupby(\"Postcode\")[\"Neighbourhood\"].transform(lambda neigh: ', '.join(neigh))\n",
    "\n",
    "#remove duplicates\n",
    "canada_df = canada_df.drop_duplicates()\n",
    "\n",
    "#update index to be postcode if it isn't already\n",
    "if(canada_df.index.name != 'Postcode'):\n",
    "    canada_df = canada_df.set_index('Postcode')\n",
    "    \n",
    "canada_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Neighbourhood</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Postcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M3A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Parkwoods</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M4A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Victoria Village</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M5A</th>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Harbourfront, Regent Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M6A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Lawrence Heights, Lawrence Manor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M7A</th>\n",
       "      <td>Queen's Park</td>\n",
       "      <td>Queen's Park</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Borough                     Neighbourhood\n",
       "Postcode                                                    \n",
       "M3A             North York                         Parkwoods\n",
       "M4A             North York                  Victoria Village\n",
       "M5A       Downtown Toronto         Harbourfront, Regent Park\n",
       "M6A             North York  Lawrence Heights, Lawrence Manor\n",
       "M7A           Queen's Park                      Queen's Park"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough\n",
    "canada_df['Neighbourhood'].replace(\"Not assigned\", canada_df[\"Borough\"],inplace=True)\n",
    "canada_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3>Step 1C - Output</h3>\n",
    "<p>In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.</p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(103, 2)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "canada_df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Step 2 - Add Geospatial Data</h2>\n",
    "<p>Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.</p>\n",
    "<p>We will use a link to a csv file that has the geographical coordinates of each postal code: <a href='http://cocl.us/Geospatial_data' target=\"_blank\">http://cocl.us/Geospatial_data</a> to get the latitude and the longitude coordinates of each neighborhood.</p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Postcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M1B</th>\n",
       "      <td>43.806686</td>\n",
       "      <td>-79.194353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M1C</th>\n",
       "      <td>43.784535</td>\n",
       "      <td>-79.160497</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M1E</th>\n",
       "      <td>43.763573</td>\n",
       "      <td>-79.188711</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M1G</th>\n",
       "      <td>43.770992</td>\n",
       "      <td>-79.216917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M1H</th>\n",
       "      <td>43.773136</td>\n",
       "      <td>-79.239476</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Latitude  Longitude\n",
       "Postcode                      \n",
       "M1B       43.806686 -79.194353\n",
       "M1C       43.784535 -79.160497\n",
       "M1E       43.763573 -79.188711\n",
       "M1G       43.770992 -79.216917\n",
       "M1H       43.773136 -79.239476"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Get data lat/long data from csv\n",
    "lat_long_coord_df = pd.read_csv(\"Geospatial_Coordinates.csv\")\n",
    "\n",
    "#rename columns and set the index to be Postcode\n",
    "lat_long_coord_df.columns = [\"Postcode\", \"Latitude\", \"Longitude\"]\n",
    "if(lat_long_coord_df.index.name != 'Postcode'):\n",
    "    lat_long_coord_df = lat_long_coord_df.set_index('Postcode')\n",
    "    \n",
    "lat_long_coord_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Borough</th>\n",
       "      <th>Neighbourhood</th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Postcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M3A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Parkwoods</td>\n",
       "      <td>43.753259</td>\n",
       "      <td>-79.329656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M4A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Victoria Village</td>\n",
       "      <td>43.725882</td>\n",
       "      <td>-79.315572</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M5A</th>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Harbourfront, Regent Park</td>\n",
       "      <td>43.654260</td>\n",
       "      <td>-79.360636</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M6A</th>\n",
       "      <td>North York</td>\n",
       "      <td>Lawrence Heights, Lawrence Manor</td>\n",
       "      <td>43.718518</td>\n",
       "      <td>-79.464763</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M7A</th>\n",
       "      <td>Queen's Park</td>\n",
       "      <td>Queen's Park</td>\n",
       "      <td>43.662301</td>\n",
       "      <td>-79.389494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M9A</th>\n",
       "      <td>Etobicoke</td>\n",
       "      <td>Islington Avenue</td>\n",
       "      <td>43.667856</td>\n",
       "      <td>-79.532242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M1B</th>\n",
       "      <td>Scarborough</td>\n",
       "      <td>Rouge, Malvern</td>\n",
       "      <td>43.806686</td>\n",
       "      <td>-79.194353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M3B</th>\n",
       "      <td>North York</td>\n",
       "      <td>Don Mills North</td>\n",
       "      <td>43.745906</td>\n",
       "      <td>-79.352188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M4B</th>\n",
       "      <td>East York</td>\n",
       "      <td>Woodbine Gardens, Parkview Hill</td>\n",
       "      <td>43.706397</td>\n",
       "      <td>-79.309937</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M5B</th>\n",
       "      <td>Downtown Toronto</td>\n",
       "      <td>Ryerson, Garden District</td>\n",
       "      <td>43.657162</td>\n",
       "      <td>-79.378937</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M6B</th>\n",
       "      <td>North York</td>\n",
       "      <td>Glencairn</td>\n",
       "      <td>43.709577</td>\n",
       "      <td>-79.445073</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Borough                     Neighbourhood   Latitude  \\\n",
       "Postcode                                                                  \n",
       "M3A             North York                         Parkwoods  43.753259   \n",
       "M4A             North York                  Victoria Village  43.725882   \n",
       "M5A       Downtown Toronto         Harbourfront, Regent Park  43.654260   \n",
       "M6A             North York  Lawrence Heights, Lawrence Manor  43.718518   \n",
       "M7A           Queen's Park                      Queen's Park  43.662301   \n",
       "M9A              Etobicoke                  Islington Avenue  43.667856   \n",
       "M1B            Scarborough                    Rouge, Malvern  43.806686   \n",
       "M3B             North York                   Don Mills North  43.745906   \n",
       "M4B              East York   Woodbine Gardens, Parkview Hill  43.706397   \n",
       "M5B       Downtown Toronto          Ryerson, Garden District  43.657162   \n",
       "M6B             North York                         Glencairn  43.709577   \n",
       "\n",
       "          Longitude  \n",
       "Postcode             \n",
       "M3A      -79.329656  \n",
       "M4A      -79.315572  \n",
       "M5A      -79.360636  \n",
       "M6A      -79.464763  \n",
       "M7A      -79.389494  \n",
       "M9A      -79.532242  \n",
       "M1B      -79.194353  \n",
       "M3B      -79.352188  \n",
       "M4B      -79.309937  \n",
       "M5B      -79.378937  \n",
       "M6B      -79.445073  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "canada_df = canada_df.join(lat_long_coord_df)\n",
    "canada_df.head(11)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
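The three Step 1B cleanup rules in the notebook above can be checked independently of pandas. The following is a minimal sketch in plain Python, not the notebook's own method: the helper name `clean_rows` and the `NOT_ASSIGNED` constant are hypothetical, and the sample rows are copied from the notebook's displayed output. Unlike the notebook, it applies the borough fallback before merging rather than after, which gives the same result for these rows.

```python
NOT_ASSIGNED = "Not assigned"


def clean_rows(rows):
    """Apply the Step 1B rules to (postcode, borough, neighbourhood) tuples.

    Hypothetical helper for illustration; the notebook does this with pandas.
    """
    merged = {}  # postcode -> (borough, [neighbourhoods]), in first-seen order
    for postcode, borough, neighbourhood in rows:
        # Rule 1: ignore rows whose borough is not assigned.
        if borough == NOT_ASSIGNED:
            continue
        # Rule 3: an unassigned neighbourhood falls back to the borough name.
        if neighbourhood == NOT_ASSIGNED:
            neighbourhood = borough
        # Rule 2: collect every neighbourhood that shares a postcode.
        entry = merged.setdefault(postcode, (borough, []))
        if neighbourhood not in entry[1]:
            entry[1].append(neighbourhood)
    # Join the collected neighbourhoods with a comma, one row per postcode.
    return [(pc, b, ", ".join(ns)) for pc, (b, ns) in merged.items()]


# Sample rows taken from the notebook's displayed output.
sample = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M3A", "North York", "Parkwoods"),
    ("M5A", "Downtown Toronto", "Harbourfront"),
    ("M5A", "Downtown Toronto", "Regent Park"),
    ("M7A", "Queen's Park", "Not assigned"),
]

cleaned = clean_rows(sample)
for row in cleaned:
    print(row)
```

Comparing the rows this produces with the pandas dataframe for the same postcodes is one way to confirm that the grouping and the fallback behave as the assignment describes.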
	" section = []\n",
	" for td in tr.find_all(['th','td']):\n",
	" section.append(td.text.rstrip())\n",
	" \n",
	" #First row of data is the header\n",
	" if (index == 0):\n",
	" columns = section\n",
	" else:\n",
	" data.append(section)\n",
	"\n",
	"#convert list into Pandas DataFrame\n",
	"canada_df = pd.DataFrame(data = data,columns = columns)\n",
	"canada_df.head()"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<h3>Step 1B: Data Cleanup</h3>\n",
	"<ol>\n",
	" <li> Remove Boroughs that are 'Not assigned' </li>\n",
	" <li> More than one neighborhood can exist in one postal code area, combined these into one row with the neighborhoods separated with a comma</li>\n",
	" <li>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough</li>\n",
	"</ol>"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Postcode</th>\n",
	" <th>Borough</th>\n",
	" <th>Neighbourhood</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>2</th>\n",
	" <td>M3A</td>\n",
	" <td>North York</td>\n",
	" <td>Parkwoods</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>3</th>\n",
	" <td>M4A</td>\n",
	" <td>North York</td>\n",
	" <td>Victoria Village</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>4</th>\n",
	" <td>M5A</td>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Harbourfront</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>5</th>\n",
	" <td>M5A</td>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Regent Park</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>6</th>\n",
	" <td>M6A</td>\n",
	" <td>North York</td>\n",
	" <td>Lawrence Heights</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Postcode Borough Neighbourhood\n",
	"2 M3A North York Parkwoods\n",
	"3 M4A North York Victoria Village\n",
	"4 M5A Downtown Toronto Harbourfront\n",
	"5 M5A Downtown Toronto Regent Park\n",
	"6 M6A North York Lawrence Heights"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"#Remove Boroughs that are 'Not assigned'\n",
	"canada_df = canada_df[canada_df['Borough'] != 'Not assigned']\n",
	"canada_df.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Borough</th>\n",
	" <th>Neighbourhood</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Postcode</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>M3A</th>\n",
	" <td>North York</td>\n",
	" <td>Parkwoods</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M4A</th>\n",
	" <td>North York</td>\n",
	" <td>Victoria Village</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M5A</th>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Harbourfront, Regent Park</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M6A</th>\n",
	" <td>North York</td>\n",
	" <td>Lawrence Heights, Lawrence Manor</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M7A</th>\n",
	" <td>Queen's Park</td>\n",
	" <td>Not assigned</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Borough Neighbourhood\n",
	"Postcode \n",
	"M3A North York Parkwoods\n",
	"M4A North York Victoria Village\n",
	"M5A Downtown Toronto Harbourfront, Regent Park\n",
	"M6A North York Lawrence Heights, Lawrence Manor\n",
	"M7A Queen's Park Not assigned"
	]
	},
	"execution_count": 11,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# More than one neighborhood can exist in one postal code area, combined these into one row with the neighborhoods separated with a comma\n",
	"canada_df[\"Neighbourhood\"] = canada_df.groupby(\"Postcode\")[\"Neighbourhood\"].transform(lambda neigh: ', '.join(neigh))\n",
	"\n",
	"#remove duplicates\n",
	"canada_df = canada_df.drop_duplicates()\n",
	"\n",
	"#update index to be postcode if it isn't already\n",
	"if(canada_df.index.name != 'Postcode'):\n",
	" canada_df = canada_df.set_index('Postcode')\n",
	" \n",
	"canada_df.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Borough</th>\n",
	" <th>Neighbourhood</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Postcode</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>M3A</th>\n",
	" <td>North York</td>\n",
	" <td>Parkwoods</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M4A</th>\n",
	" <td>North York</td>\n",
	" <td>Victoria Village</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M5A</th>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Harbourfront, Regent Park</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M6A</th>\n",
	" <td>North York</td>\n",
	" <td>Lawrence Heights, Lawrence Manor</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M7A</th>\n",
	" <td>Queen's Park</td>\n",
	" <td>Queen's Park</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Borough Neighbourhood\n",
	"Postcode \n",
	"M3A North York Parkwoods\n",
	"M4A North York Victoria Village\n",
	"M5A Downtown Toronto Harbourfront, Regent Park\n",
	"M6A North York Lawrence Heights, Lawrence Manor\n",
	"M7A Queen's Park Queen's Park"
	]
	},
	"execution_count": 12,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough\n",
	"canada_df['Neighbourhood'].replace(\"Not assigned\", canada_df[\"Borough\"],inplace=True)\n",
	"canada_df.head()"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<h3>Step 1C - Output</h3>\n",
	"<p>In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.</p>"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"(103, 2)"
	]
	},
	"execution_count": 13,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"canada_df.shape"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<h2> Step 2 - Add Geospatial Data</h2>\n",
	"<p>Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.</p>\n",
	"<p>We will use a link to a csv file that has the geographical coordinates of each postal code: <a href='http://cocl.us/Geospatial_data' target=\"_blank\">http://cocl.us/Geospatial_data</a> to get the latitude and the longitude coordinates of each neighborhood.</p>\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 14,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Latitude</th>\n",
	" <th>Longitude</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Postcode</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>M1B</th>\n",
	" <td>43.806686</td>\n",
	" <td>-79.194353</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M1C</th>\n",
	" <td>43.784535</td>\n",
	" <td>-79.160497</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M1E</th>\n",
	" <td>43.763573</td>\n",
	" <td>-79.188711</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M1G</th>\n",
	" <td>43.770992</td>\n",
	" <td>-79.216917</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M1H</th>\n",
	" <td>43.773136</td>\n",
	" <td>-79.239476</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Latitude Longitude\n",
	"Postcode \n",
	"M1B 43.806686 -79.194353\n",
	"M1C 43.784535 -79.160497\n",
	"M1E 43.763573 -79.188711\n",
	"M1G 43.770992 -79.216917\n",
	"M1H 43.773136 -79.239476"
	]
	},
	"execution_count": 14,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"#Get data lat/long data from csv\n",
	"lat_long_coord_df = pd.read_csv(\"Geospatial_Coordinates.csv\")\n",
	"\n",
	"#rename columns and set the index to be Postcode\n",
	"lat_long_coord_df.columns = [\"Postcode\", \"Latitude\", \"Longitude\"]\n",
	"if(lat_long_coord_df.index.name != 'Postcode'):\n",
	" lat_long_coord_df = lat_long_coord_df.set_index('Postcode')\n",
	" \n",
	"lat_long_coord_df.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 15,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Borough</th>\n",
	" <th>Neighbourhood</th>\n",
	" <th>Latitude</th>\n",
	" <th>Longitude</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Postcode</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>M3A</th>\n",
	" <td>North York</td>\n",
	" <td>Parkwoods</td>\n",
	" <td>43.753259</td>\n",
	" <td>-79.329656</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M4A</th>\n",
	" <td>North York</td>\n",
	" <td>Victoria Village</td>\n",
	" <td>43.725882</td>\n",
	" <td>-79.315572</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M5A</th>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Harbourfront, Regent Park</td>\n",
	" <td>43.654260</td>\n",
	" <td>-79.360636</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M6A</th>\n",
	" <td>North York</td>\n",
	" <td>Lawrence Heights, Lawrence Manor</td>\n",
	" <td>43.718518</td>\n",
	" <td>-79.464763</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M7A</th>\n",
	" <td>Queen's Park</td>\n",
	" <td>Queen's Park</td>\n",
	" <td>43.662301</td>\n",
	" <td>-79.389494</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M9A</th>\n",
	" <td>Etobicoke</td>\n",
	" <td>Islington Avenue</td>\n",
	" <td>43.667856</td>\n",
	" <td>-79.532242</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M1B</th>\n",
	" <td>Scarborough</td>\n",
	" <td>Rouge, Malvern</td>\n",
	" <td>43.806686</td>\n",
	" <td>-79.194353</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M3B</th>\n",
	" <td>North York</td>\n",
	" <td>Don Mills North</td>\n",
	" <td>43.745906</td>\n",
	" <td>-79.352188</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M4B</th>\n",
	" <td>East York</td>\n",
	" <td>Woodbine Gardens, Parkview Hill</td>\n",
	" <td>43.706397</td>\n",
	" <td>-79.309937</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M5B</th>\n",
	" <td>Downtown Toronto</td>\n",
	" <td>Ryerson, Garden District</td>\n",
	" <td>43.657162</td>\n",
	" <td>-79.378937</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>M6B</th>\n",
	" <td>North York</td>\n",
	" <td>Glencairn</td>\n",
	" <td>43.709577</td>\n",
	" <td>-79.445073</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Borough Neighbourhood Latitude \\\n",
	"Postcode \n",
	"M3A North York Parkwoods 43.753259 \n",
	"M4A North York Victoria Village 43.725882 \n",
	"M5A Downtown Toronto Harbourfront, Regent Park 43.654260 \n",
	"M6A North York Lawrence Heights, Lawrence Manor 43.718518 \n",
	"M7A Queen's Park Queen's Park 43.662301 \n",
	"M9A Etobicoke Islington Avenue 43.667856 \n",
	"M1B Scarborough Rouge, Malvern 43.806686 \n",
	"M3B North York Don Mills North 43.745906 \n",
	"M4B East York Woodbine Gardens, Parkview Hill 43.706397 \n",
	"M5B Downtown Toronto Ryerson, Garden District 43.657162 \n",
	"M6B North York Glencairn 43.709577 \n",
	"\n",
	" Longitude \n",
	"Postcode \n",
	"M3A -79.329656 \n",
	"M4A -79.315572 \n",
	"M5A -79.360636 \n",
	"M6A -79.464763 \n",
	"M7A -79.389494 \n",
	"M9A -79.532242 \n",
	"M1B -79.194353 \n",
	"M3B -79.352188 \n",
	"M4B -79.309937 \n",
	"M5B -79.378937 \n",
	"M6B -79.445073 "
	]
	},
	"execution_count": 15,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"canada_df = canada_df.join(lat_long_coord_df)\n",
	"canada_df.head(11)"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.8"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}