@PatrickRWright
Created June 27, 2019 15:37
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Capstone final"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A description of the problem and a discussion of the background. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You are living in one of the 80 biggest German cities. You love your city but because you are looking for a professional change in your life you may have to relocate. Now you will rank the top 80 cities based similarity to your current city to make an informed decision."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your city is: Frankfurt am Main"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"yourCity = \"Frankfurt am Main\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A description of the data and how it will be used to solve the problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list of the biggest German cities can be retrieved from Wikipedia: https://de.wikipedia.org/wiki/Liste_der_Gro%C3%9Fst%C3%A4dte_in_Deutschland \n",
"The table on the Wikipedia page already contains the key variables that define a city. These are total population, population density, population change and federal region. Using foursquare this dataset will be extended to count the cafes in the city centers because you love cafes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"import python libraries"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# libs\n",
"import pandas as pd\n",
"from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe\n",
"import json # library to handle JSON files\n",
"import folium # map rendering library\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from geopy.geocoders import Nominatim # convert an address into latitude and longitude values\n",
"import requests # library to handle requests\n",
"import re"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"define useful functions"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# functions\n",
"\n",
"# examples\n",
"# LonLatFinder(\"Berlin\")\n",
"# LonLatFinder(\"München\")\n",
"def LonLatFinder(address):\n",
" geolocator = Nominatim(user_agent=\"explorer\")\n",
" location = geolocator.geocode(address)\n",
" latitude = location.latitude\n",
" longitude = location.longitude\n",
" return([longitude, latitude])\n",
"\n",
"# get the category name from\n",
"# a foursquare query json\n",
"# returns a list\n",
"def CategoryName(json):\n",
" venueTypes = []\n",
" for ven in json['response']['venues']:\n",
" try:\n",
" #print(ven['categories'][0]['name'])\n",
" venueTypes.append(ven['categories'][0]['name'])\n",
" except:\n",
" pass\n",
" return(venueTypes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"set foursquare parameters (I will not publicly show my ID and SECRET)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# foursquare setup\n",
"CLIENTID = \"\"\n",
"CLIENTSECRET = \"\"\n",
"VERSION = '20180605' # Foursquare API version\n",
"LIMIT = 1000\n",
"radius = 500"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"get table from wikipedia"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-06-27 11:40:09-- https://en.wikipedia.org/wiki/List_of_cities_in_Germany_by_population\n",
"Resolving en.wikipedia.org (en.wikipedia.org)... 208.80.154.224, 2620:0:861:ed1a::1\n",
"Connecting to en.wikipedia.org (en.wikipedia.org)|208.80.154.224|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 249657 (244K) [text/html]\n",
"Saving to: ‘List_of_cities_in_Germany_by_population.2’\n",
"\n",
"List_of_cities_in_G 100%[===================>] 243.81K --.-KB/s in 0.08s \n",
"\n",
"2019-06-27 11:40:09 (2.86 MB/s) - ‘List_of_cities_in_Germany_by_population.2’ saved [249657/249657]\n",
"\n",
"--2019-06-27 11:40:09-- https://de.wikipedia.org/wiki/Liste_der_Gro%C3%9Fst%C3%A4dte_in_Deutschland\n",
"Resolving de.wikipedia.org (de.wikipedia.org)... 208.80.154.224, 2620:0:861:ed1a::1\n",
"Connecting to de.wikipedia.org (de.wikipedia.org)|208.80.154.224|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 245283 (240K) [text/html]\n",
"Saving to: ‘Liste_der_Großstädte_in_Deutschland’\n",
"\n",
"Liste_der_Großstädt 100%[===================>] 239.53K --.-KB/s in 0.08s \n",
"\n",
"2019-06-27 11:40:10 (2.93 MB/s) - ‘Liste_der_Großstädte_in_Deutschland’ saved [245283/245283]\n",
"\n"
]
}
],
"source": [
"# external data\n",
"!wget https://de.wikipedia.org/wiki/Liste_der_Gro%C3%9Fst%C3%A4dte_in_Deutschland"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"prepare dataframe for analysis"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Berlin\n",
"Hamburg\n",
"München\n",
"Köln\n",
"Frankfurt am Main\n",
"Stuttgart\n",
"Düsseldorf\n",
"Dortmund\n",
"Essen\n",
"Leipzig\n",
"Bremen\n",
"Dresden\n",
"Hannover\n",
"Nürnberg\n",
"Duisburg\n",
"Bochum\n",
"Wuppertal\n",
"Bielefeld\n",
"Bonn\n",
"Münster\n",
"Karlsruhe\n",
"Mannheim\n",
"Augsburg\n",
"Wiesbaden\n",
"Mönchengladbach \n",
"Gelsenkirchen\n",
"Braunschweig\n",
"Kiel\n",
"Chemnitz\n",
"Aachen\n",
"Halle (Saale)\n",
"Magdeburg\n",
"Freiburg im Breisgau\n",
"Krefeld\n",
"Lübeck\n",
"Mainz\n",
"Erfurt\n",
"Oberhausen\n",
"Rostock\n",
"Kassel\n",
"Hagen\n",
"Saarbrücken\n",
"Hamm\n",
"Potsdam\n",
"Mülheim an der Ruhr\n",
"Ludwigshafen am Rhein\n",
"Oldenburg (Oldb)\n",
"Osnabrück\n",
"Leverkusen\n",
"Heidelberg\n",
"Solingen\n",
"Darmstadt\n",
"Herne\n",
"Neuss\n",
"Regensburg\n",
"Paderborn\n",
"Ingolstadt\n",
"Offenbach am Main\n",
"Würzburg\n",
"Fürth\n",
"Ulm\n",
"Heilbronn\n",
"Pforzheim\n",
"Wolfsburg\n",
"Göttingen\n",
"Bottrop\n",
"Reutlingen\n",
"Koblenz\n",
"Recklinghausen\n",
"Bremerhaven \n",
"Bergisch Gladbach\n",
"Jena\n",
"Erlangen\n",
"Remscheid\n",
"Trier\n",
"Salzgitter \n",
"Moers\n",
"Siegen\n",
"Hildesheim\n",
"Cottbus\n"
]
}
],
"source": [
"# get data from wikipedia\n",
"X = 80\n",
"cities = pd.read_html(\"Liste_der_Großstädte_in_Deutschland\")[0]\n",
"topX = cities.head(X)\n",
"\n",
"# build pandas df\n",
"cities_data = {'name': topX['Name']['Name'],\n",
" 'totalPop2017': topX['Einwohnerzahl']['2017'], \n",
" 'area': topX['Flächein km²(2016)']['Flächein km²(2016)'],\n",
" 'popPerArea': topX['Ew./km²(2016)']['Ew./km²(2016)'],\n",
" 'popChange': topX['Be­völ­ke­rungs­ent­wick­lung [%] (2017 ggü. 2016)']['Be­völ­ke­rungs­ent­wick­lung [%] (2017 ggü. 2016)'],\n",
" 'province': topX['Bun­des­land']['Bun­des­land']}\n",
"cities_df = pd.DataFrame(data=cities_data)\n",
"\n",
"# data curation\n",
"for x in cities_df['popChange']:\n",
" x = int(x)/100\n",
"cities_df['name'] = cities_df['name'].str.replace('\\d', '').str.replace(',', '').str.replace('­', '')\n",
"cities_df['popChange'] = cities_df['popChange'].apply(int)/100\n",
"cities_df['totalPop2017'] = cities_df['totalPop2017'].str.replace('\\.', '').apply(int)\n",
"cities_df['area'] = cities_df['area'].apply(int)\n",
"cities_df['popPerArea'] = cities_df['popPerArea'].str.replace('\\.', '').apply(int)\n",
"\n",
"# add cafe count using foursquare\n",
"cafe_count_all = []\n",
"for city in cities_df[\"name\"].iteritems():\n",
" CurrCity = city[1]\n",
" #print(CurrCity)\n",
" LonLatCurrCity = LonLatFinder(CurrCity)\n",
" lng = LonLatCurrCity[0]\n",
" lat = LonLatCurrCity[1]\n",
" url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENTID, \n",
" CLIENTSECRET, \n",
" lat, \n",
" lng, \n",
" VERSION, \n",
" radius, \n",
" LIMIT)\n",
" results = requests.get(url).json()\n",
" catNames = CategoryName(results)\n",
" cafe_count = catNames.count('Café')\n",
" cafe_count_all.append(cafe_count)\n",
"\n",
"# add to cities_df \n",
"cities_df['cafeCount'] = cafe_count_all"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"prevent overweighting of bigger numbers"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/lib/python3.6/site-packages/sklearn/preprocessing/data.py:323: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by MinMaxScaler.\n",
" return self.partial_fit(X, y)\n"
]
}
],
"source": [
"# Normalize totalPop2017, area, popPerArea, popChange, cafeCount with MinMaxScaler\n",
"# to prevent overweighting of bigger numbers\n",
"\n",
"scaler = MinMaxScaler()\n",
"df_scaled = cities_df[['totalPop2017', \"area\", \"popPerArea\", \"popChange\", \"cafeCount\"]]\n",
"df_scaled = pd.DataFrame(scaler.fit_transform(df_scaled), columns=df_scaled.columns)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"calculate the euclidean distance"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [],
"source": [
"# calculate euclidean distance to Frankfurt\n",
"yourCityIdx = np.where(cities_df['name'] == yourCity)[0][0]\n",
"yourCityLine = df_scaled.iloc[[yourCityIdx]].values\n",
"\n",
"allDists = []\n",
"for row in df_scaled.iterrows():\n",
" #print(row[1].values)\n",
" #print(frankfurtLine)\n",
" eucDist = np.linalg.norm(row[1].values - yourCityLine)\n",
" # exception for 0 distance because this is the town itself\n",
" if eucDist == 0:\n",
" eucDist = 10\n",
" allDists.append(eucDist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"find minimum distance to your town"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>totalPop2017</th>\n",
" <th>area</th>\n",
" <th>popPerArea</th>\n",
" <th>popChange</th>\n",
" <th>province</th>\n",
" <th>cafeCount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Offenbach am Main</td>\n",
" <td>126658</td>\n",
" <td>4489</td>\n",
" <td>2775</td>\n",
" <td>1.66</td>\n",
" <td>Hessen</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name totalPop2017 area popPerArea popChange province \\\n",
"57 Offenbach am Main 126658 4489 2775 1.66 Hessen \n",
"\n",
" cafeCount \n",
"57 9 "
]
},
"execution_count": 151,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df.iloc[[np.where(min(allDists) == allDists)[0][0]]]"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>totalPop2017</th>\n",
" <th>area</th>\n",
" <th>popPerArea</th>\n",
" <th>popChange</th>\n",
" <th>province</th>\n",
" <th>cafeCount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Frankfurt am Main</td>\n",
" <td>746878</td>\n",
" <td>24831</td>\n",
" <td>2966</td>\n",
" <td>1.42</td>\n",
" <td>Hessen</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name totalPop2017 area popPerArea popChange province \\\n",
"4 Frankfurt am Main 746878 24831 2966 1.42 Hessen \n",
"\n",
" cafeCount \n",
"4 10 "
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df.iloc[[np.where(cities_df[\"name\"] == yourCity)[0][0]]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"folium map of germany"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [],
"source": [
"LonLat = LonLatFinder(\"Germany\")\n",
"longitude = LonLat[0]\n",
"latitude = LonLat[1]\n",
"TheMap = folium.Map(location=[latitude, longitude], zoom_start=6)\n",
"\n",
"MaxSize = max(cities_df[\"totalPop2017\"])\n",
"\n",
"lats = []\n",
"lons = []\n",
"\n",
"# loop the cities and show on folium map\n",
"for line in cities_df[[\"name\", \"totalPop2017\"]].iterrows():\n",
" CurrCity = line[1][\"name\"]\n",
" CurrSize = line[1][\"totalPop2017\"]\n",
" #print(CurrCity)\n",
" LonLatCurrCity = LonLatFinder(CurrCity)\n",
" lng = LonLatCurrCity[0]\n",
" lat = LonLatCurrCity[1]\n",
" lons.append(lng)\n",
" lats.append(lat)\n",
" label = '{}, Inhabitants (2017) {}'.format(CurrCity, CurrSize)\n",
" label = folium.Popup(label, parse_html=True)\n",
" if CurrSize > 500000:\n",
" CurrCol = 'red'\n",
" else:\n",
" CurrCol = 'blue'\n",
" folium.CircleMarker(\n",
" [lat, lng],\n",
" radius=(CurrSize/MaxSize)*15,\n",
" popup=label,\n",
" color=CurrCol,\n",
" fill=True,\n",
" fill_color='#3186cc',\n",
" fill_opacity=0.7,\n",
" parse_html=False).add_to(TheMap)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"find density center with a 2D histogram"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"51.159399300000004\n",
"6.9090488699999995\n"
]
}
],
"source": [
"# add densest center\n",
"# x are lats, y are lons\n",
"hist, x, y = np.histogram2d(lats, lons)\n",
"max_loc_hist = np.unravel_index(hist.argmax(), hist.shape)\n",
"lat_densest = x[max_loc_hist[0]]\n",
"lon_densest = y[max_loc_hist[1]]\n",
"\n",
"folium.CircleMarker(\n",
" [lat_densest, lon_densest],\n",
" radius=45,\n",
" #popup=label,\n",
" color=\"yellow\",\n",
" fill=True,\n",
" fill_color='yellow',\n",
" fill_opacity=0.7,\n",
" parse_html=False).add_to(TheMap)\n",
"\n",
"print(lat_densest)\n",
"print(lon_densest)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The map of Germany and where the cities are located. The circle size is proportionate to the population and cities with more than 500.000 inhabitants are indicated in red. The area with the highest city density is circled in yellow. It is called the Ruhrgebiet. While we see that e.g. Berlin, Munich and Hamburg are all bigger than the individual cities in the Ruhrgebiet, the Ruhrgebiet as a metropolitan area is the most densely populated area in Germany."
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7f173eee57b8>"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"TheMap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the map will not render in the gist I made a screenshot which I am linking."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](https://raw.githubusercontent.com/PatrickRWright/Coursera_Capstone/master/mapDE_all.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finding the best match compared to your city of choice. Shape of the dataframe. As you can see we have 80 cities and five properties to classify them by. "
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(80, 7)"
]
},
"execution_count": 157,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>totalPop2017</th>\n",
" <th>area</th>\n",
" <th>popPerArea</th>\n",
" <th>popChange</th>\n",
" <th>province</th>\n",
" <th>cafeCount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Berlin</td>\n",
" <td>3613495</td>\n",
" <td>89168</td>\n",
" <td>4009</td>\n",
" <td>1.08</td>\n",
" <td>Berlin</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Hamburg</td>\n",
" <td>1830584</td>\n",
" <td>75522</td>\n",
" <td>2397</td>\n",
" <td>1.11</td>\n",
" <td>Hamburg</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>München</td>\n",
" <td>1456039</td>\n",
" <td>31070</td>\n",
" <td>4713</td>\n",
" <td>-0.56</td>\n",
" <td>Bayern</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Köln</td>\n",
" <td>1080394</td>\n",
" <td>40502</td>\n",
" <td>2656</td>\n",
" <td>0.41</td>\n",
" <td>Nordrhein-Westfalen</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Frankfurt am Main</td>\n",
" <td>746878</td>\n",
" <td>24831</td>\n",
" <td>2966</td>\n",
" <td>1.42</td>\n",
" <td>Hessen</td>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name totalPop2017 area popPerArea popChange \\\n",
"0 Berlin 3613495 89168 4009 1.08 \n",
"1 Hamburg 1830584 75522 2397 1.11 \n",
"2 München 1456039 31070 4713 -0.56 \n",
"3 Köln 1080394 40502 2656 0.41 \n",
"4 Frankfurt am Main 746878 24831 2966 1.42 \n",
"\n",
" province cafeCount \n",
"0 Berlin 3 \n",
"1 Hamburg 8 \n",
"2 Bayern 14 \n",
"3 Nordrhein-Westfalen 2 \n",
"4 Hessen 10 "
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Distributions of scaled data"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([11., 8., 16., 10., 7., 15., 2., 7., 1., 3.]),\n",
" array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]),\n",
" <a list of 10 Patch objects>)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAADtRJREFUeJzt3X+MZWddx/H3hy61ikBbdtpsWtYtyWK3gUDJpCkhQegiIWC6/aMlRdHVbNwUC8FgIqv8If5ILCaCGDeWDUUGw4+uVdwNIlq3bVBCC1Nb6I8tttRaNl26g7QVNAILX/+4B9y0M71nZu6PmWffr2Ryzzn3uft8n72zn3n2ueecSVUhSVr/njHtAiRJo2GgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhqxYZKdbdy4sbZs2TLJLiVp3bv99tu/UVUzw9pNNNC3bNnC/Pz8JLuUpHUvyX/0aeeSiyQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNWKiV4quxpY9fzeVfh+65g1T6VeSlssZuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1IhegZ7k9CQ3JLkvyeEkL09yZpIbk9zfPZ4x7mIlSUvrO0N/P/CZqjofeAlwGNgDHKqqrcChbl+SNCVDAz3Jc4BXAtcBVNV3q+pxYAcw1zWbAy4bV5GSpOH6zNBfACwAf5HkjiQfTPIs4OyqOgrQPZ41xjolSUP0CfQNwMuAP6+qC4H/ZhnLK0l2J5lPMr+wsLDCMiVJw/QJ9CPAkaq6rdu/gUHAP5pkE0D3eGyxF1fVvqqararZmZmZUdQsSVrE0ECvqq8DX0vy092h7cC9wEFgZ3dsJ3BgLBVKknrp+yvo3gZ8NMmpwIPArzD4YbA/yS7gYeCK8ZQoSeqjV6BX1Z3A7CJPbR9tOZKklfJKUUlqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIasaFPoyQPAd8Cvg8cr6rZJGcC1wNbgIeAN1bVY+MpU5I0zHJm6K+uqpdW1Wy3vwc4VFVbgUPdviRpSlaz5LIDmOu254DLVl+OJGml+gZ6Af+Y5PYku7tjZ1fVUYDu8axxFChJ6qfXGjrwiqp6JMlZwI1J7uvbQfcDYDfA5s2bV1CiJKmPXjP0qnqkezwGfBK4CHg0ySaA7vHYEq/dV1WzVTU7MzMzmqolSU8xNNCTPCvJs3+4DbwWuBs4COzsmu0EDoyrSEnScH2WXM4GPpnkh+0/VlWfSfJFYH+SXcDDwBXjK1OSNMzQQK+qB4GXLHL8P4Ht4yhKkrR8XikqSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiN6B3qSU5LckeRT3f55SW5Lcn+S65OcOr4yJUnDLGeG/nbg8An77wHeV1VbgceAXaMsTJK0PL0CPcm5wBuAD3b7AS4BbuiazAGXjaNASVI/fWfofwL8JvCDbv95wONVdbzbPwKcs9gLk+xOMp9kfmFhYVXFSpKWNjTQk/wccKyqbj/x8CJNa7HXV9W+qpqtqtmZmZkVlilJGmZDjzavAC5N8nrgNOA5DGbspyfZ0M3SzwUeGV+ZkqRhhs7Qq+q3qurcqtoCXAncVFW/ANwMXN412wkcGFuVkqShVnMe+juBdyR5gMGa+nWjKUmStBJ9llx+pKpuAW7pth8ELhp9SZKklfBKUUlqhIEuSY0w0CWpEQ
a6JDXCQJekRhjoktQIA12SGrGs89Cn6aHTfn5KPT8xpX4laXmcoUtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktSIoYGe5LQkX0jypST3JPnd7vh5SW5Lcn+S65OcOv5yJUlL6TND/w5wSVW9BHgp8LokFwPvAd5XVVuBx4Bd4ytTkjTM0ECvgW93u8/svgq4BLihOz4HXDaWCiVJvfRaQ09ySpI7gWPAjcBXgcer6njX5AhwzhKv3Z1kPsn8wsLCKGqWJC2iV6BX1fer6qXAucBFwLbFmi3x2n1VNVtVszMzMyuvVJL0tJZ1lktVPQ7cAlwMnJ7kh79k+lzgkdGWJklajj5nucwkOb3b/nHgNcBh4Gbg8q7ZTuDAuIqUJA23YXgTNgFzSU5h8ANgf1V9Ksm9wCeS/AFwB3DdGOuUJA0xNNCr6svAhYscf5DBerokaQ3wSlFJaoSBLkmNMNAlqREGuiQ1os9ZLie3dz93in0/Mb2+Ja07ztAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AhvzjXEi8/bPLW+75paz9Nz+PxtU+l3232Hp9KvNErO0CWpEQa6JDXCQJekRhjoktSIoYGe5PlJbk5yOMk9Sd7eHT8zyY1J7u8ezxh/uZKkpfSZoR8HfqOqtgEXA1cnuQDYAxyqqq3AoW5fkjQlQwO9qo5W1b92298CDgPnADuAua7ZHHDZuIqUJA23rDX0JFuAC4HbgLOr6igMQh84a9TFSZL66x3oSX4S+Gvg16vqv5bxut1J5pPMLywsrKRGSVIPvQI9yTMZhPlHq+pvusOPJtnUPb8JOLbYa6tqX1XNVtXszMzMKGqWJC2iz1kuAa4DDlfVe0946iCws9veCRwYfXmSpL763MvlFcAvAnclubM79tvANcD+JLuAh4ErxlOiJKmPoYFeVf8CZImnt4+2HEnSSnmlqCQ1wkCXpEYY6JLUCANdkhphoEtSI/wVdGvYi+dePO0SJm7/tAuQ1jFn6JLUCANdkhrhkssadtXn3z+Vfq99+dun0q+k1XGGLkmNMNAlqREuuUjA3qtumlrfV197ydT6VlucoUtSIwx0SWqESy56iv1/eHzaJUhaAWfoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYMDfQkH0pyLMndJxw7M8mNSe7vHs8Yb5mSpGH6XCn6YeDPgI+ccGwPcKiqrkmyp9t/5+jL+397v/7Jcf7xSzvPe4NLWh+GztCr6rPAN590eAcw123PAZeNuC5J0jKtdA397Ko6CtA9njW6kiRJKzH2m3Ml2Q3sBti8efO4u9MI3PSqvVPr+5Jbrp5a39J6t9IZ+qNJNgF0j8eWalhV+6pqtqpmZ2ZmVtidJGmYlQb6QWBnt70TODCaciRJKzV0ySXJx4FXARuTHAF+B7gG2J9kF/AwcMU4i5Radvj8bVPpd9t9h6fSr8ZnaKBX1ZuWeGr7iGuRJK2CV4pKUiP8FXRDXPX590+7BEnqxRm6JDXCQJekRrjkojVlmhc1nWz2XnXT1Pq++tpLptZ3y5yhS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakR3pxL0kljWjckm9TNyJyhS1IjDHRJaoRLLtKUnYz3gD98/rbpdNz437UzdElqhIEuSY1YVaAneV2SryR5IMmeURUlSVq+Fa+hJzkF2Av8LHAE+GKSg1V176iKk9Smk/Fzg0lYzQz9IuCBqnqwqr4LfALYMZqyJEnLtZpAPwf42gn7R7pjkqQpWM1pi1nkWD2lUbIb2N3tfjvJV1bY30bgGyt87XrlmE8Ojrlxb/3Aqsf7U30arSbQjwDPP2H/XOCRJzeqqn3AvlX0A0CS+a
qaXe2fs5445pODY27fpMa7miWXLwJbk5yX5FTgSuDgaMqSJC3XimfoVXU8yVuBfwBOAT5UVfeMrDJJ0rKs6tL/qvo08OkR1TLMqpdt1iHHfHJwzO2byHhT9ZTPMSVJ65CX/ktSI9ZcoA+7nUCSH0tyfff8bUm2TL7K0eox5nckuTfJl5McStLrFKa1rO9tI5JcnqSSrOszIvqMN8kbu/f5niQfm3SNo9bj+3pzkpuT3NF9b79+GnWOUpIPJTmW5O4lnk+SP+3+Tr6c5GUjLaCq1swXgw9Xvwq8ADgV+BJwwZPa/Bpwbbd9JXD9tOuewJhfDfxEt/2Wk2HMXbtnA58FbgVmp133mN/jrcAdwBnd/lnTrnsCY94HvKXbvgB4aNp1j2DcrwReBty9xPOvB/6ewXU8FwO3jbL/tTZD73M7gR3AXLd9A7A9yWIXOa0XQ8dcVTdX1f90u7cyOOd/Pet724jfB/4I+N9JFjcGfcb7q8DeqnoMoKqOTbjGUesz5gKe020/l0WuY1lvquqzwDefpskO4CM1cCtwepJNo+p/rQV6n9sJ/KhNVR0HngCeN5HqxmO5t1DYxeAn/Ho2dMxJLgSeX1WfmmRhY9LnPX4h8MIkn0tya5LXTay68egz5ncDb05yhMHZcm+bTGlTNdZbpqy131jU53YCvW45sI70Hk+SNwOzwM+MtaLxe9oxJ3kG8D7glydV0Jj1eY83MFh2eRWD/4H9c5IXVdXjY65tXPqM+U3Ah6vqj5O8HPjLbsw/GH95UzPW/FprM/Q+txP4UZskGxj8V+3p/ouz1vW6hUKS1wDvAi6tqu9MqLZxGTbmZwMvAm5J8hCDtcaD6/iD0b7f1weq6ntV9e/AVxgE/HrVZ8y7gP0AVfV54DQG93hpWa9/7yu11gK9z+0EDgI7u+3LgZuq+7RhnRo65m754QMMwny9r63CkDFX1RNVtbGqtlTVFgafG1xaVfPTKXfV+nxf/y2DD79JspHBEsyDE61ytPqM+WFgO0CSbQwCfWGiVU7eQeCXurNdLgaeqKqjI/vTp/2p8BKfAv8bg0/I39Ud+z0G/6Bh8Kb/FfAA8AXgBdOueQJj/ifgUeDO7uvgtGse95if1PYW1vFZLj3f4wDvBe4F7gKunHbNExjzBcDnGJwBcyfw2mnXPIIxfxw4CnyPwWx8F3AVcNUJ7/Pe7u/krlF/X3ulqCQ1Yq0tuUiSVshAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEf8HKsPmhsNk/R8AAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.hist(df_scaled['totalPop2017'])\n",
"plt.hist(df_scaled['area'])\n",
"plt.hist(df_scaled['popPerArea'])\n",
"plt.hist(df_scaled['popChange'])\n",
"plt.hist(df_scaled['cafeCount'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The best match for Frankfurt am Main is Offenbach. Interestingly they are located exaclty next to each other and have a rivalry. Maybe they are more similar than they thought."
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>totalPop2017</th>\n",
" <th>area</th>\n",
" <th>popPerArea</th>\n",
" <th>popChange</th>\n",
" <th>province</th>\n",
" <th>cafeCount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Offenbach am Main</td>\n",
" <td>126658</td>\n",
" <td>4489</td>\n",
" <td>2775</td>\n",
" <td>1.66</td>\n",
" <td>Hessen</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name totalPop2017 area popPerArea popChange province \\\n",
"57 Offenbach am Main 126658 4489 2775 1.66 Hessen \n",
"\n",
" cafeCount \n",
"57 9 "
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df.iloc[[np.where(min(allDists) == allDists)[0][0]]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](https://raw.githubusercontent.com/PatrickRWright/Coursera_Capstone/master/mapHessen.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Discussion & Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above analysis shows how a comparatively small set of variables can serve to compare cities with each other. It also shows that if you are looking to move into an area which is densely populated you do not necessarily need to move to another big town with over 500.000 inhabitants but that you may also like the Ruhrgebiet."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
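
The problem description in the notebook asks for a ranking of all cities by similarity, while the cells only extract the single nearest city. Below is a minimal sketch of the full ranking, using the same idea (min-max scaling followed by Euclidean distance). It rests on a few assumptions: `rank_by_similarity` is a hypothetical helper that does not appear in the notebook, the scaling is done with plain NumPy instead of `MinMaxScaler`, and only the six rows visible in the notebook outputs are used, so the distances differ from a run over all 80 cities.

```python
import numpy as np

# Rows copied from the cities_df outputs shown in the notebook.
# Columns: totalPop2017, area, popPerArea, popChange, cafeCount
names = ["Berlin", "Hamburg", "München", "Köln",
         "Frankfurt am Main", "Offenbach am Main"]
features = np.array([
    [3613495, 89168, 4009, 1.08, 3],
    [1830584, 75522, 2397, 1.11, 8],
    [1456039, 31070, 4713, -0.56, 14],
    [1080394, 40502, 2656, 0.41, 2],
    [746878, 24831, 2966, 1.42, 10],
    [126658, 4489, 2775, 1.66, 9],
], dtype=float)


def rank_by_similarity(names, features, target):
    """Hypothetical helper: return (name, distance) pairs, most similar first."""
    # min-max scale every column to [0, 1] so large numbers do not dominate
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (features - lo) / span

    # Euclidean distance of every city to the target city
    target_idx = names.index(target)
    dists = np.linalg.norm(scaled - scaled[target_idx], axis=1)

    # sort ascending and leave out the target city itself
    order = [i for i in np.argsort(dists) if i != target_idx]
    return [(names[i], float(dists[i])) for i in order]


ranking = rank_by_similarity(names, features, "Frankfurt am Main")
for name, dist in ranking:
    print(f"{name}: {dist:.3f}")
```

Excluding the target by index, rather than by testing for a distance of zero, also keeps a second city with identical values in the ranking.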