@cmrivers
Created November 3, 2015 17:00
{
"metadata": {
"name": "",
"signature": "sha256:e84e57889637dd138adaa0222510c019330e34f0d641d61e0eda7eaf61a68bba"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python for Data Analysis\n",
"\n",
"### Lesson 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we did last time:\n",
"- Getting a handle on your data\n",
"- Replacing values\n",
"- Summary statistics, with some stratification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we will do today:\n",
"- More advanced versions of all of those things"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we will do next time:\n",
"- Traditional programming in Python\n",
"- Control flow, functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Always import the packages you will rely on first. Pandas and `%pylab inline` are a must; NumPy and seaborn are helpful."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd  # core data analysis package\n",
"import numpy as np  # also a useful data analysis package\n",
"import seaborn as sns  # make your plots pretty\n",
"# make your plots appear on screen\n",
"%pylab inline"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data = pd.read_csv('C:/Users/caitlin.rivers/Downloads/cases.csv')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like last time, we need to convert age from a string to a number and replace the ambiguous gender codes with missing values. Normally you would just re-run your code (or load cleaned data that you saved previously), but I'm starting a new notebook to keep things clean."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# pd.to_numeric replaces the deprecated convert_objects; unparseable ages become NaN\n",
"data.age = pd.to_numeric(data.age, errors='coerce')\n",
"data.gender = data.gender.replace(['?', 'M?'], np.nan)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# I. Fancy data manipulation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, make a list of labels for the age groups you want. Note that it's a list (square brackets) and each label is a string (quotes)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"labels = ['0-50', '51-99']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79', '80-89', '90+']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Second, we generate a range that matches those breaks."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"list(range(0, 101, 10))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]"
]
}
],
"prompt_number": 10
},
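{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make a new column for age group. Use pandas' `cut` function to bin the ages at those breaks and attach the labels. With `right=False`, each bin includes its left edge and excludes its right edge, so an age of 30 falls in '30-39'. Note that the last bin stops at 99, so despite its label, '90+' would leave an age of 100 or more unbinned (NaN)."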
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data['age_group'] = pd.cut(data.age, range(0, 101, 10), right=False, labels=labels)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See if it worked."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data[['age', 'age_group']][:5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>age_group</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>25</td>\n",
" <td>20-29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30</td>\n",
" <td>30-39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>40</td>\n",
" <td>40-49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>60</td>\n",
" <td>60-69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>29</td>\n",
" <td>20-29</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
" age age_group\n",
"0 25 20-29\n",
"1 30 30-39\n",
"2 40 40-49\n",
"3 60 60-69\n",
"4 29 20-29"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cool, so how many people are in each age group?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"age_group_counts = data.age_group.value_counts(sort=False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"age_group_counts"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
"0-9 9\n",
"10-19 29\n",
"20-29 128\n",
"30-39 209\n",
"40-49 220\n",
"50-59 250\n",
"60-69 198\n",
"70-79 161\n",
"80-89 62\n",
"90+ 11\n",
"dtype: int64"
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This would be better as a plot."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"age_group_counts.plot(kind='bar')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 205,
"text": [
"<matplotlib.axes._subplots.AxesSubplot at 0x125f8b00>"
]
},
{
"metadata": {},
"output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0x125f8cf8>"
]
}
],
"prompt_number": 205
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age_group.value_counts(sort=False).plot(kind='bar')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"<matplotlib.axes._subplots.AxesSubplot at 0xbc30860>"
]
},
{
"metadata": {},
"output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0xa257588>"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or a boxplot could be useful."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age.plot(kind='box')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 209,
"text": [
"<matplotlib.axes._subplots.AxesSubplot at 0x12aa8160>"
]
},
{
"metadata": {},
"output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0x129437f0>"
]
}
],
"prompt_number": 209
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can do column-wise math (addition, subtraction, division, etc.). The operation is applied to every row of the column at once."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age/100"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
"0 0.25\n",
"1 0.30\n",
"2 0.40\n",
"3 0.60\n",
"4 0.29\n",
"5 0.33\n",
"6 0.28\n",
"7 0.45\n",
"8 0.46\n",
"9 0.25\n",
"10 0.53\n",
"11 0.28\n",
"12 0.60\n",
"13 0.60\n",
"14 0.49\n",
"15 0.45\n",
"16 0.45\n",
"17 0.70\n",
"18 0.31\n",
"19 0.39\n",
"20 0.16\n",
"21 0.60\n",
"22 0.39\n",
"23 0.30\n",
"24 0.61\n",
"25 0.69\n",
"26 0.51\n",
"27 0.39\n",
"28 0.40\n",
"29 0.73\n",
" ... \n",
"1261 0.77\n",
"1262 0.60\n",
"1263 0.61\n",
"1264 0.70\n",
"1265 0.75\n",
"1266 0.74\n",
"1267 0.51\n",
"1268 0.50\n",
"1269 0.30\n",
"1270 0.54\n",
"1271 0.41\n",
"1272 0.11\n",
"1273 0.54\n",
"1274 0.26\n",
"1275 0.27\n",
"1276 0.40\n",
"1277 0.65\n",
"1278 0.56\n",
"1279 0.60\n",
"1280 0.76\n",
"1281 0.24\n",
"1282 0.35\n",
"1283 0.24\n",
"1284 0.25\n",
"1285 0.50\n",
"1286 0.36\n",
"1287 0.77\n",
"1288 0.60\n",
"1289 0.93\n",
"1290 0.32\n",
"Name: age, dtype: float64"
]
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are lots of ways to do fancy pivot tables and aggregations. You don't need to memorize them; just remember that they are possible. Here we group by country, then compute the minimum, mean, and median age and count the nonzero HCW values."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"country_table = data.groupby('country').agg({'age': [np.min, np.mean, np.median], 'HCW': np.count_nonzero})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"country_table"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"3\" halign=\"left\">age</th>\n",
" <th>HCW</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th>amin</th>\n",
" <th>mean</th>\n",
" <th>median</th>\n",
" <th>count_nonzero</th>\n",
" </tr>\n",
" <tr>\n",
" <th>country</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>France</th>\n",
" <td>51</td>\n",
" <td>51.000000</td>\n",
" <td>51.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Iran</th>\n",
" <td>35</td>\n",
" <td>51.500000</td>\n",
" <td>51.0</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Italy</th>\n",
" <td>2</td>\n",
" <td>22.000000</td>\n",
" <td>22.0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jordan</th>\n",
" <td>25</td>\n",
" <td>41.300000</td>\n",
" <td>42.5</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>KSA</th>\n",
" <td>0</td>\n",
" <td>51.171908</td>\n",
" <td>52.0</td>\n",
" <td>959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Kuwait</th>\n",
" <td>47</td>\n",
" <td>53.000000</td>\n",
" <td>52.0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Lebanon</th>\n",
" <td>60</td>\n",
" <td>60.000000</td>\n",
" <td>60.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Oman</th>\n",
" <td>31</td>\n",
" <td>55.333333</td>\n",
" <td>59.0</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Qatar</th>\n",
" <td>23</td>\n",
" <td>51.866667</td>\n",
" <td>55.0</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>South Korea</th>\n",
" <td>16</td>\n",
" <td>54.190217</td>\n",
" <td>55.0</td>\n",
" <td>186</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tunisia</th>\n",
" <td>34</td>\n",
" <td>34.500000</td>\n",
" <td>34.5</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UAE</th>\n",
" <td>4</td>\n",
" <td>44.181818</td>\n",
" <td>42.0</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UK</th>\n",
" <td>30</td>\n",
" <td>34.500000</td>\n",
" <td>34.5</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Yemen</th>\n",
" <td>44</td>\n",
" <td>44.000000</td>\n",
" <td>44.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
" age HCW\n",
" amin mean median count_nonzero\n",
"country \n",
"France 51 51.000000 51.0 1\n",
"Iran 35 51.500000 51.0 8\n",
"Italy 2 22.000000 22.0 2\n",
"Jordan 25 41.300000 42.5 20\n",
"KSA 0 51.171908 52.0 959\n",
"Kuwait 47 53.000000 52.0 3\n",
"Lebanon 60 60.000000 60.0 1\n",
"Oman 31 55.333333 59.0 9\n",
"Qatar 23 51.866667 55.0 15\n",
"South Korea 16 54.190217 55.0 186\n",
"Tunisia 34 34.500000 34.5 2\n",
"UAE 4 44.181818 42.0 77\n",
"UK 30 34.500000 34.5 2\n",
"Yemen 44 44.000000 44.0 1"
]
}
],
"prompt_number": 33
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transposing is a cool feature...turn your rows into columns with a single letter."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"country_table.T"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>France</th>\n",
" <th>Iran</th>\n",
" <th>Italy</th>\n",
" <th>Jordan</th>\n",
" <th>KSA</th>\n",
" <th>Kuwait</th>\n",
" <th>Lebanon</th>\n",
" <th>Oman</th>\n",
" <th>Qatar</th>\n",
" <th>South Korea</th>\n",
" <th>Tunisia</th>\n",
" <th>UAE</th>\n",
" <th>UK</th>\n",
" <th>Yemen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>gender</th>\n",
" <th>count_nonzero</th>\n",
" <td>1</td>\n",
" <td>8.0</td>\n",
" <td>2</td>\n",
" <td>20.0</td>\n",
" <td>959.000000</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>9.000000</td>\n",
" <td>15.000000</td>\n",
" <td>186.000000</td>\n",
" <td>2.0</td>\n",
" <td>77.000000</td>\n",
" <td>2.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">age</th>\n",
" <th>size</th>\n",
" <td>1</td>\n",
" <td>8.0</td>\n",
" <td>2</td>\n",
" <td>20.0</td>\n",
" <td>959.000000</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>9.000000</td>\n",
" <td>15.000000</td>\n",
" <td>186.000000</td>\n",
" <td>2.0</td>\n",
" <td>77.000000</td>\n",
" <td>2.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>51</td>\n",
" <td>51.5</td>\n",
" <td>22</td>\n",
" <td>41.3</td>\n",
" <td>51.115957</td>\n",
" <td>53</td>\n",
" <td>60</td>\n",
" <td>55.333333</td>\n",
" <td>51.866667</td>\n",
" <td>54.190217</td>\n",
" <td>34.5</td>\n",
" <td>44.181818</td>\n",
" <td>34.5</td>\n",
" <td>44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>median</th>\n",
" <td>51</td>\n",
" <td>51.0</td>\n",
" <td>22</td>\n",
" <td>42.5</td>\n",
" <td>52.000000</td>\n",
" <td>52</td>\n",
" <td>60</td>\n",
" <td>59.000000</td>\n",
" <td>55.000000</td>\n",
" <td>55.000000</td>\n",
" <td>34.5</td>\n",
" <td>42.000000</td>\n",
" <td>34.5</td>\n",
" <td>44</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 196,
"text": [
"country France Iran Italy Jordan KSA Kuwait \\\n",
"gender count_nonzero 1 8.0 2 20.0 959.000000 3 \n",
"age size 1 8.0 2 20.0 959.000000 3 \n",
" mean 51 51.5 22 41.3 51.115957 53 \n",
" median 51 51.0 22 42.5 52.000000 52 \n",
"\n",
"country Lebanon Oman Qatar South Korea Tunisia \\\n",
"gender count_nonzero 1 9.000000 15.000000 186.000000 2.0 \n",
"age size 1 9.000000 15.000000 186.000000 2.0 \n",
" mean 60 55.333333 51.866667 54.190217 34.5 \n",
" median 60 59.000000 55.000000 55.000000 34.5 \n",
"\n",
"country UAE UK Yemen \n",
"gender count_nonzero 77.000000 2.0 1 \n",
"age size 77.000000 2.0 1 \n",
" mean 44.181818 34.5 44 \n",
" median 42.000000 34.5 44 "
]
}
],
"prompt_number": 196
},
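{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch of the groupby-then-transpose pattern, in case you want to try it without the case data. The `toy` DataFrame and its values are made up for illustration; only `groupby`, `agg` and `.T` come from pandas.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the case list\n",
"toy = pd.DataFrame({'country': ['KSA', 'KSA', 'UAE', 'UAE', 'Qatar'],\n",
"                    'age': [50, 60, 40, 44, 55]})\n",
"\n",
"# One row per country, one column per summary statistic\n",
"summary = toy.groupby('country')['age'].agg(['min', 'mean', 'median'])\n",
"\n",
"# .T swaps the axes: statistics become rows, countries become columns\n",
"flipped = summary.T\n",
"print(flipped)\n",
"```\n",
"\n",
"Transposing returns a new table and leaves `summary` unchanged, so you can flip back and forth freely."
]
},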
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pivot tables are another great way to make fancy tables. If you can't make groupby work for your issue, you probably need a pivot table."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"gender_grouped = data.pivot_table(index=['country'],\n",
" columns=['gender'],\n",
" values='age',\n",
" fill_value=np.nan)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 40
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"gender_grouped.T.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>country</th>\n",
" <th>France</th>\n",
" <th>Iran</th>\n",
" <th>Italy</th>\n",
" <th>Jordan</th>\n",
" <th>KSA</th>\n",
" <th>Kuwait</th>\n",
" <th>Lebanon</th>\n",
" <th>Oman</th>\n",
" <th>Qatar</th>\n",
" <th>South Korea</th>\n",
" <th>Tunisia</th>\n",
" <th>UAE</th>\n",
" <th>UK</th>\n",
" <th>Yemen</th>\n",
" </tr>\n",
" <tr>\n",
" <th>gender</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>F</th>\n",
" <td>NaN</td>\n",
" <td>51.0</td>\n",
" <td>22</td>\n",
" <td>42.666667</td>\n",
" <td>48.590323</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>31.000</td>\n",
" <td>56.000000</td>\n",
" <td>54.578947</td>\n",
" <td>35</td>\n",
" <td>44.789474</td>\n",
" <td>30</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>M</th>\n",
" <td>51</td>\n",
" <td>52.5</td>\n",
" <td>NaN</td>\n",
" <td>41.058824</td>\n",
" <td>52.465710</td>\n",
" <td>53</td>\n",
" <td>60</td>\n",
" <td>58.375</td>\n",
" <td>51.571429</td>\n",
" <td>53.916667</td>\n",
" <td>34</td>\n",
" <td>43.982759</td>\n",
" <td>39</td>\n",
" <td>44</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 44,
"text": [
"country France Iran Italy Jordan KSA Kuwait Lebanon Oman \\\n",
"gender \n",
"F NaN 51.0 22 42.666667 48.590323 NaN NaN 31.000 \n",
"M 51 52.5 NaN 41.058824 52.465710 53 60 58.375 \n",
"\n",
"country Qatar South Korea Tunisia UAE UK Yemen \n",
"gender \n",
"F 56.000000 54.578947 35 44.789474 30 NaN \n",
"M 51.571429 53.916667 34 43.982759 39 44 "
]
}
],
"prompt_number": 44
},
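{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how a pivot table relates to groupby, here is a small sketch on made-up data (the `toy` DataFrame is an assumption, not the case list). It builds the same country-by-gender table of mean ages both ways.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: note that UAE has no female cases\n",
"toy = pd.DataFrame({'country': ['KSA', 'KSA', 'KSA', 'UAE', 'UAE'],\n",
"                    'gender': ['F', 'M', 'M', 'M', 'M'],\n",
"                    'age': [40, 50, 60, 30, 50]})\n",
"\n",
"# pivot_table averages the values column by default\n",
"pivoted = toy.pivot_table(index='country', columns='gender', values='age')\n",
"\n",
"# The same layout with groupby: group on both keys, then move gender into the columns\n",
"grouped = toy.groupby(['country', 'gender'])['age'].mean().unstack('gender')\n",
"\n",
"print(pivoted)\n",
"print(grouped)\n",
"```\n",
"\n",
"Combinations with no rows, such as UAE and F here, show up as NaN in both tables."
]
},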
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we plot this, it looks...okay."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"gender_grouped.T.plot(kind='bar')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"<matplotlib.axes._subplots.AxesSubplot at 0xc607f98>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEOCAYAAACpVv3VAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVOX+wPEPiCAIsgiIIKC5PFlqmXldsizN0nKrrOxW\nZl6X61JapqXmvqUoipZalmtW19JErSwty191s8ysTD24BSigbLIKCMzvD2Auy7B6hmHG7/v18uXM\nc57znO9xhi/H55zzPXYGgwEhhBDWw97SAQghhKgeSdxCCGFlJHELIYSVkcQthBBWRhK3EEJYGUnc\nQghhZRwq66CUmgYMAOoDbwI/AJuAfOA4MF7TNLmmUAghakmFR9xKqXuBbpqmdQfuBW4ClgPTNU27\nB7ADBpk5RiGEEMVUNlXyAPCnUmoXsAfYDXTSNO1Q4fIvgPvNGJ8QQohSKpsq8QECgf4UHG3voeAo\nu0g64G6e0IQQQphSWeJOAE5qmpYLRCilsoCAYsvdgCuVbSQ3N8/g4FCv5lEKIeqMiIgInp32AS7u\nviXaM1Mus3XxP2nTpo2FIrNJdqYaK0vc3wMTgVCllD/gAnytlOqpadp3QD/g68q2nJycWc1YRXl8\nfNyIj0+zdBjiBpaUlI6Luy+ungEml8v3Uz8+Pm4m2ytM3JqmfaaUukcp9TMF8+HjgL+B9UopR+AE\n8Im+oQohhKhIpZcDapr2qonme/UPRQghRFXIDThCCGFlKj3iFkKIqsjPy+X8+fMkJaWXWRYYGIyj\no6MForJNkriFELrISk9k4XeraVjqhFpGfBpLB86jZcvWForM9kjiFkLopqGPG27+HpYOw+bJHLcQ\nQlgZSdxCCGFlJHELUUpqair79++zdBhClEsStxClnDkTwfffH6q8oxAWIicnhVXLzs5i0aK5XLp0\niWvXrvHii5MJD99BbOxF8vLyefLJp+nduw8TJoxm6tQZBAUFs2vXJyQlJfHQQwOYPXs6TZr4cfHi\nBdq2vZVXXnmNLVs2cPbsGXbv/pQ///yd1NQUUlNTaNmyDTfd1JJHH32c1NRUXnppPO+9t9XS/wTi\nBiSJW1i1Xbt24O/fjLlzF3PhQjRff/0Vnp5ezJo1n8zMTEaMeIY77+yMnV3xWj3/e33hQhQrV67B\nycmJJ54YRFJSIs899y927drBwIGPcPz4H3Tq9A+eeOIpYmIuMmfODB599HH279/Hgw/2q/0dFgJJ\n3DYhJyeH6OhIk8ts/caH6OgounbtDkCzZoEkJCTQuXMXAFxcXGjRogUXL14osY7B8L8HNgUEBOLs\n7AxA48be5ORcK7EcICgoGAB//wBcXFz4++/zHDiwjyVLVphtv4SoiCRuGxAdHcmPL71IUxeXEu2x\nmZl0X7HKpm98CA5uwcmTJ+jRoycXL17g4MH9ODo6cs8995KZmcHZs2do2jQAR0cnEhLiCQoKJiLi\nFD4+BSVJSx6JAxioV69eieRdvM+AAY+wceN6fH2b0KiRlKIXliGJ20Y0dXEhyNV0CUhbNmjQoyxe\nPI8JE0ZjMBhYtmwVO3ZsZ9y4kWRnZzNixGg8PT0ZMuRJQkOX4Ovrh4+PjzEZl03cdgQENOPcuTNs\n3/5hmT49e97HihVLmT17fm3tohBlSOIWVs3R0ZHZsxeUaJsxY06Zft263UW3bneVaV+3boPx9dtv\nbzS+fv/9j01uLy8vl6ZNm9K5c9caRizE9ZPLAYWooj///J3Ro4fzzDPDLR2KuMHJEbcNy83PJyrq\nxjxpaQ7t29/G5s0fWToMISRx27L4rKsc+88feLhfKtF+JeUS46YMsOmTlkLYMkncNs7DvQne5Twb\nUAhLupEvY71ekriF7ir6gawp+UG2PdHRkawJ2YOHe5MS7fI/wspJ4ha6i46OZGLIblzcfXUZLzPl\nMmFTBsoPsg2S/xHWjCRuYRYu7r641uIPZGxsDM899x
RK3Wxs69SpM8OHj6y1GISoLZK4hc1o0eIm\nVq9+29Jh2IzyprzKu1JJ1B5J3MJmHT16hLVrV+Po6MjAgY/g6OjIp59+Qm5uLnZ2dixaFMLZs2fY\ntm0Ljo71iYm5SO/eDzBs2Aiio6NYsmQBubm5ODk1YO7cRWRnZxESsojs7GycnJyYOnUGvr5NKg/E\nSkVHRzJ3/4+4+ZX8n1Pc8d+oh5+FohIgiVvYkL//PscLL4wxvh8w4BGuXbvG+vWbAdi6dSMhIStx\ncmpASMgiDh/+CR8fHy5dimPLlo/Iyclh8OC+DBs2grfeWsmwYSP4xz+68v33hzh9+hR794YzZMhQ\nunbtzpEjP7Nu3ZvMmmW5W9/1uiqjoiNrN78A3AOCS7SnXYohK6368Qr9SOIWNqN585JTJb/99qux\nsh+Ah4cnCxbMwdnZmaioSNq16wBAy5Ytsbe3p0GDBjg5OQEFVQfbtWsPQI8e9wAQFhbK1q0b2bZt\nMwaDgfr169fOjpUjOjqSqbtnXfdT1aOjIznyzSL8/UoWzTp+/AK0/bdu8Qr9SOIWZpGZctniYxkM\nBmOBqPT0dDZseIedOz8jPz+fl1+eUKwCYOlCUwVVB0+c+Is77/wH+/fvIzU1lebNmzN06DO0a9eB\nc+fOcOLE8Zrukm70eqq6v587wc28SrTFxKVw6rpHFuYgiVvoLjAwmLApA3UfszKlK/3Z2dkZ21xd\nXWnf/jbGjHkeT09PAgODSUxMoGlTf5MPWRg/fiJLly5i8+b3cHZ2ZubM+XTrdhfLlr1BTk422dnZ\nTJo0Rbf9s2X5uXkmT2jKSc6ak8QtdOfo6Fjr11w3bepfotIfQMeOnejYsZPx/bx5i02uW7xPeHjB\nQ4IDApoRFramRD83NzdCQ1frFfIN42pSBhd2LievVL34PxIT4Y5RForKukniroTclisszRYuyzNV\nLz42M4MMC8Vj7aqUuJVSR4GUwrfngMXAJiAfOA6M1zTNYHpt66bXCSAhaqq8O1ETL5wkQB57eUOq\nNHErpRoAaJp2X7G23cB0TdMOKaXWAoOAXWaL0sL0OgEkRE2ZuhM1M+USkGCZgIRFVeWI+zbARSn1\nZWH/GcAdmqYdKlz+BfAANpy4hRCiLqlK4s4AQjRNe08p1RrYV2p5OiBPTRVGUh1QCPOqSuKOAM4A\naJp2WimVCHQsttwNuFLRAJ6eLjg41KtxkJaUnOxa7jIvL1d8fGr/Ab2lt5mc7Mr5ao5hztgjIiJM\nnheoqYz4NN4dHkpAQBtdxrM2FX0Hy1Odzzc52ZWYam/BvCz1s2UtqpK4nwc6AOOVUv4UJOqvlFI9\nNU37DugHfF3RAMnJmdcdqKUkJaVXuCw+vnbv/fXxcSuzzYpiLI85Y09KStf9vEBl8cbGxjBnzowS\nD/y1Feb+fGsyvrlZ4merLirvl1dVEvd7wEalVNGc9vNAIrBeKeUInAA+0SNIIYQQlas0cWualgs8\na2LRvbpHI0QNFd39+MILY/D09CItLZUFC5bwxhsLyMhIJyEhnkcffZzBg4cwYcJo2rRRnDt3loyM\nDObPX4Kfn1S7E9ZDbsCpBXITT+3q0+dB7r77XiIiTnH//Q/Ss+d9JCTEM2HCGAYPHoKdnR233NKO\nF1+czDvvrOHAgX0888xwS4ctRJVJ4q4F5VVfi4lLgV7T5SYenQUFNQfA09OL7ds/5NChb3BxcSUv\nL8/Yp00bBYCvbxOSkhItEaYQNSaJu5aYqr5myzJ0PLFUlbEMBoOx2l/RtMlHH22jXbv2DB48hKNH\nj/Df/35fbI2yFQGFsBaSuIXuAgODWTpwnu5jVqSoEmDxSn933XU3K1eGcOjQt7RocRMuLi5cu3bN\n5LpCWBNJ3EJ3lqgO6OfXtMylgHfccSdbtvynTN/iD1sYPPgxs8cmhN7sLR2AEEKI6pHELYQQVkYS\ntxBCWBlJ3EIIYW
Xk5KTQnVQHFMK8JHEL3UVHR/LjSy/StNQzBmsqNjOT7itWyY1KQhSSxC3MwtQz\nBs2pqDrgq6/OIC0tjdtu62iy39GjRwgP38ncuYtqLTYh9CZz3MKmfPvtN5w/f67c5XKzjbAFcsQt\nbEZKSgpffLEXBwcHlLqZuLhYPv30E3Jzc7Gzs2PRohDjbfG//PITu3fvYv78NwAYO3YECxYspXFj\nb0vughBVIolbGFl7FUN3d3e6du2Ol1dj2ra9lSNHfiYkZCVOTg0ICVnE4cM/4ePjA0Dnzl1ZuXIZ\naWlpxMdfxsPDU5K2sBqSuIVRdHQkc/f/iJtfyaeJp8VdZHYfrOLkYPFiUx4enixYMAdnZ2eioiJp\n165Dib4PPNCPAwe+JCbmIv37D679YIWoIUncogQ3vwDcAyou6FQVsZn6Pa4uNjOTFlXsa29vj8Fg\nID09nQ0b3mHnzs/Iz8/n5ZcnGBN6kYcfHsjcua+TnZ3N2LEv6BavEOYmiVvoLjAwmO4rVuk2Xgsq\nrw4IBScelbqZt95aRXBwc9q3v40xY57H09OTwMBgEhMTaNrU33iC0tvbh4YNG9KuXQfs7eU8fV1X\n3ak8a5/6q4gkbqE7S1QHbNrUn3XrNgDQrVsPoKA6oCkdO3Yq8b5//0HmDU7oorpTedHRkUzdPYuG\npR64mxGfxtKB86xi6q88krjFDSk7O4tx40bRqVNnAgKaWTocUUXVncpr6OOGm7+HGSOyDEnc4obk\n5NSA997baukwhKgRmdgTQggrI0fcOjN1QiQqKlL+oYUQupF8orPo6EjWhOzBw72JsS3qwl8MG2rB\noGqZVAcUwrwkcZuBh3sTvD3/d+Y7OeUSEGW5gGqZqV9e1+NKyiXGTRlg1VcBCKEnSdzCLEr/8qpN\nEyaMZurU6QQFNa/yOleuXGHmzFdLPEhYiLpKTk4Km1Nwg41UARS2S464hU1KS0tj6tRJZGZmkpeX\ny6hR47jjjjt59tknCAoKxsGhPpMmvcKcOa+Tn5+Hn19T47oHDx4oU1Xw7NkzbNu2BUfH+sTEXKR3\n7wcYNmyEBfdQ3MgkcQub9OGHW/nHP7oyZMhQEhLiGTt2JB9/HE5WVhbDh4+ides2rFwZQp8+D9C/\n/2B++eUntmzZCMCFC9EmqwpeuhTHli0fkZOTw+DBfa0ucZd30liuerI+8nkJm5CZmYmjoyMODg4Y\nDAaysq4an4JTVJMkOTkJgKCggjvvoqIijVUBO3ToCBQk7vKqCrZs2RJ7e3saNGiAk5NTre6fHso7\naXyjXfVkC6qUuJVSvsCvQG8gH9hU+PdxYLymaYby1xY3oispl2p1rEWL5vDoo09w++13cOXKFdq1\na8/vv/9G69aK+PjLpKen0aiRO4CxoFTz5i34449jtGrVmr/++hOAjIyKqgpa/7y5qZPGN9pVT7ag\n0sStlKoPvA1kUPDNDQWma5p2SCm1FhgE7DJrlMKqBAYGM27KAN3HrMjQoc+wcuUyAO67rzePP/4U\nixfP49tvvyE7O4upU2dQr149iiff4cNHMn/+LL75Zj/Bwc2xs7OjYUPXSqsKFrD+JC6sV1WOuEOA\ntcC0wvd3aJp2qPD1F8ADSOIWxViiOmC7dh14990tJdoWL15Wpt/HH4cbXzdq5E5ISFiZPvPmLTa5\njeJVBcPD99U0VCGuW4WJWyk1HIjXNO0rpdQ0Cg4zih9qpAPulW3E09MFB4d61xOnxSQnu5a7zMvL\nFZ9SJSMr6l/VMSpjapvnqzVC9WOvSZxCH9X9ToE+301L0uP7acvf58qOuJ8HDEqp+4Hbgc2AT7Hl\nbsCVyjaSnKzf01BqW1JSeoXL4uPTqty/qmNUxMfH7bq3Wd52q7uvonbUxudb1+jx/bSF73N5v1wq\nvAFH07Semqbdq2nafcAxYBiwTynVs7BLP+BQuQMIIYTQXXUvBzQAk4H1SilH4ATw
ie5RCSFEFeXn\nXiMqyvT16baqyom78Ki7yL36hyJshVQHFLUpIzGe5QficXGPLdGeeOEkAf0sFJSZyQ04QnfR0ZEc\n+WYR/n6Vnreukpi4FOg1XaoDinK5uPviWur69MyUS0CCZQIyM0ncwiz8/dwJbuZVa9s7evQI4eE7\nmTt3EVBQb2TjxvUsXRrG6tXLuXr1KlevZtK8+U1MmjTFeOfjiRPHGT9+FGvXvsfNN99Sa/EKcT2k\nOqCwOfv372Pbti2Eha1j587/0LlzV0JD32Tt2g04OzsTHr7D2HfPnl0MHfoMO3d+bMGIhageOeIW\nNqHorsZ9+z5jx47trFy5BldXV7y8GvPtt1/TrFkg7dt3YPz4Sca+mZmZHD16hK1btzNs2FBSUq7g\n7m57TwQXtkcSt7AJBoOBP/44RkJCPGlpaeTm5gLw5JNP4+bWiA8+2MrJk3/RocNtTJ78Gr6+Tfj6\n66/o2fM+HB0d6d27D3v3hvP0089ZeE+EqJxMlQib0bixNytXruHxx4cyb95MDAYDR478TL9+/QkN\nXc2ePV/Rtu2trFq1HCiYJjl+/E8mT36R33//jfDwncUKSglRd8kRtzCLmLgUXcfyr8J5w4CAZtSv\nX5/HHnuCn3/+L5s3v8epUydITEygb9+HcXBwoEWLm4iKiuTs2TMYDPmsWfOucf2XXhrPDz/8Hz16\n3KNb7EKYgyRuobvAwGDoNV238fxvqbw6oJ2dXYnqfdOmzWbEiKcZO/YFvvnmANu3f4iTkyMeHl5M\nnvwa77+/ib59Hy4xxoABj7Bz58eSuEWdJ4lb6M4S1QE7duxUonqfh4cHO3d+BkCfPn3L9J806ZUy\nbb163U+vXvebL0ghdCJz3EIIYWUkcQshhJWRxC2EEFZGErcQQlgZOTl5A8rLyzVrGUypDiiEeUni\nvgGlpidy+cy3OKSXrN53/PgFaPvv6x4/OjqSuft/xM0voPLOVZAWd5HZfZDqgEIUksR9gzJVvS8m\nLoVTOo3v5heAe0DF117rqbzqgMuWrcLXt8l1j3/6dAQ//HCI4cNH8t13B7n11vZ4e3tf97hC1IQk\nbmFz9u/fx0cfbSMsbB2enp66jNm6dRtat24DwCeffESLFi0ASdzmYO6pPFsgiVvYhNLVAcPC1uLq\n6sqECaOZOnUGQUHB7Nr1CUlJSWRkpNO+/W3ce29vXn75Bbp06cqTTz7NkiULePjhgcTHX+bTTz8h\nNzcXOzs7Fi0K4ezZM4SH76Rv34c4fTqCBQvmsGbNuzg4yI+Q3sw9lWcL5KoSYROKqgPu2bOrRHXA\n4rfBQ8Hre+65j59++pHs7GzS09P49dcjAGjaKdq168CFC9GEhKxkzZp3ad68BYcP/2Qcp1u3HrRu\n3YbXX58rSduMiqbyiv/x9Tb9xPMbkSRuYTNMVQcsruh9hw63ExFxiqNHj3Dvvb1ITk7i999/o127\n9gB4eHiyYMEcFi2ay9mzZ8jLy63lPRGiYnLIIMwiLe6ivmO1r/xEp6nqgI6OTiQkxBMUFExExCl8\nfHyxs7NDqVv44IMtvPjiZBITE1mzZhVjxownPT2dDRveYefOz8jPz+fllyeU+QVgb29Pfn6+bvsn\nRHVJ4ha6CwwMZnYfHQdsH1zj6oBTpkwnNHQJvr5++Pj4GPv07HkfixfPpXXrNiQlJfLll59z++13\nYG9vT/v2tzFmzPN4enoSGBhMYmICTZv6G9dt164DCxbMZsWKt3Bzk/++i9oniVvorq5VB+zW7a4y\n/bt27U54+JcAdOnSjb179xuXzZu3uNxtAIwaNZZRo8bqFrsQ1SVz3EIIYWUkcQshhJWRxC2EEFZG\nErcQQlgZOTkpdCfVAYUwL0ncViQnJ4eIiAiSktJLtNe1Gg7R0ZFMDNmNi7uvLuNlplwmbMpAqQ4o\nRKFKE7dSqh6wHmgDGIB/A9nAJiAfOA6M1zTN
UN4YQh/R0ZFM3T2Lhj4lrx2OPxXLZOrW0aiLuy+u\nnvqUda2K0tUBy/P553tISUnhqaeeqaXIhNBfVea4+wP5mqb1AF4HFgHLgemapt1DQQGIQeYLURTX\n0McNN3+PEn9cvBpaOiyLK1mT5Pr7CVGXVXrErWlauFJqb+Hb5kAycL+maYcK274AHgB2mSVCYXH5\nudfKnY6pK3PPpW9LB/jtt19Zv34t9vb2BAQ0Y8qU6RgMBn7++b/89NMPZGZmMmLEaLp1u4uDBw+Y\nrAi4bdsWHB3rExNzkd69H2DYsBHExsawePE8423vkyZNoVWr1gwd+ggdOtxOVFQknp5eLFy4FHt7\nOf8v9FelOW5N0/KUUpuAwcDjQPEbmtMBd1PrFfH0dMHBoV5NY7So5GTXcpd5ebniU2raoqL+VR2j\nJrGYU0ZiPMsPxOPiHluiPTPlMlsX/5OAgDYl2s0RZ2X/Th4eLjRoUN/Yx2AwsHz5Yj788EO8vLwI\nCwvj++8P0KiRM35+vixbtozExESeeOIJBgx4kOTky2zc+B4NGjRg1qxZnDx5jCZNmpCYeJk9e/aQ\nnZ3N3XffzeTJE5k//y1GjfoXvXr14tSpU8yYMYMdO3YQGxvDBx9so0mTJjz11FPExf3Nbbfddt37\nXpN/Tz2+m7asOj93dVGVT05qmjZcKdUE+BloUGyRG3ClonWTkzNrFl0dUPpEYOll8fFpVe5f1TFq\nEou5lTdnrce/QVVU9u905UomWVnXjH2Sk5O5fDmeceMmAJCdnU3nzl1o1iyQm29uX9jPkQYNXDhz\n5gL167swadJknJ2diYqKpFWrtjg5uREc3ILExAyg4Fb++Pg0IiJO06JFW+Lj02jcOICYmFji49Nw\nd/fA3t6F+Pg0PD29uXz5SpU/28r2vSbr1MbnYq2q83NnSeX9cqnKyclngWaapi0GrgJ5wBGlVE9N\n074D+gFf6xirsAGZKZctOpa7uzu+vr4sWRKKi0tDDh36Fjc3N+LiYvnrrz8ZNOhR4uMvk52dRf36\nDhVUBCw7Jx4c3IJjx47So8c9nD6t0bhx44Kepbqamr4RQg9VOeL+BNiklPoOqA9MBE4B65VSjsCJ\nwj5CAAXz3mFTBuo+ZkXs7Oz45ZfDjBw5zNj25JNP88orEzEY8mnY0JUZM+YSFxdLamoKEyeO5erV\nq7z66us0bOhaaUXAwq0AMGHCJJYsWcBHH71Pbm4ur702q8Ty4jEJYQ5VOTl5FXjSxKJ7dY9G2ARL\nVQf8/POy//EbMGBwiff9+vWnX7/+ZfpVVhEQIDx8HwB+fk1ZseKtMn2LlgOVXpYoxPWQU95CCGFl\nJHELIYSVkcQthBBWRhK3EEJYGSkyJXQn1QGFMC9J3EJ35RXDqqmM+DSWDpwn1QGFKCSJW5hFUTGs\n2hQTc5G33lpJamoqubm5tGrVhrFjX8DFxaVW4xDC3GSOW9iE7Owspk2bzDPPDGf16rdZu/Y9brnl\nVubMmWHp0ITQnRxxC5vw44/f07FjJ9q2vdXY1q9ff3bt2sHChXNwcKjPpUux5OTkcP/9D/DDD//H\npUtxLF68nKZN/Vm6dCGXL18mMTGBHj3uYdSosSxcOAdHR0diY2NJTExgxozZtGlzswX3UogCcsQt\nbEJsbAz+/mWLYPn5NeXYsaP4+/sTGvomzZu3IDY2lpCQMHr27MUPP/wfly9fol279oSGruaddzYR\nHr4DKLhl3c/Pn9DQ1QwZ8iS7d39a27slhElyxC1sgre3LydP/lWm/eLFC9x++x3GI2VXVzeCg5sD\n4ObWiJycbBo1asTJkyc4evRXXFwakpNzzbh+mzYKAB8fX/7883fz74gQVSCJW5hFho4lM6sy1t13\n92TLlg2cPPmXcbpkz55deHh4VFrs6fPP9+Dq6saUKdO5cCGaPXvkyFrUbZK4he4CA4NZOnCe7mNW\nxNnZmSVL
Qlm9OpSUlBTy8vJo1ao1c+YsYtWq5eUmbzs7Ozp1+gdz576Opp3Ez68pSrUlISHeuLz4\n30LUBZK4he4sUR0QICCgGW+8EVqmffr02cbX//73BOPrJ554yvh606YPKlyvS5dudOnSTa9Qhbgu\ncnJSCCGsjCRuIYSwMpK4hRDCykjiFkIIKyMnJ4XupDqgEOYliVvoLjo6kh9fepGmOhV3is3MpPuK\nVVIdUIhCkriFWTR1cSHIVZ+yrkKIkiRxC5tx6tRJ3nnnLbKysjAY8unY8U5GjBiNg4Ppr3l4+E4e\nfnhgucuFqKvk5KSwCZcvX2LBglm8/PKrrFnzLmvXbsDR0ZFVq5aXu877728iPz+/FqMUQh9yqCFs\nwpdffs6AAYNp1izQ2DZ8+Egef3wQx44dZePG9eTn53P16lVmz17A778fJTExkTlzZrBgwZJyy7qm\npqaQmprC0qVhuLnJ1I+oGyRx11B+bh5RUWWvnDDVJswvLi6WLl26l2n38vLi/PlzzJw5H29vb7Zu\n3cjBgwcYNmwEmzdvYO7cRcayrv37DyY7O5vHHnuYUaPGGuuYFL81Xoi6QBJ3DV1NyuDCzuXklbpy\n4o/ERLhjlIWiqjtiMzN1HatFJX2aNPEjJuZCibb8/Hzi4mLx8fFh5coQXFxciI+/TIcOt5fo5+bm\nVm5Z16CgiotbCWEJkrivg6krJ2IzM8iwUDx1RWBgMN1XrNJtvBZUXh2wb9+HefnlCfTo0RN3dw9m\nzXoNX98mdO7chSVLFrJ9ezjOzs4sXDjHOK9tZ2dHfn4en3++t9yyrlIVUNRFkriF7ixRHdDXtwkz\nZ84nNHQpV69mkp2djYODAy4uDenW7S7Gjx+Jt7cPQUHNSUxMAOC22zoyZcokXnppaqVlXYWoSyRx\nC5uh1M2Ehq4u0Xb27Bn8/QNwdnYu03/GjDnG15WVdRWiLqkwcSul6gMbgGDACVgAnAQ2AfnAcWC8\npmkG84YpRM20bNnK0iEIobvKruN+GojXNO0eoC/wFrAcmF7YZgcMMm+IQgghiqsscX8MzCrW9xpw\nh6ZphwrbvgDuN1NsQgghTKhwqkTTtAwApZQbBUn8dWBZsS7pgLvZohNWSaoDCmFelZ6cVEoFAjuB\ntzRN+1AptbTYYjfgSmVjeHq64OBQr+ZRWlBysqtZx/fycsXHp2p35Jk7lpowFX9ERARrQvbg4d5E\nl21cSbnE64ufIiCgjS7jWZuafO6mPpe6+P2xlOr83NVFlZ2cbAJ8BYzTNO1gYfNvSqmemqZ9B/QD\nvq5sI8mBz3EbAAARWElEQVTJ+t2MUduSktLNPn58fFqdiKUmTMWflJSOh3sTvD0DzLqd0rZu3cSv\nv/5Mbm4u9vb2jB8/CaVurtZ2UlNTOXz4R/r06cvChXO4//4Hy31I8NGjRwgP38ncuYsAOHjwABs3\nrmfZslX4+urzSwtq9rmX97mIAtX5ubOk8n65VHbEPZ2CqZBZSqmiue6JwCqllCNwAvhEryCFqKnz\n58/x44+HWLt2AwCnT0ewcOEck5f5VeTMmQi+//4Qffr0rdY13Pv37+Ojj7YRFrYOT0/Pam1TiOqq\nbI57IgWJurR7zRKNEDXk6urKpUuX2Ls3nC5dutG6dRvWr98MQETEKVauXIa9vT2Ojk68+uoM8vPz\nmTNnBm+/vRGAMWOeZ+7cRWzZsoGzZ8+we3fB3ZPh4Tv54IMtpKen88orr9G27a3GbRYl9n37PmPH\nju2Eha3F1dW1wm2++upLuLt70K3bXXTp0p2wsGUYDAbc3d2ZNm0WDRo4ExKyqETBq1695Py/KElu\nwBE2wcfHlzfeWM6OHdvZuHE9DRo0YPTocfTs2YslSxYybdosWrVqzffff8fq1SuYMGGSiVHseO65\nf7Fr1w4GDnyE48f/4Oab2zJs2Ai++GIvn3++t0TiNhgM/PHHMRIS4klLSy
M3N9e4rLxtJiUlsWHD\nNhwcHBg9ejgzZswhOLg5e/fuYtu2LQwc+EiZgleSuEVpkriFTbh48QING7oybVrBjN6pUyd55ZUX\n6djxThITE2jVquAW/A4dOrJu3Ztl1jcYDCX+LqJUWwA8Pb3Izs4qs17jxt6sXLmG3bs/Zd68mSxf\nvgo7O7tyt9m0qb/xwQ1RUX+zbNliAHJzcwkMDKJRo0blFrwSoogkbmEWV1Iu1epYZ86cZvfuT1my\nJBQHBwcCAwNxc3OjXj17vL19OHv2DC1btuLYsaPGSwuTk5PIz88nIyOD2NgYAOrVq1cmeVckIKAZ\n9evX57HHnuDnn//L5s3vMXz4SJPbBLC3/9+tE4GBwcycOQ9f3yYcO3aUlJQUPv98T7kFr4QoIolb\n6C4wMJhxUwboPmZFeva8j8jI84wcOQxnZ2cMBgPjx0+kYUNXXn11BitWLMVgMODg4MBrr83Ey6sx\nnTt3YeTIYQQENDM+gCEgoBnnzp1h+/YPgf/NY5s6UWlnZ1eifdq02YwY8TS33dbR5DYNBkOJ/q+8\nMo3582eRl5eHnZ0d06bNIigouEzBqytXkq/730/YFkncQneWqA4IMGzYCIYNG1GmvXVrxZtvvlOm\nfcqU6SbHef/9j8u0denSrcxlgR07dqJjx07G9x4eHuzc+Znxvaltrlu3wfhaqZtZvfrtMn1KXwlz\n9uxpk3GKG5c8c1IIIayMJG4hhLAykriFEMLKSOIWQggrIycnhe6kOqAQ5iWJW+guOjqSI98swt9P\nn4q/MXEp0Gu6Ra5UEaIuksQtzMLfz53gZl61tr0331yJpp0kKSmRrKws/P0D8PT0Yt68xVVa//Dh\n/3LpUhwDBz5icvmMGVNYuDBEz5CFqDFJ3MImFNUe+eKLvURFRTJmzPhqrV9e6dYikrRFXSKJW9ic\nolvWi9fT/umnH/nmm/1Mnz6boUMfoUOH24mKisTT04uFC5eyb99nREVFMmLEaGbOfJWMjAyys7MY\nPXocnTt3ZeDAB9m9+0t+++1XNm16l/z8fK5evcrs2QsIDAyy8B6LG40kbmGzSt+SXiQ2NobVq9/G\nx8eXsWP/xcmTJ4z9Ll68QGpqCsuXryY5OZmoqMjCsQrW/fvv88ycOR9vb2+2bt3IwYMHTN6tKYQ5\nSeIWN4TihaPc3T3w8fEFwNe3CTk52cZlLVrcxMCBjzJnzgxyc3MZMmRoiXG8vb1ZuTIEFxcX4uMv\n06HD7bWzA0IUI4lbmEVMXIquY/nfUrW+xRO0o6MjCQnxQMGDDYpU9GCbc+fOkJmZydKlK0lISGDs\n2H/RvXsP4/KlSxexfXs4zs7OLFw4h/z8/OrtjBA6kMQtdBcYGAy9TBdwqgn/WyqvDlik+PRI//6D\nWbx4Hl999UWp9U1nbjs7O5o1C2LDhvUcPHiA/Px8Ro36d4l1HnigH+PHF5RtDQpqTmJiQk13S4ga\nk8QtdGep6oAA/fr1N76++ea2bN78YZk+4eH7jK+LHvRb3IIFS8pd54UXXtIjTCGui9zyLoQQVkYS\ntxBCWBlJ3EIIYWUkcQshhJWRk5NCd1IdUAjzksQtdBcdHcnc/T/i5hegy3hpcReZ3QepDihEIUnc\nwizc/AJwD6jatdd6OHr0COHhO0tc3rd27WqaN29Bv379+frr/bzxxjw+/PBTvL29AXjvvbc5cOBL\nvL19jOt07txFbmEXdZ4kbmETTNUkKd62Z8+nDBkylN27dzJixGjj8qFDn2HQoEdrLU4h9CAnJ4VN\nKH6re2kxMRdJT0/n6aef48svPycvL69K6wlRV8kRt7B5e/eG89BDA3B1daVdu/Z8++039O7dB4PB\nwH/+s42vv/7K2HfYsBF07tzFgtEKUbkqJW6lVBfgDU3T7lNKtQI2AfnAcWC8pmly2CIsqkGDBly7\ndq1E29WrmTg6OrF//z6aNvXnhx/+j9
TUFOLittO7dx+ZKhFWq9LErZSaCjwDpBc2hQLTNU07pJRa\nCwwCdpkvRGGN0uIu6jtW+4pPdAYHN+f0aY3ExAQaN/YmOzubY8d+IygomLZtby3xCLOnnnqUs2fP\nADJVIqxTVY64zwCPAlsL39+hadqhwtdfAA8giVsUExgYzOw+Og7YPrjS6oANG7oyYcJLTJkyyXj0\n/fjjT/L994cYMKDkcyQHDBjMjh3/wdvbp8xUSVBQMFOm6FfZUAhzqDRxa5q2UynVvFhT8dP36YA+\nj/IWNsNS1QF79ryPnj3vK9HWv//gMv3++c9hxtdFV5gIYU1qcnKyeOV4N+BKZSt4errg4FCvBpuy\nvORkV7OO7+Xlio+PW52IpSaqE7+omZp87qY+l7r4/bEUa//e1iRx/6aU6qlp2ndAP+DrylZITs6s\nwWbqhqSk9Mo7Xef48fFpdSKWmqhO/KJmavK5m/pc6uL3x1Ks5Xtb3i+X6iTuorM4k4H1SilH4ATw\nyfWFJoQQojqqlLg1Tfsb6F74+jRwr/lCEkIIURG5AUfoTqoDCmFekriF7qKjI5kYshsXd19dxstM\nuUzYlIFSHVCIQpK4hVm4uPvi6qlPWdeqMFUdcN26NwkObs6uXTt4++2NAPz++zEWL57HwoVLadmy\nVa3FJ4SeJHELm2CqOmBpR48eITR0KcuWhdGsWWAtRCWEeUjiFjahslvXf/nlMGFhywkNXY2vb5Na\nikoI85CyrsKmxcXFcvHiBdavX8u1azlkZWVZOiQhrpskbmETyqsO6OTkhJOTE8uXr2bixFeYNWsa\n2dnZFopSCH3IVIkwi8yUy7U6VnnVARctCuG77w7i5uZG9+49OHz4R1asWMprr83ULT4hapskbqG7\nwMBgwqYM1H3MipRXHbBevXolTlyOHz+JUaOG8eWXn/Pggw/pGqMQtUUSt9BdXaoOCLBu3Qbja0dH\nRzZv/qg2wxJCdzLHLYQQVkYStxBCWBlJ3EIIYWUkcQshhJWRk5NCd1IdUAjzksQtdBcdHcnU3bNo\nqNOjoTLi01g6cJ5UBxSikCRuYRYNfdxw8/eo1W2+/vqrKNWWZ58dDkBmZgYjRw5j/vwlUglQ2BSZ\n4xY2Y8qUaYSH7+Dvv88D8NZbYQwa9KgkbWFz5Ihb2Ax3dw9eemkqS5bMZ9SoccTGxvLcc/9i8uQX\nycnJxsnJialTZ5CXl8esWdNo0sSPuLhYevd+gPPnzxIRodGt212MGTOes2fPEBa2DIPBgLu7O9Om\nzULTTrFt2xYcHesTE3OR3r0fYNiwEZbebXEDksQtbMpdd93NoUMHWbx4PmvXvseqVct5/PGhdO3a\nnSNHfmbdujcZPXocsbExhIWtISsri8cfH8iuXftwcnJiyJABjBkzniVLFjBjxhyCg5uzd28427Zt\noXPnLly6FMeWLR+Rk5PD4MF9JXELi5DELWxO374Pk52djbe3N+fOnWHr1o1s27YZg8FA/fr1AfD3\nD8DFpSH16jng5dUYN7eCE6lFZU0iI8+zbNliAHJzcwkMDAKgZcuW2Nvb06BBA5ycnGp/54RAErcw\nk4z4NIuNZTAYjA9WCA5uzlNPPUu7dh04d+4MJ04cByp/Yk5QUHNmzpyHr28Tjh07SkpKSuGSyp+0\nI4S5SeIWugsMDGbpwHm6j1lVdnZ2xsQ8fvwkli17g5ycbLKzs5k0aYqxT7E1yrx+5ZVpzJ8/i7y8\nPOzt7XnttZnEx1+uYD0hao8kbqE7S1UHLNKxYyc6duwEFEyJhIauLtOnqGKgk5MTH38cbmwPD98H\ngFI3s3r12yXWadYs0Dhu8b5C1Da5HFAIIayMJG4hhLAykriFEMLKSOIWQggrI4lbCCGsTI2uKlFK\n2QNrgA5ANjBS07SzegYmhBDCtJoecQ8GHDVN6w68BizXLyQhhBAVqWnivgvYB6Bp2mHgTt0iEkII\nUa
Ga3oDTCEgt9j5PKWWvaVq+DjFZzNmzp8u0RUVFmrzlOjMpg9jM3DLt8VezuJJyqURbWloCMXEp\nZfrGxKWQ61r1J8XoEUtF8VxOSCMt72KZ9oz4S2SnlP0dn5lymago0/HLQw/0lZlyuUzb1bQkk9+H\njPg0k59LVFSkWb8P5cVjqe9nRf8+1s6uqKZDdSillgM/aZr2ceH7aE3TAvUOTgghRFk1nSr5AXgI\nQCnVFfhDt4iEEEJUqKZTJZ8CfZRSPxS+f16neIQQQlSiRlMlQgghLEduwBFCCCsjiVsIIayMJG4h\nhLAykriFEMLKyBNw6jCl1ExN0+YXvvbXNC3G0jEJAaCU2ggYKPv8NoOmaSMsENINRRJ33dYLmF/4\nehtwnwVjEaK4ToALBd/LHwvb7ChI5sLMZKpECFFtmqZ1AB4BGgCvAt2BM5qmfWnRwG4QcsQthKgR\nTdP+pCBpo5S6B3hDKdVM07Sulo3M9knirts6KaX+W/j6lmKvDYUldYWwKKVUI+BRYCjQEHjfshHd\nGCRx120dLB2AEKYopZ6kIFkHATuAsZqmnbdsVDcOueVdCFFtSql84BTwe6lFBk3T/mmBkG4ocsQt\nhKiJXoV/Fx352ZV6L8xIjriFEMLKyOWAQghhZSRxCyGElZHELYQQVkYStxDFKKXuVEodtHQcQlRE\nErcQQlgZuRxQWDWl1GLgMSABiAV2U3BJ2kQKDkx+BcZrmpatlIoFPgZ6ALnAE5qm/a2U6gOEAtnA\nX8XGbgWsARoDmcALmqYdU0ptKmxrCUzRNO2z2thXIYrIEbewWkqpAcBdwC3AQ0BHCm67Hgl00zSt\nIxAPvFK4ShPggKZpdwCHgAlKKUdgM/Ckpml3Aqn871rkzcBUTdM6AWOAj4ptPl7TtFskaQtLkCNu\nYc3uB/6jaVoucEUptYuCG0FaA4eVUgCOFBx1F9lX+Pdx4B6gPRCradqJwvb3gBVKqYZAZ2Bj4TgA\nDZVSXhQk9sNm2yshKiGJW1izPKBeqbZ6wHZN0yYCKKVcKfY91zQtp/Bl0UMASj8MIK/YOFcLj9op\nHCtQ07SkwkSepeN+CFEtMlUirNl+4DGlVP3CKnX9AQ/gEaWUj1LKDlgLvGhi3aJk/Qfgq5QqStD/\nBNA0LRU4rZR6GqBwHvxbs+2JENUgiVtYLU3TvqBgrvo3YC8QA5wE5gLfUDAdAvBG4d/F6zsYKCiI\nlAs8ScGUyK+AZ7F+TwMjlVK/AwuBJ0qtL4RFSK0SYbWUUl2BNpqmbVFK1afgEVrPa5p2vJJVhbBq\nkriF1VJKeQIfAE0p+N/jJk3TQi0blRDmJ4lbCCGsjMxxCyGElZHELYQQVkYStxBCWBlJ3EIIYWUk\ncQshhJX5f8vjDHmyacZcAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0xc733710>"
]
}
],
"prompt_number": 43
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It might be better to look at the difference in age means. To do that we will make a new column, age_diff. We do column-wise math again, subtracting M from F."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"gender_grouped['age_diff'] = gender_grouped['M'] - gender_grouped['F']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 45
},
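{
"cell_type": "markdown",
"metadata": {},
"source": [
"One thing to watch with column-wise math is missing data. The sketch below uses made-up mean ages (the `toy` table is hypothetical) to show that a NaN on either side of the subtraction gives a NaN result.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical mean ages by gender; NaN marks a group with no cases\n",
"toy = pd.DataFrame({'F': [48.0, np.nan, 30.0],\n",
"                    'M': [52.0, 51.0, 39.0]},\n",
"                   index=['KSA', 'France', 'UK'])\n",
"\n",
"# Column-wise math lines rows up by index label and subtracts element by element\n",
"toy['age_diff'] = toy['M'] - toy['F']\n",
"\n",
"# Rows with a missing value on either side come out as NaN; drop them before comparing\n",
"diffs = toy['age_diff'].dropna()\n",
"print(diffs)\n",
"```\n",
"\n",
"Countries that reported cases for only one gender therefore have no difference to compare."
]
},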
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can plot the difference and it looks much better. Note the use of barh here instead of bar for a horizontal layout, and the order() argument to make things tidier."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"gender_grouped['age_diff'].order().plot(kind='barh')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 46,
"text": [
"<matplotlib.axes._subplots.AxesSubplot at 0xc6d29b0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAa8AAAD9CAYAAAALWvhJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXGWZ/vFvs4TFBEyYKBACgYg3OkQERBEMEECDIows\nilEYWSRERXGURVkUR0R+ImKi6LDKpmFERUFEYFAQMOOAMoADPAkBkbCFJSRCAgmkf3+8p0il00tV\n0lWn3qr7c119ddWpc6qfqivpp99z3nrvru7ubszMzHKyWtkFmJmZ1cvNy8zMsuPmZWZm2XHzMjOz\n7Lh5mZlZdty8zMwsO2uUXUA7euWVV7vnzVtYdhkrZfjwdcm1dnD9ZXP95cm5doCRI4d11bO/R14N\nsMYaq5ddwkrLuXZw/WVz/eXJufaV4eZlZmbZcfMyM7PstEXzknSlpC9V3R8m6QFJ48qsKzezZ89i\n5syZZZdhZjagtmhewBRgiqS3FPfPBM6NiHtLrMnMzBqkLWYbRsSzko4GLpB0EjAGOE3SdcDawCJg\nMun1/ifw92KfK4CtgW2BayPipGK0NhXoAp4FDge2A04AXga2AK6IiNOb9gLNzGw57TLyIiJ+DTwA\nXAQcBpwFTI2ICcXtM4BuYHNSQ/og8HXg34B3AUcUT3U+8OniuN8AxxfHbQrsD+xYbDMzs5K0xcir\nyqXAOhHxRDGCOlHSCaRR1OJin4ci4h+SlgBPRcTzAJIq2TBvAX4oCWBNoHIR6N6IWAoslLRooEJG\njhw2aC+qWebNGwrkWXs1118u11+enGuvV7s1r67iC+B+4NsRMUPS1qTRFaRRVH8eAA6JiDmSdgE2\nqPG45Tz99D/q2b0lPPfcC4wYMTTL2itGjhzm+kvk+suTc+1Qf+Ntt+bVzbImcyxpBLU2sA7wuap9\n6Of2p4DLJK0BLCWdThzVz3FmZtZkXU5SbojuHP8Cmj17FiNGDGX48I3KLmWltcNfn66/PDnXn3Pt\nUP/yUO028rJVMHbsltn/BzCzztA2sw3NzKxzuHmZmVl23LzMzCw7bl5mZpYdNy8zM8uOm5eZmWXH\nzcvMzLLT8s1L0m6Sptew36GSvtiMmtqV87zMLBct37yofSkmLxViZtYhclhhY4UlQyTtCpwGvArM\nBo4q9pso6QPAUODUiLhO0oHAp0krxHcD+wHj6CWfS9IYUqTK6sW+n4uIeyTNAm4DBDwFHFCsMG9m\nZiXIYeS1HEldwHnAfhGxG/AYcCip2cyNiD2AfYBzin23BPaOiPHAfcBE+s7n+jZwdkTsChwDXFhs\n3xw4OSJ2AkYCOzT4ZZqZWT9yGHn19E/ARsCVRebWOsCNwIPAHwAiYq6kBaQ4k6eBSyS9AGwFzCie\np7d8rq2qnuNuSaOL7c9ExGPF7UeBtQYqMsdcHed5tQbXX66c68+59nrl2LyeBeYA+xahkh8C5gFj\nSKOo8ySNIjW1xcCpwGjSKPMGlp2G7O0a2f3ALsA1kt4OPNHHvgOufpzj4rbO8yqf6y9XzvXnXDu0\nZ55XN/A+SXdUbfsO8BtJqwHzgU+QmtcGkm4CXgccGRELJN1OGm3NBYI0anuY3vO5jgXOl3Qs6RrZ\nET0e77m/mZmVwHlejeE8r5K0w1+frr88Odefc+3gPC9bBc7zMrNcZDfb0MzMzM3LzMyy4+ZlZmbZ\ncfMyM7PsuHmZmVl23LzMzCw7nipvr5k9exbz5uX9OS8z6wzZj7x65n1JOlDSvZI2GaTn30bSKcXt\n/ST5N7uZWcnaauQlaRLwRWD3iHh6MJ4zIu4G7i7ufo60Mv0TfR9hZmaN1g7NqxtA0iHA0cAeETFf\n0s3A5IiYKWkK8EZgfeD2iPi5pN8C10fE2ZLOJ+V4jaL37K+jgMuAt5NWqB8fEUua+irNzOw17dC8\nuoDxpMYznNR4oPeFd68CPiHpWuD1wO7A2cB2EXGkpC
+Tsr8WSfoPUvbXYwAR8RtJ/wscVUvjyjGa\nwJEorcH1lyvn+nOuvV7t0LwgncbbEzgSuFzS+3s8Xrm2dxswFZgA/Bw4UNJ4lmV89ZX9Vbcc1wd0\nJEr5XH+5cq4/59qh/sab/YSNwoMRsTgiziFleJ0MvARsXDy+HUBEdAN3kpKTbyA1s28BP5e0Pin7\n6yBSE1zEirldS4HVG/pKzMxsQO3QvLpZ/hTh4cBkYBrwg+La1mpV+/wC2KqYiHEDMBa4JSLmA5Xs\nr6tYlv1F1bF/BC6V9PrGvRwzMxuI87waI8s8L2iPUw+uvzyuvzw51w7153m1w8jLzMw6jJuXmZll\nx83LzMyy4+ZlZmbZcfMyM7PsuHmZmVl23LzMzCw77bI8lA0C53mZWS46rnlJGgNMJy0BNTwibu1j\nv91Ii/BOal51ZmZWi04+bXgA8NZ+HvfSI2ZmLarjRl6FDYBDgcWS/gJsxoo5Xl0Akt4LHBkRHynu\n3w4cEBFPllC3mZnRuc3rWeA64MmIuEPSnvSd43WjpGnFYryjgKdraVw55uo4z6s1uP5y5Vx/zrXX\nq1ObF6SRVWUhyIFyvC4HJgFbABfU8uQ5LpDpPK/yuf5y5Vx/zrVD/Y23k5vXUmC1qhyv0aRrgDew\nYo7Xj4AfA+sAJzSxRjMz60WnTtjoBv4MHA1sywA5XhHxOLAAuCkilja9WjMzW07Hjbwi4m/ATsXd\n3xTfb+5j91t63L+wASW1jLFjt8z+1IOZdYaOa171krQOcCtp1PVQ2fWYmZmb14AiYhHwjrLrMDOz\nZTr1mpeZmWXMzcvMzLLj5mVmZtlx8zIzs+y4eZmZWXayn21YRJzcQ/rQccVNEXFaORXly3leZpaL\n7JtX4f8iYkLZRZiZWXO0S/NaThEk+f+Al4HzgJdYMfJkHGmdwpdJC+5eERGnS9qStPjumsBC4KOk\nNQ3PLb4vAiZHxJwmviQzM6vSLte83irp95UvYGNgrYjYJSIuB7YkRZ6MB+4jRZ50A5sC+wM7AscX\nz/Vt4BsRsRMwlbT24ZnAtGJ0dxZwRhNfm5mZ9dAuI6/7qk8bStqVtMBuRV+RJ/cWC+0ulLSo2Pbm\nyuMRcU3xfN8FTpR0AmnF+cUDFZRjro7zvFqD6y9XzvXnXHu92qV59dRFijxhgMiT7l6OvR94J3CT\npEnAiGLbWRExQ9LWwLsGKiDHxW2d51U+11+unOvPuXbo3Dyvnk2om2VxJvMlVSJP5rIs8uThHsdV\nbh8HnCvpZOBF4GDS6vM/lLQ26brX5xr0OszMrAZd3d29DT5sFXXn+BfQ7NmzGDEi76ny7fDXp+sv\nT87151w7wMiRw3qGAPerXUZeNgic52VmuWiX2YZmZtZB3LzMzCw7bl5mZpYdNy8zM8uOm5eZmWXH\nzcvMzLLjqfL2msWLFzNz5kyee+6FsktZafPmDXX9JXL95Wlk7aNHb8aQIUMa8twrq+2al6TNSYvr\njiCtDH83cEJE5PkvsokeffQRjjnzatZd/w1ll2JmLWLh/LlMPW5fxo7dsuxSltNWzUvSOsCvgCMi\n4o5i278C04F9yqwtF+uu/waGDh9VdhlmZv1qq+YF7A3cXGlcABFxqaRPSbqYtBr8ZsBawBWkhrYp\n8C/A30jZX5uQ1j68OiJOKY57CRhTbD80Iu5q0usxM7NetNuEjc2Bh3rZ/jdgV+DhiJhIWiV+TETs\nDfyc1MRGAzMiYi/SqvFTimO7gb8V278HTG7oKzAzswG128jrMVKcSU9vAm4B/lLcf57UwADmAWsD\nzwE7SJoALCCNzioqI605wM61FJJjrk4lz8vMrNqIEUNb7ndauzWvXwEnSdqh6prXJ0lhlD2Xz++5\ngvGhwPMRMUXSm1jFEVaOi9vmOsvKzBrruedeaPjvtE7N8wIgIl6UtA9wtqQNSK/vbmAS8F16z++q\n3L4J+Imk7YFHgD
slbdxj3256D7A0M7Mmcp5XY2Sb5+Wp8mZWrVlT5evN83Lzaowsm9fixYt58cVn\nsz59OGJEvh8yBddftpzrb2TtzfiQssMobaUNGTKEUaPenOX1uorcwzRdf7lyrj/n2ldGu02VNzOz\nDuDmZWZm2RmweUlqrdUYzcys49Uy8npQ0jmSdmh4NWZmZjWoZcLGW4ADgDMkvQG4FLgsIp5saGVm\nZmZ9qGuqvKT9gamkuJEbgWMj4sEG1ZYzT5UvSatOda51qnHuM8Zcf3lyrh0aMFVe0pbAwcDHSCtP\nnABcBUwArgNaK+RlJUjaDTgqIiZVbfsm8AAwJSLeXWx7D3ARcEBE3FtGrY3kPK/GaNU8JLOc1XLa\n8AbgEmDPiHikslHSdcD7GlVYkw04/Cwa3DnAB9p5tOk8LzPLQS3N69aIOLXnxojoBj4/6BWVo9/h\nqqQ9SadLJ0bEnOaUZGZmfalltuFbJbXWWvjNsxkwFjiNFJGybrnlmJkZ1DbyWgr8XVIAi4pt3RGx\ne+PKarqFLJ/fBTCUlPG1CNiLlOP1U0k7RsRLAz1hq2Xf1MJ5Xo1TTx5Sjv92qrn+8uRce71qaV7H\nseJptXZbzfcBYFtJG0bEk5LWBnYB9gf2j4jngWsl7QV8H/jkQE+Y46yfVpyl1y5qzUNqgxljrr8k\nOdcOjcnzOjAiPlu9QdIlpGTithARCyR9gdSgFgJDgGnAKyzfqI8F7pB0cERcXkKpZmZGP81L0gWk\n6z3vkLR1j2Ne3+jCmi0iriJ9BKCnnar2eRl4W9OKKsHC+XPLLqHt+D01G3z9jby+QZqwMA04lWWn\nDl8B7mtsWVaG0aM347Jvfizr04et/CFlMxs8fTaviHgYeBh4m6T1gPVZ1sAqkxmsjTjPy8xyUcsK\nGycCXyI1q+rrP5s3qigzM7P+1DJh45PA2Ih4utHFmJmZ1aKWDyk/AsxrdCFmZma1qmXk9SBwm6Tf\nAS8X27oj4t8bV5aZmVnfamlejxVfFXUtW29mZjbYBmxevS3Ka+1p8eLFzJw5s+WmmteahWVmnaOW\n2YZLe9n8eERs0oB6GqZnZpekA4GvAnsDZ5Om/w8lfYbts5X1CyW9E7gV2Dki7iyh9KZpxTwvZ2GZ\nWW9qGXm9NqlD0prAh6hadSJHkiYBXwB2J63deENEnFs8djYwBfhusfuRwLeBzwCHNb/a5nKel5nl\noJZrXq+JiCXAlZJOblA9jdQNIOkQ4GhSuOZ8SU8CB0p6EPgjaf3Cyr5DSYnR/wzcK2mDiHi2lOrN\nzOw1tZw2/ETV3S7SL/KX+9i9lXUB44FRwHBgzWL72aSPAhwHvBO4Dfg0MAf4KPCLiHhZ0n8CRwDf\nanLdZmbWQy0jrwksW1mjG3gGOKhhFTXWE8CepFOBl0t6P7AHcElE/Kg4LXoC6ZThgaQPaC+RdB0p\niHITSWcWKdL9yjFXp1XzvOrJwoI83/tqrr9cOdefc+31quWa16GShgAq9v9rcfowRw9GxGLgHEkT\ngZOBdwAbAZdFxBJJ9wFbSRoHrBYR4ysHS7oB+CBwzUA/KMf19VptlmFFrVlYkP/ahq6/XDnXn3Pt\n0IA8L0nvAH5GWtuwC3ijpP0j4r9XqsLydLP82oyHA3cBxwMfkfR54CVgLum04ZeAS3s8x/mkiRsD\nNi8zM2ucWk4bTgMOiog/AUjasdj2zkYWNtgi4haqAjQj4hlgdHF3ei+HHNPLc1wJXNmQAltEq2VP\ntVo9ZtYaamler6s0LoCI+G9JazewJitJq+Z5OQvLzHqqpXnNk/ShiPglgKT9AE8Xb0PtkOdlZp2h\nluY1GbhG0oWka15LgZ0bWpWZmVk/aolE2QtYCGwK7EYade3WuJLMzMz6V0vzOgp4T0S8GBH3ANsC\nn21sWWZmZn2rpXmtASyuur+YdOrQzMysFLVc8/ol8LtieaQuYH/g6oZWZWZm1o9a
Vtg4QdKHgV2A\nJcDUysxDay+rmufl3C0za5aaVpVvxw/nShoDTI+Id5ddS6tYlTwv526ZWTPVFYli7c95XmaWg05u\nXpXMrt+T1jMcTlpJ/gJgfWBj4JyI+A9JN5PWQdwaWA/4cET8vYyizcysttmGneAnEfE+YCzpVOJE\nYCIpbRlSo/tTRLwXuBGYVE6ZZmYGnT3yqhbF97nA5yXtDyxg+ffnruL7o8CGAz1hjrk6q5rnVW/u\nVqO0Qg2rwvWXK+f6c669Xp3cvLqKL1j2ubUvADOKU4UTgL2r9h8wgLJajusDruqCvPXkbjVKO2Qa\nuf7y5Fx/zrVDA/K82lg3K2Z8XQN8r1h8+P+AfxRBnL0da2ZmJenY5hURjwDv7rHtZmBcL7tPqNrn\n3MZWVq6Vzc9y7paZNVPHNi9b0armeTl3y8yaxc3LXuM8LzPLhafKm5lZdty8zMwsO25eZmaWHTcv\nMzPLjpuXmZllx7MNO9DixYt59NFHen1s/fW3bnI1Zmb1a2jzkvQlYA9gTdISTMdGxF/qfI7hwF4R\nMV3SxaSFc6/vY9/dgKMiYlJx/0Dgq8D7I2LOSr+QNtNXbtfC+XO57JtDGT58o5IqMzOrTcOal6S3\nAvtExM7F/W2AS4C31/lU2wD7AtOpY1kmSZOALwK7R8TTdf7MtufcLjPLWSNHXvOBTSUdDlwfEXdL\neieApG2BacCrwEvAkcDqVCUbS5oBfBQ4CXibpCOL5z1K0vGkzK1PRcQdVT+zktF1CHA0sEdEzB/g\nZ14DPAP8BvgtMJW0YO+zwOHAi8C5wCbARsDVEXHKIL5PZmZWp4ZN2IiIx0gjpp2BP0q6H/hg8fD5\nwGciYjfgB8B36H1U1Q2cBvwuIs4vtt0ZEXsA3wMO7bF/FzCe1JiGk05XVvT1M98IvDciziz2+XRE\nTACuA44nNa0ZEbEX8C5gSr3vhZmZDa5GnjYcC8yPiCOK+9sD1xXJxRtFxD3FrrcCZ/TyFF09vlf8\nufj+FLBuL8c9AexJamCXS3p/RHT38zMfjohXittbAT+UBKnxzQSeA3YoIlIWAGsN+OJp7VydgXK7\nWrn2Wrj+crn+8uRce70aedrwbcBkSftGxBJgFjCPdNrucUnjIuJeYFdSGORLwBskrQasB2xePM+r\n1DdCfDAiFgPnSJpIOu14Wh8/E5ZleVFsOyQi5kjaBdiANLp7PiKmSHoTMLmWIlp5fcCBFt5t5doH\n0g6ZRq6/PDnXn3Pt0EJ5XhFxlaS3AHdIeoHUgI6LiAXF9avvS+oClgBHRMRTkm4E7gBmk5odxe1x\nko4p7ndXfe95qrHntsOBuyTdShqJLfczSaO66v0/BVwmaY1i++GkhvaTYuT4CHCnpI0i4omVf3fM\nzGxVdHV3O1exAbpb+S+g2bNn9TNV/mNZT5Vvh78+XX95cq4/59oBRo4c1vMSUb/8IeUONHr0Zkw9\nbt9eHxszZgzz57/c5IrMzOrj5tWBhgwZwtixW/b5GLh5mVlr89qGZmaWHTcvMzPLjpuXmZllx83L\nzMyy4+ZlZmbZ8WzDDtNflhc4z8vM8tAxzatn1lex7Qzg/oi4RNJHgIuALSurZ0g6FZgEPF71VDdG\nxOlNK3yQ9ZXlBc7zMrN8dEzzou9V6yvbjyTFoUwGvlb1+FkRcV7jy2seZ3mZWe466ZpXX0uPdEna\nHHg98C3gEEmr13CcmZmVpJNGXn3pJi3S+6OImF+EYB4A/JTUuL4g6aNV+38jIv6rhDrNzKzQSc1r\nIStmcQ0lRbF8HHhY0j7ACFIK809ZhdOGrZqrM1CWF7Ru7bVy/eVy/eXJufZ6dVLzegDYVtKGEfGk\npLWBXUiRJ/8TEQdVdpQUksYVd1fqtGGrru48UJYXtG7ttWiDlbVdf4lyrj/n2qGF8rxaTZEj9gXg\nWkkLgSHANGBf4Pweu19AGn09zoqnDSMipjSj
ZjMz613HNC9IAZnAVT02X9jLfmdW3f1az8dzt3D+\n3Lq2m5m1mo5qXtZ/lhc4z8vM8uDm1WH6y/KqPO48LzNrdZ30OS8zM2sTbl5mZpYdNy8zM8uOm5eZ\nmWXHzcvMzLLj2YYdYqAcrwrneZlZDrJtXpK+DWwPbAisCzwEzK1e5mmA4ycCm0ZEz9U1Ko//PCIO\nGKx6y9ZfjleF87zMLBfZNq+IOBZA0icARcSJdR5//QCPt03jqnCOl5m1i2ybVw9dAJIuBqZHxPWS\n9gIOiojDJM0CbgMEPEWKPPnX4v6pwJXAeqQR3EkRcaOkJyNiQ0m7Al8hXR8cCnwsImY19dWZmdly\n2m3CRnUyMlW3NwdOjoidgJHADlWPjQU2APYBJrGsoVcefytwcERMAH4BfLhh1ZuZWU3aZeTVmy6W\nxZk8ExGPFbcfBdau7BQR90k6F5gOrElaab7a48A0SS8Ao0gjuAG1Wq5OLTleFa1We71cf7lcf3ly\nrr1e7dC8qvO2XgI2Lm5vV7W9ejS2HElbA8Mi4oOSNgJuB66t2uU8YIuIeLE4LVnTaLXVcnVqyfGq\naLXa69EOmUauvzw5159z7dCZeV7VpwovAC6S9HFgZtX2vppXNzAL+Kqkj5Aa0yk9jrkcuFXS46RA\nS0/FMzMrWfbNKyIuqbr9Z2CbXvbZuOr2pOLmLVW7rHAdq3JMRHxx0Iot2UB5Xc7zMrNcZN+8rDYD\n5XhVOM/LzHLg5tUhBsrxqt7PeV5m1urabaq8mZl1ADcvMzPLjpuXmZllx83LzMyy4+ZlZmbZ8WzD\nNlNrbldfnOdlZjloq+Yl6WbgqIiIOo75J+DKYuHd7NWS29UX53mZWS7aqnmx4qryHcm5XWbW7tqt\neQEMl/RrYBjp9Z0cEb+X9FcggMXAMcBPgNWB186xSToQ+DRpdfluYD9gHHAC6ZO7WwBXRMTpzXs5\nZmbWUztO2DgWuD4idiWtWXhhsf11wL8XaxueTAqtnAD8uOrYLYG9I2I8cB8wkdTENgX2B3YEjm/K\nqzAzsz5lP/KSNBR4KSJeIcWjvA64FSAiHpe0QFLlAlDlWphIK9BT2bfwNHBJkd21FTCj2H5vRCwF\nFkpaVEtdZeXq1JPb1ZfcM4Fcf7lcf3lyrr1e2Tcv4GLg+5L+QEpJngGMB/5X0ijg9cCzxb5Li+/3\nAe8B7iGNppC0HnAqMJo0Ir2BZVlhdV9HKytXp57crr7kngnk+svj+suTc+1Qf+Nth9OGZwFnAn8C\nrgSOA3aXdAtwFTA5Il5l+Qb0dWDvYnbiR4HuiFhACqKcURwXLMvuqj624yeEmJmVLfuRV0TMAHbo\nsXm/Xvbbour2c8DevexzUB8/5paqfTbuY5+WsbK5XM7zMrNcZN+8bHm15nb1xXleZpYDN682U2tu\nV3/HO8/LzFpdO1zzMjOzDuPmZWZm2XHzMjOz7Lh5mZlZdty8zMwsO55t2ASrmrHVTM7zMrMcuHk1\nwapkbDWT87zMLBdt27wkbQ+cDqxLOj36e+BrEbGkj/0nAxcVC/wOOmdsmZkNnra85iVpE+Ay4DMR\nMT4idiZ98vbsfg77Minfy8zMWly7jrwOAc6PiAcrGyLi65IekrQL8FVS4x4KfAzYBdgQmF4EUp4H\nbEJamPfqiDhF0sXACGADUubX8818QWZmtky7Nq/NgN/2sv0p4J+BgyPiCUlfBj4cEadLOpm0wvxo\nYEZEXChpbeBR4BTSavI3RcTUWgqoXt5/MDK2min3TCDXXy7XX56ca69XuzavvwNbVG+QtBqpqT0G\nTCsCJ0cBt/U4dh6wg6QJwAJgrarHghpV5+oMRsZWM+WeCeT6y+P6y5Nz7VB/423X5nUpcIOkq4Fn\ngJ+SRlA3AucDW0TEi8WpwMp1v6Wka16HAs9HxBRJbwImVz2vs7zMzFpAWzaviJgj6WDg+6TrWusA\nS0gjqWuB
WyU9DjzAssDJW4vHjgZ+UsxWfAS4U1Ilw2ulm1cOWVk51GhmBtDV3d05gwlJ44CHIuLF\nBv+o7urhe04fUt5uu62zzvNqh1Mnrr88Odefc+0AI0cO66pn/7YcefUlIu4t4+euasZWMznPy8xy\n0Jaf8zIzs/bm5mVmZtnpqGteZmbWHjzyMjOz7Lh5mZlZdty8zMwsO25eZmaWHTcvMzPLjpuXmZll\np6NW2GgGSV3AHGBmsWlGRJxYYkk1KVbd/wHwNtISG5+MiNnlVlUfSX8B5hd3H4qII8qspxaS3gWc\nERETioWgLyYtEv1XUphqS3+WpUf92wLXALOKh38YET8tr7q+SVoTuIiUNLEWcBpwP5m8/33UPwf4\nNct+97Ty+786aZH0N5PWjJ1C+r1zMTW+/25eg28s8OeI2LfsQur0IWBIROxU/EI6q9iWhSJ7jYiY\nUHYttZJ0PHAwUMnM+Q5wYkT8QdIPgX8BfllWfQPppf7tge9ExHfKq6pmHweejohDJA0H7gbuIp/3\nv7f6vwaclcn7/0FgaUS8R9KuwOnF9prff582HHzbA6Mk/U7StZLeXHZBNdqZIsAzIv4EvKPccuq2\nDbCupOsl3VQ04Fb3ILA/UFmQdLuI+ENx+zpgz1Kqql3P+rcH9pZ0i6QLJLVyCuuVwFeK26uRUidy\nev97qz+b9z8ifgUcVdwdQ8pR3L6e99/NaxVIOkLSvdVfwOPA6RGxO+mvicvLrbJm65EiYypeLU4l\n5uJF4MyImEg6BfHjVq8/In4BvFK1qXpV7ReA9ZtbUX16qf9PwLERsSvwEPDVUgqrQUS8GBEvSBpG\nagQns/zvw5Z+/3up/yTgf8jk/QeIiFeLTMWpwI+p899/S//nbnURcWFEjKv+Au4Eri4evx3YuN8n\naR0LgOoo09UiYmlZxayEmaT/AETELOBZlmW15aL6/R4GPF9WISvpqoi4q7j9S2DbMosZiKTRwO+A\nSyNiOpm9/z3qv4LM3n+AiDgUEHABsHbVQwO+/25eg+8rwOcBJG0D/L3ccmp2O/ABAEk7AveUW07d\nDiNdp6MID10PeKLUiup3V3H+H+D9wB/627kF/VbSDsXtPUh/yLUkSW8EbgCOj4iLi83ZvP991J/T\n+3+IpC8XdxcBr5KCf2t+/z1hY/CdAVwu6QOkUyqHlltOza4C3ivp9uL+YWUWsxIuBH4kqfIP/rCM\nRo6VGVVfBM6XNAS4D/hZeSXVpVL/FOAcSUtIfzhMLq+kAZ1IOi31FUmVa0fHANMyef97q//zwNmZ\nvP8/Ay5mH77EAAAASElEQVSWdAuwJum9f4A6/v17VXkzM8uOTxuamVl23LzMzCw7bl5mZpYdNy8z\nM8uOm5eZmWXHzcvMzLLj5mVmZtlx8zIzs+z8f80SF2mJhpYUAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x3c3f048>"
]
}
],
"prompt_number": 46
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# II. Comparing data sets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I am sampling our data set to pretend like we got new cases for a disease we are already tracking."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"old_list = data.sample(n=500)\n",
"new_list= data.sample(50)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 93
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print len(old_list), len(new_list)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"500 50\n"
]
}
],
"prompt_number": 94
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"new_list.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>FT</th>\n",
" <th>KSA_case</th>\n",
" <th>code</th>\n",
" <th>gender</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" <th>province</th>\n",
" <th>city</th>\n",
" <th>district</th>\n",
" <th>...</th>\n",
" <th>citation</th>\n",
" <th>citation2</th>\n",
" <th>citation3</th>\n",
" <th>citation4</th>\n",
" <th>citation5</th>\n",
" <th>sequence</th>\n",
" <th>accession</th>\n",
" <th>patient</th>\n",
" <th>speculation</th>\n",
" <th>age_group</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>400</th>\n",
" <td>403</td>\n",
" <td>380</td>\n",
" <td>NaN</td>\n",
" <td>38F</td>\n",
" <td>M</td>\n",
" <td>38</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.moh.gov.sa/en/CoronaNew/PressReleas...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>30-39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>506</th>\n",
" <td>509</td>\n",
" <td>492</td>\n",
" <td>NaN</td>\n",
" <td>47M</td>\n",
" <td>M</td>\n",
" <td>47</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.moh.gov.sa/en/CoronaNew/PressReleas...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>40-49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001</th>\n",
" <td>1005</td>\n",
" <td>1113</td>\n",
" <td>NaN</td>\n",
" <td>54M</td>\n",
" <td>M</td>\n",
" <td>54</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Al-Hofuf</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.who.int/csr/don/9-april-2015-mers-s...</td>\n",
" <td>http://www.moh.gov.sa/en/CCC/PressReleases/Pag...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50-59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>981</th>\n",
" <td>985</td>\n",
" <td>1109</td>\n",
" <td>NaN</td>\n",
" <td>60M</td>\n",
" <td>M</td>\n",
" <td>60</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Taimah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.who.int/csr/don/26-march-2015-mers-...</td>\n",
" <td>http://www.moh.gov.sa/en/CCC/PressReleases/Pag...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>60-69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>277</th>\n",
" <td>280</td>\n",
" <td>263</td>\n",
" <td>NaN</td>\n",
" <td>52M</td>\n",
" <td>M</td>\n",
" <td>52</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.promedmail.org/direct.php?id=201404...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50-59</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 45 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 95,
"text": [
" number FT KSA_case code gender age country province city \\\n",
"400 403 380 NaN 38F M 38 KSA NaN Jeddah \n",
"506 509 492 NaN 47M M 47 KSA NaN Jeddah \n",
"1001 1005 1113 NaN 54M M 54 KSA NaN Al-Hofuf \n",
"981 985 1109 NaN 60M M 60 KSA NaN Taimah \n",
"277 280 263 NaN 52M M 52 KSA NaN Jeddah \n",
"\n",
" district ... citation \\\n",
"400 NaN ... http://www.moh.gov.sa/en/CoronaNew/PressReleas... \n",
"506 NaN ... http://www.moh.gov.sa/en/CoronaNew/PressReleas... \n",
"1001 NaN ... http://www.who.int/csr/don/9-april-2015-mers-s... \n",
"981 NaN ... http://www.who.int/csr/don/26-march-2015-mers-... \n",
"277 NaN ... http://www.promedmail.org/direct.php?id=201404... \n",
"\n",
" citation2 citation3 citation4 \\\n",
"400 NaN NaN NaN \n",
"506 NaN NaN NaN \n",
"1001 http://www.moh.gov.sa/en/CCC/PressReleases/Pag... NaN NaN \n",
"981 http://www.moh.gov.sa/en/CCC/PressReleases/Pag... NaN NaN \n",
"277 NaN NaN NaN \n",
"\n",
" citation5 sequence accession patient speculation age_group \n",
"400 NaN NaN NaN NaN NaN 30-39 \n",
"506 NaN NaN NaN NaN NaN 40-49 \n",
"1001 NaN NaN NaN NaN NaN 50-59 \n",
"981 NaN NaN NaN NaN NaN 60-69 \n",
"277 NaN NaN NaN NaN NaN 50-59 \n",
"\n",
"[5 rows x 45 columns]"
]
}
],
"prompt_number": 95
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay here is the magic..I want to find case ids in the new list that also appear in the old list. I just use the isin() method, and feed it a list of the identifiers I want to compare against. It returns True and False."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"new_list.number.isin(old_flu.number)[:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 137,
"text": [
"400 True\n",
"506 False\n",
"1001 True\n",
"981 False\n",
"277 False\n",
"376 True\n",
"215 True\n",
"974 True\n",
"1111 False\n",
"223 True\n",
"Name: number, dtype: bool"
]
}
],
"prompt_number": 137
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like there are some! (This number will change each time we run this code because I am sampling randomly.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"new_list.number.isin(old_list.number).value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 97,
"text": [
"False 26\n",
"True 24\n",
"dtype: int64"
]
}
],
"prompt_number": 97
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I want to create a new data set that has just new cases that appear in the old data set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matches = new_list[new_list.number.isin(old_list.number)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 138
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matches.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>FT</th>\n",
" <th>KSA_case</th>\n",
" <th>code</th>\n",
" <th>gender</th>\n",
" <th>age</th>\n",
" <th>country</th>\n",
" <th>province</th>\n",
" <th>city</th>\n",
" <th>district</th>\n",
" <th>...</th>\n",
" <th>citation</th>\n",
" <th>citation2</th>\n",
" <th>citation3</th>\n",
" <th>citation4</th>\n",
" <th>citation5</th>\n",
" <th>sequence</th>\n",
" <th>accession</th>\n",
" <th>patient</th>\n",
" <th>speculation</th>\n",
" <th>age_group</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>400</th>\n",
" <td>403</td>\n",
" <td>380</td>\n",
" <td>NaN</td>\n",
" <td>38F</td>\n",
" <td>M</td>\n",
" <td>38</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.moh.gov.sa/en/CoronaNew/PressReleas...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>30-39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>506</th>\n",
" <td>509</td>\n",
" <td>492</td>\n",
" <td>NaN</td>\n",
" <td>47M</td>\n",
" <td>M</td>\n",
" <td>47</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.moh.gov.sa/en/CoronaNew/PressReleas...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>40-49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001</th>\n",
" <td>1005</td>\n",
" <td>1113</td>\n",
" <td>NaN</td>\n",
" <td>54M</td>\n",
" <td>M</td>\n",
" <td>54</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Al-Hofuf</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.who.int/csr/don/9-april-2015-mers-s...</td>\n",
" <td>http://www.moh.gov.sa/en/CCC/PressReleases/Pag...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50-59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>981</th>\n",
" <td>985</td>\n",
" <td>1109</td>\n",
" <td>NaN</td>\n",
" <td>60M</td>\n",
" <td>M</td>\n",
" <td>60</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Taimah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.who.int/csr/don/26-march-2015-mers-...</td>\n",
" <td>http://www.moh.gov.sa/en/CCC/PressReleases/Pag...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>60-69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>277</th>\n",
" <td>280</td>\n",
" <td>263</td>\n",
" <td>NaN</td>\n",
" <td>52M</td>\n",
" <td>M</td>\n",
" <td>52</td>\n",
" <td>KSA</td>\n",
" <td>NaN</td>\n",
" <td>Jeddah</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>http://www.promedmail.org/direct.php?id=201404...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50-59</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 45 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 139,
"text": [
" number FT KSA_case code gender age country province city \\\n",
"400 403 380 NaN 38F M 38 KSA NaN Jeddah \n",
"506 509 492 NaN 47M M 47 KSA NaN Jeddah \n",
"1001 1005 1113 NaN 54M M 54 KSA NaN Al-Hofuf \n",
"981 985 1109 NaN 60M M 60 KSA NaN Taimah \n",
"277 280 263 NaN 52M M 52 KSA NaN Jeddah \n",
"\n",
" district ... citation \\\n",
"400 NaN ... http://www.moh.gov.sa/en/CoronaNew/PressReleas... \n",
"506 NaN ... http://www.moh.gov.sa/en/CoronaNew/PressReleas... \n",
"1001 NaN ... http://www.who.int/csr/don/9-april-2015-mers-s... \n",
"981 NaN ... http://www.who.int/csr/don/26-march-2015-mers-... \n",
"277 NaN ... http://www.promedmail.org/direct.php?id=201404... \n",
"\n",
" citation2 citation3 citation4 \\\n",
"400 NaN NaN NaN \n",
"506 NaN NaN NaN \n",
"1001 http://www.moh.gov.sa/en/CCC/PressReleases/Pag... NaN NaN \n",
"981 http://www.moh.gov.sa/en/CCC/PressReleases/Pag... NaN NaN \n",
"277 NaN NaN NaN \n",
"\n",
" citation5 sequence accession patient speculation age_group \n",
"400 NaN NaN NaN NaN NaN 30-39 \n",
"506 NaN NaN NaN NaN NaN 40-49 \n",
"1001 NaN NaN NaN NaN NaN 50-59 \n",
"981 NaN NaN NaN NaN NaN 60-69 \n",
"277 NaN NaN NaN NaN NaN 50-59 \n",
"\n",
"[5 rows x 45 columns]"
]
}
],
"prompt_number": 139
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(matches)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 140,
"text": [
"24"
]
}
],
"prompt_number": 140
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or maybe I want to create a data set of cases that DON'T appear in the old set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"novel = new_list[new_list.number.isin(old_list.number) == False]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 141
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(novel)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 142,
"text": [
"26"
]
}
],
"prompt_number": 142
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**This can work on ANY kind of identifier, including strings.**"
]
},
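{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the string case, here is the same isin() pattern on a toy data set. The `toy_old` and `toy_new` frames and their `case_id` values are made up for illustration and are not part of cases.csv."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# hypothetical string identifiers, for illustration only\n",
"toy_old = pd.DataFrame({'case_id': ['KSA-001', 'KSA-002', 'UAE-001']})\n",
"toy_new = pd.DataFrame({'case_id': ['KSA-002', 'QAT-001']})\n",
"\n",
"# True where a new case_id already appears in the old list\n",
"already_seen = toy_new.case_id.isin(toy_old.case_id)\n",
"\n",
"# keep only the rows that matched\n",
"toy_new[already_seen]"
],
"language": "python",
"metadata": {},
"outputs": []
},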
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# III. More data cleaning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's pretend that 99 is used as a null value, or that we don't believe anyone older than 90 should be in our data set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age.max()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 101,
"text": [
"99.0"
]
}
],
"prompt_number": 101
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can select out records that meet our criteria, and set them to null values."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age[data.age > 90] = np.nan"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 143
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age.max()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 103,
"text": [
"90.0"
]
}
],
"prompt_number": 103
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait I changed my mind, I want to use -9999 as my null value. Here I use the fillna() method to replace np.nan values with my new number."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age = data.age.fillna(method='ffill')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 203
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age.min()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 204,
"text": [
"-9999.0"
]
}
],
"prompt_number": 204
},
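{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a small sketch of what fillna() does with a constant, using a made-up `toy_age` Series rather than the real data. It also shows one caveat: a sentinel like -9999 is treated as an ordinary number by summary statistics, so it has to be filtered out before computing them."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical ages with one missing value\n",
"toy_age = pd.Series([34.0, np.nan, 51.0])\n",
"\n",
"# replace the null with the sentinel value\n",
"filled = toy_age.fillna(-9999)\n",
"\n",
"# drop the sentinel again before averaging, or it will swamp the mean\n",
"real_mean = filled[filled != -9999].mean()\n",
"real_mean"
],
"language": "python",
"metadata": {},
"outputs": []
},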
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also create more complex selection criteria."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age[(data.age < 10) & (data.country == 'KSA')] = 'missing'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 144
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.age.unique()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 148,
"text": [
"array([25.0, 30.0, 40.0, 60.0, 29.0, 33.0, 28.0, 45.0, 46.0, 53.0, 49.0,\n",
" 70.0, 31.0, 39.0, 16.0, 61.0, 69.0, 51.0, 73.0, 56.0, 58.0, 55.0,\n",
" 59.0, 24.0, 87.0, 77.0, 62.0, nan, 50.0, 52.0, 48.0, 43.0, 81.0,\n",
" 64.0, 66.0, 34.0, 35.0, 63.0, 85.0, 76.0, 21.0, 2.0, 42.0, 14.0,\n",
" 83.0, 75.0, 68.0, 'missing', 41.0, 32.0, 12.0, 15.0, 82.0, 26.0,\n",
" 67.0, 54.0, 38.0, 19.0, 79.0, 47.0, 18.0, 74.0, 22.0, 78.0, 23.0,\n",
" 72.0, 65.0, 37.0, 8.0, 57.0, 27.0, 86.0, 71.0, 90.0, 44.0, 13.0,\n",
" 88.0, 17.0, 89.0, 11.0, 80.0, 4.0, 36.0, 10.0, 84.0, 20.0], dtype=object)"
]
}
],
"prompt_number": 148
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another nifty function is splitting up columns into two different columns"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.code.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 105,
"text": [
"0 25M\n",
"1 30M\n",
"2 40F\n",
"3 60M\n",
"4 29M\n",
"Name: code, dtype: object"
]
}
],
"prompt_number": 105
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Combining apply and lambda is a powerful (but sometimes tough) way to do this. Pandas does most things column-wise. Apply is used to do things row-wise. Here we make sure to use the dropna() method to remove our null values, since our operation will only work on strings. The patient's age is the first two characters of the string, so we select those and put them in a new column. Then we do the same for sex."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data['new_age'] = data.code.dropna().apply(lambda x: x[0:2])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 47
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data['sex'] = data.code.dropna().apply(lambda x: x[-1])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 48
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look, it worked!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data[['code', 'new_age', 'sex']].head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>code</th>\n",
" <th>new_age</th>\n",
" <th>sex</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>25M</td>\n",
" <td>25</td>\n",
" <td>M</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>30M</td>\n",
" <td>30</td>\n",
" <td>M</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>40F</td>\n",
" <td>40</td>\n",
" <td>F</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>60M</td>\n",
" <td>60</td>\n",
" <td>M</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>29M</td>\n",
" <td>29</td>\n",
" <td>M</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 49,
"text": [
" code new_age sex\n",
"0 25M 25 M\n",
"1 30M 30 M\n",
"2 40F 40 F\n",
"3 60M 60 M\n",
"4 29M 29 M"
]
}
],
"prompt_number": 49
},
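{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same split can also be sketched with the vectorized .str accessor, which passes null values through on its own, so no dropna() or lambda is needed. Slicing with [:-1] takes everything except the last character, so it does not assume a two-digit age. `toy_code` is a made-up Series for illustration, not the real code column."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical codes, including a null and ages that are not two digits\n",
"toy_code = pd.Series(['25M', '8F', np.nan, '102M'])\n",
"\n",
"# everything except the last character is the age (still a string)\n",
"toy_age = toy_code.str[:-1]\n",
"\n",
"# the last character is the sex\n",
"toy_sex = toy_code.str[-1]\n",
"\n",
"pd.DataFrame({'code': toy_code, 'new_age': toy_age, 'sex': toy_sex})"
],
"language": "python",
"metadata": {},
"outputs": []
},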
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another thing you might want to do is split up names into first name and surname. We don't have names in our data set, so we'll use the country column to exemplify that (keep an eye on South Korea)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.country.unique()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 125,
"text": [
"array(['Jordan', 'KSA', 'Qatar', 'UK', 'UAE', 'France', 'Tunisia', 'Italy',\n",
" nan, 'Oman', 'Kuwait', 'Yemen', 'Lebanon', 'Iran', 'South Korea'], dtype=object)"
]
}
],
"prompt_number": 125
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the .split() method to split the string, and insert a space to tell the computer we want to split on a space."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.country.dropna().apply(lambda x: x.split(' '))[-10:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 51,
"text": [
"1281 [South, Korea]\n",
"1282 [KSA]\n",
"1283 [South, Korea]\n",
"1284 [South, Korea]\n",
"1285 [South, Korea]\n",
"1286 [KSA]\n",
"1287 [KSA]\n",
"1288 [KSA]\n",
"1289 [KSA]\n",
"1290 [KSA]\n",
"Name: country, dtype: object"
]
}
],
"prompt_number": 51
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The we use our bracket selections, e.g. [-1] to say we want the last word in the string set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data.country.dropna().apply(lambda x: x.split(' ')[-1])[-10:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 136,
"text": [
"1281 Korea\n",
"1282 KSA\n",
"1283 Korea\n",
"1284 Korea\n",
"1285 Korea\n",
"1286 KSA\n",
"1287 KSA\n",
"1288 KSA\n",
"1289 KSA\n",
"1290 KSA\n",
"Name: country, dtype: object"
]
}
],
"prompt_number": 136
},
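{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect this back to the name example, here is a sketch of the same split-and-select steps on a made-up `toy_names` Series (our data set has no names, so these values are purely illustrative). Index [0] picks the first word and [-1] picks the last, which also copes with middle names and single-word names."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# hypothetical names, for illustration only\n",
"toy_names = pd.Series(['Ada Lovelace', 'Grace Brewster Hopper', 'Cher'])\n",
"\n",
"# split each name into a list of words\n",
"parts = toy_names.apply(lambda x: x.split(' '))\n",
"\n",
"# first word and last word of each list\n",
"first_name = parts.apply(lambda words: words[0])\n",
"surname = parts.apply(lambda words: words[-1])\n",
"\n",
"pd.DataFrame({'name': toy_names, 'first_name': first_name, 'surname': surname})"
],
"language": "python",
"metadata": {},
"outputs": []
},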
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# IV. Homework"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Answer the questions in lesson2_homework. I'll post the answers after class #3."
]
}
],
"metadata": {}
}
]
}