Skip to content

Instantly share code, notes, and snippets.

@drcjar
Last active December 24, 2015 00:19
Show Gist options
  • Save drcjar/6716180 to your computer and use it in GitHub Desktop.
Save drcjar/6716180 to your computer and use it in GitHub Desktop.
Paracetamol Choropleth

UPDATE: pretty sure I've not done my dataframe preparation right... should probably aggregate to individual practices in the prescribing data then map to CCGs then aggregate practices to CCGs.. and do more sanity checks. Think have fixed for iron analysis... will revisit shortly.

So on this american life this week they suggested that there might be more liver failure occuring due to paracetamol than we realise... I thought it'd be fun to see if there is an association between paracetamol usage and liver deaths per 10,000 population by CCG.

Let's practice Choropleth makage again by making a Paracetamol Choropleth using open data, ogr2ogr, folium and pandas (and indirectly leaflet.js, d3.js, GeoJSON, open street maps and moar) and talk about using gist and blocks. I will skip steps from last time.

  1. get some prescribing data from here www.hscic.gov.uk/gpprescribingdata e.g

    wget 'http://datagov.ic.nhs.uk/presentation/2013_04_April/T201304PDPI+BNFT.csv'
    
  2. we can re-use the ccg boundaries GeoJSON we made last time which is here.

  3. so now all we need to do is prepare our data (the prescribing data) such that it has a column that matches our CCG Codes in the GeoJSON and a column with our thing of interest. I've decided our thing of interest is the number of items prescribed containing paracetamol per 10,000 population by CCG so we want a dataframe with that and our CCG Code as headings.

  4. a bit of footwork is required to prepare the dataset because our prescribing data has individual GP practices so we need to map them to CCGs. We also need to add some CCG population size data into the mix in order to the per 10,000 bit.

    wget 'http://www.connectingforhealth.nhs.uk/systemsandservices/data/ods/ccginterim/interimpcmem_v5.zip'
    wget 'https://indicators.ic.nhs.uk/download/Clinical%20Commissioning%20Group%20Indicators/Data/CCG_registered_patients_2012.csv'
    

The above .csv file from connecting for health tells us which practice belongs to which CCG without too much trouble. The above .csv file of 'CCG_registered_patients_2012' is helpfully broken down into age ranges which we don't care about for our choropleth making hackery so we roll them up. All of this is done using python pandas (see ipython notebook part one).

  1. make the choropleth using folium magic python code (see ipython notebook part two)

    map = folium.Map(location=[54.2, -2.45], zoom_start=5)
    map.geo_json(geo_path=ccg_geo, data_out='data10.json', data=para_analysis,
       columns=['CCG13CD', 'Per_person_para_by_ccg'],
       key_on='feature.properties.CCG13CD',
       threshold_scale=[5, 6, 7, 8, 9, 10]
       fill_color='PuBu', fill_opacity=0.7, line_opacity=0.3,
       legend_name='Number of paracetamol items prescribed in April per 10,000 population by CCG')
    map.create_map(path='map_10.html')
  2. kinda need some data on liver deaths now...

Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "ParacetamolAnalytics"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "import pandas as pd",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": "from pandas import DataFrame, Series",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "data = pd.read_csv('T201304PDPI+BNFT.csv', usecols=[2,3,4,5])",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": "data.columns",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": "Index([u'PRACTICE', u'BNF CODE', u'BNF NAME ', u'ITEMS '], dtype=object)"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "data.columns = ['PRACTICE', 'BNF CODE', 'BNF NAME', 'ITEMS'] #rename to cut the white space",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": "data['ITEMS'] = data['ITEMS'].astype(int) #make the occasional strings in this numerical field into ints",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": "para = data[data['BNF NAME'].str.contains('Paracet')] #make a new DataFrame that contains rows with partial string 'Paracet'",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 82
},
{
"cell_type": "code",
"collapsed": false,
"input": "para.irow(1)",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": "PRACTICE Y03375\nBNF CODE 0407010H0AAAAAA\nBNF NAME Paracet_Cap 500mg \nITEMS 1\nName: 372, dtype: object"
}
],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": "para",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 59937 entries, 21 to 9602688\nData columns (total 4 columns):\nPRACTICE 59937 non-null values\nBNF CODE 59937 non-null values\nBNF NAME 59937 non-null values\nITEMS 59937 non-null values\ndtypes: int64(1), object(3)\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 59937 entries, 21 to 9602688\nData columns (total 4 columns):\nPRACTICE 59937 non-null values\nBNF CODE 59937 non-null values\nBNF NAME 59937 non-null values\nITEMS 59937 non-null values\ndtypes: int64(1), object(3)"
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": "gp2ccg = pd.read_csv('practice_to_ccg_codes.csv')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": "gp2ccg = gp2ccg[['PRACTICE', 'CCGCODE', 'CCG13CD']]",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": "para = pd.merge(para, gp2ccg, on='PRACTICE')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 83
},
{
"cell_type": "code",
"collapsed": false,
"input": "para #some entries have gone missing... this suggests not all practices matched up...",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 58368 entries, 0 to 58367\nData columns (total 6 columns):\nPRACTICE 58368 non-null values\nBNF CODE 58368 non-null values\nBNF NAME 58368 non-null values\nITEMS 58368 non-null values\nCCGCODE 58368 non-null values\nCCG13CD 58368 non-null values\ndtypes: int64(1), object(5)\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 85,
"text": "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 58368 entries, 0 to 58367\nData columns (total 6 columns):\nPRACTICE 58368 non-null values\nBNF CODE 58368 non-null values\nBNF NAME 58368 non-null values\nITEMS 58368 non-null values\nCCGCODE 58368 non-null values\nCCG13CD 58368 non-null values\ndtypes: int64(1), object(5)"
}
],
"prompt_number": 85
},
{
"cell_type": "code",
"collapsed": false,
"input": "ccg_pop = pd.read_csv('ccgcode_pop.csv')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 93
},
{
"cell_type": "code",
"collapsed": false,
"input": "ccg_pop",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 211 entries, 0 to 210\nData columns (total 2 columns):\nCCG13CD 211 non-null values\nPopulation 211 non-null values\ndtypes: int64(1), object(1)\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 94,
"text": "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 211 entries, 0 to 210\nData columns (total 2 columns):\nCCG13CD 211 non-null values\nPopulation 211 non-null values\ndtypes: int64(1), object(1)"
}
],
"prompt_number": 94
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis = para.groupby('CCG13CD')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 86
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis = para_analysis.ITEMS.sum()",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 87
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 88,
"text": "CCG13CD\nE38000001 8779\nE38000002 12209\nE38000003 9358\nE38000004 14779\nE38000005 26553\nE38000006 25283\nE38000007 20336\nE38000008 5005\nE38000009 19560\nE38000010 31079\nE38000011 18831\nE38000012 61741\nE38000013 20387\nE38000014 7855\nE38000015 7126\n...\nE38000197 1597\nE38000198 1347\nE38000199 2293\nE38000200 2047\nE38000201 3367\nE38000202 2722\nE38000203 2310\nE38000204 2747\nE38000205 6837\nE38000206 5790\nE38000207 364\nE38000208 1928\nE38000209 97\nE38000210 935\nE38000211 442\nName: ITEMS, Length: 210, dtype: int64"
}
],
"prompt_number": 88
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis.to_csv('para_analysis', header=True)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 89
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis = pd.read_csv('para_analysis.csv')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 90
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 210 entries, 0 to 209\nData columns (total 2 columns):\nCCG13CD 210 non-null values\nITEMS 210 non-null values\ndtypes: int64(1), object(1)\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 91,
"text": "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 210 entries, 0 to 209\nData columns (total 2 columns):\nCCG13CD 210 non-null values\nITEMS 210 non-null values\ndtypes: int64(1), object(1)"
}
],
"prompt_number": 91
},
{
"cell_type": "code",
"collapsed": false,
"input": "ccg_pop",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nInt64Index: 211 entries, 0 to 210\nData columns (total 2 columns):\nCCG13CD 211 non-null values\nPopulation 211 non-null values\ndtypes: int64(1), object(1)\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 95,
"text": "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 211 entries, 0 to 210\nData columns (total 2 columns):\nCCG13CD 211 non-null values\nPopulation 211 non-null values\ndtypes: int64(1), object(1)"
}
],
"prompt_number": 95
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis['Per_person_para_by_ccg'] = (para_analysis['ITEMS'] / ccg_pop['Population']) * 10000",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 118
},
{
"cell_type": "code",
"collapsed": false,
"input": "para_analysis['Per_person_para_by_ccg'].describe()",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 120,
"text": "count 210.000000\nmean 12.074122\nstd 15.234032\nmin 0.026533\n25% 3.589400\n50% 7.699624\n75% 13.856595\nmax 120.779997\ndtype: float64"
}
],
"prompt_number": 120
},
{
"cell_type": "code",
"collapsed": false,
"input": "ccg_geo = 'ccgs.json'",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 99
},
{
"cell_type": "code",
"collapsed": false,
"input": "import folium",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 100
},
{
"cell_type": "code",
"collapsed": false,
"input": "map = folium.Map(location=[54.2, -2.45], zoom_start=5)\nmap.geo_json(geo_path=ccg_geo, data_out='data10.json', data=para_analysis,\n columns=['CCG13CD', 'Per_person_para_by_ccg'],\n key_on='feature.properties.CCG13CD',\n threshold_scale=[5, 6, 7, 8, 9, 10]\n fill_color='PuBu', fill_opacity=0.7, line_opacity=0.3,\n legend_name='Number of paracetamol items prescribed in April per 10,000 population by CCG')\nmap.create_map(path='map_10.html')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 117
},
{
"cell_type": "code",
"collapsed": false,
"input": " ",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
View raw

(Sorry about that, but we can’t show files that are this big right now.)

Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
View raw

(Sorry about that, but we can’t show files that are this big right now.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment