Skip to content

Instantly share code, notes, and snippets.

@phobson
Created August 16, 2013 00:32
Show Gist options
  • Save phobson/6246236 to your computer and use it in GitHub Desktop.
Save phobson/6246236 to your computer and use it in GitHub Desktop.
Reformatting NSQD
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup the environment and load data"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas\n",
"from IPython.display import HTML\n",
"\n",
"# read data in as csv with three rows of column headers\n",
"df_cols = pandas.read_csv('nsqd/NSQD_xtab.csv', na_values=[-999], header=[0,1,2], index_col=range(38))\n",
"\n",
"# seperate out each row of column headers\n",
"df_cols.columns = pandas.MultiIndex.from_tuples(df_cols.columns)\n",
"\n",
"# name each column header\n",
"df_cols.columns.names = ['parameter', 'units', 'value']"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Convert data from x-tab to DB format and export"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# stack the data into a row-based format on the \n",
"# 'parameter' & 'units' levels, dropping all rows\n",
"# where the res *and* qual are NA\n",
"df_rows = df_cols.stack(level=0, dropna=True) \\\n",
" .stack(level=0, dropna=False) \\\n",
" .dropna(how='all')\n",
"\n",
"# dump to csv\n",
"df_rows.to_csv('nsqd/nsqd_20130815.csv', na_rep='--')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Show total result/qualifier counts for each parameter in Raleigh, NC"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"raleigh = df_rows.select(lambda x: 'Raleigh' in x[7])\n",
"raleigh_params = raleigh.groupby(level='parameter')\n",
"raleigh_params.size()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"parameter\n",
"BOD5 61\n",
"COD 61\n",
"Days since last rain 63\n",
"Dissolved Phosphorus 60\n",
"Nitrate + Nitrite 50\n",
"Precipitation Depth 63\n",
"Total Cadmium 63\n",
"Total Copper 63\n",
"Total Dissolved Solids 62\n",
"Total Kjeldahl Nitrogen 62\n",
"Total Lead 63\n",
"Total Nitrogen 61\n",
"Total Phosphorus 61\n",
"Total Suspended Solids 62\n",
"Total Zinc 63\n",
"pH 13\n",
"dtype: int64"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Show count of results where there is a qualifier, but no results in Raleigh"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"raleigh[pandas.isnull(raleigh.res)].groupby(level='parameter').size()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"parameter\n",
"BOD5 2\n",
"COD 1\n",
"Dissolved Phosphorus 6\n",
"Total Cadmium 9\n",
"Total Copper 1\n",
"Total Lead 1\n",
"Total Nitrogen 12\n",
"Total Zinc 4\n",
"dtype: int64"
]
}
],
"prompt_number": 7
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment