Last active
October 24, 2015 01:51
-
-
Save onyxfish/36f459dab02545cbdce3 to your computer and use it in GitHub Desktop.
Using agate in a Jupyter notebook
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Using agate in a Jupyter notebook\n", | |
"\n", | |
"First we import agate. Then we create an agate Table by loading data from a CSV file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<agate.table.Table object at 0x106a929b0>\n" | |
] | |
} | |
], | |
"source": [ | |
"import agate\n", | |
"\n", | |
"table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')\n", | |
"\n", | |
"print(table)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 1: What was the total cost to Kansas City area counties?\n", | |
"\n", | |
"To answer this question, we first must filter the table to only those rows which refer to a Kansas City area `county`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1575\n", | |
"15\n" | |
] | |
} | |
], | |
"source": [ | |
"kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))\n", | |
"\n", | |
"print(len(table.rows))\n", | |
"print(len(kansas_city.rows))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can then print the Sum of the costs of all those rows. (The cost column is named `total_cost`.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"$3716\n" | |
] | |
} | |
], | |
"source": [ | |
"print('$%d' % kansas_city.columns['total_cost'].aggregate(agate.Sum()))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Question 2: Which counties spent the most?\n", | |
"\n", | |
"This question is more complicated. First we group the data by `county`, which gives us a TableSet named `counties`. A TableSet is a group of tables with the same columns." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')\n" | |
] | |
} | |
], | |
"source": [ | |
"# Group by county\n", | |
"counties = table.group_by('county')\n", | |
"\n", | |
"print(counties.keys())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We then use the `aggregate` function to sum the `total_cost` column for each table in the group. The resulting values are collapsed into a new table, `totals`, which has a row for each county and a column named `total_cost_sum` containing the new total." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"('county', 'total_cost_sum')\n" | |
] | |
} | |
], | |
"source": [ | |
"# Aggregate totals for all counties\n", | |
"totals = counties.aggregate([\n", | |
" ('total_cost', agate.Sum(), 'total_cost_sum')\n", | |
"])\n", | |
"\n", | |
"print(totals.column_names)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"county total_cost_sum\n", | |
"SEDGWICK 977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", | |
"COFFEY 691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", | |
"MONTGOMERY 447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", | |
"JOHNSON 420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", | |
"SALINE 245,450.24 ▓░░░░░░░░░░░░░░░░░░ \n", | |
"FINNEY 171,862.20 ▓░░░░░░░░░░░░░ \n", | |
"BROWN 145,254.96 ▓░░░░░░░░░░░ \n", | |
"KIOWA 97,974.00 ▓░░░░░░░ \n", | |
"WILSON 74,747.10 ▓░░░░░ \n", | |
"FORD 70,780.00 ▓░░░░░ \n", | |
"GREENWOOD 69,722.00 ▓░░░░░ \n", | |
"DOUGLAS 68,069.42 ▓░░░░░ \n", | |
"MIAMI 64,691.09 ▓░░░░░ \n", | |
"LYON 51,236.00 ▓░░░░ \n", | |
"HAMILTON 47,989.00 ▓░░░░ \n", | |
"WYANDOTTE 45,259.50 ▓░░░ \n", | |
"CHEYENNE 35,970.92 ▓░░░ \n", | |
"PRATT 35,745.60 ▓░░░ \n", | |
"FRANKLIN 34,309.00 ▓░░░ \n", | |
"HARVEY 27,387.42 ▓░░ \n", | |
" +-----------------+-----------------+------------------+-----------------+\n", | |
" 0 250,000 500,000 750,000 1,000,000\n" | |
] | |
} | |
], | |
"source": [ | |
"totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.0" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
NBViewer link: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/onyxfish/36f459dab02545cbdce3/raw/534698388e5c404996a7b570a7228283344adbb1/example.py.ipynb