@darothen
Created July 7, 2015 13:55
Simple reproduction of `aggr` examples using Pandas
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick reproduction of the examples from the [`aggr` GitHub docs](https://github.com/Horb/aggr) using pandas."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, save copies of the two example datasets:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing data.txt\n"
]
}
],
"source": [
"%%writefile data.txt\n",
"Eggs,12\n",
"Chips,13\n",
"Beans,14\n",
"Eggs,21\n",
"Chips,32\n",
"Beans,43"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing data2.txt\n"
]
}
],
"source": [
"%%writefile data2.txt\n",
"2015-05-13,Eggs,1200\n",
"2015-05-13,Chips,1300\n",
"2015-05-13,Chips,1300\n",
"2015-05-13,Bean,1300\n",
"2015-05-15,Eggs,1300\n",
"2015-05-15,Eggs,1300\n",
"2015-05-15,Eggs,1300\n",
"2015-05-15,Chips,1300\n",
"2015-05-15,Beans,1300"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Example 1) \n",
"\n",
"Sum the values for each common key in `data.txt`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1\n",
"0 \n",
"Beans 57\n",
"Chips 45\n",
"Eggs 33\n"
]
}
],
"source": [
"# No header row in the file, so columns are labelled 0 and 1\n",
"df = pd.read_csv('data.txt', header=None)\n",
"print(df.groupby([0]).sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2/3)\n",
"\n",
"Sum the counts grouped by the first column (the date), then by the length of the entry in the second (the item). There appears to be an error in this example on the `aggr` GitHub page: its output suggests there are no 5-character strings on 2015-05-13 and no 4-character strings on 2015-05-15, whereas the data above contain both on each date.\n",
"\n",
"The second line below, which computes `item_len`, could probably be rolled into the `groupby` call."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" count\n",
"date item_len \n",
"2015-05-13 4 2500\n",
" 5 2600\n",
"2015-05-15 4 3900\n",
" 5 2600\n"
]
}
],
"source": [
"df = pd.read_csv('data2.txt', header=None, names=['date', 'item', 'count'])\n",
"df['item_len'] = df['item'].str.len()\n",
"\n",
"# Sum only the numeric column so the string column 'item' is not aggregated\n",
"print(df.groupby(['date', 'item_len'])[['count']].sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Composite keys: group by both the date and the item, then sum."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2\n",
"0 1 \n",
"2015-05-13 Bean 1300\n",
" Chips 2600\n",
" Eggs 1200\n",
"2015-05-15 Beans 1300\n",
" Chips 1300\n",
" Eggs 3900\n"
]
}
],
"source": [
"df = pd.read_csv('data2.txt', header=None)\n",
"# Group on the first two columns (date, item) and sum the remaining one\n",
"print(df.groupby([0, 1]).sum())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
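
The note under Example 2/3 guesses that the `item_len` step could be rolled into the `groupby` call. Below is a minimal sketch of one way to do that, assuming Python 3 and a reasonably recent pandas. The rows of `data2.txt` are inlined with `io.StringIO` so the snippet runs on its own, and the names `DATA2`, `item_len`, and `result` are illustrative rather than taken from the notebook.

```python
import io

import pandas as pd

# Same rows as data2.txt in the notebook, inlined so the sketch is self-contained.
DATA2 = """\
2015-05-13,Eggs,1200
2015-05-13,Chips,1300
2015-05-13,Chips,1300
2015-05-13,Bean,1300
2015-05-15,Eggs,1300
2015-05-15,Eggs,1300
2015-05-15,Eggs,1300
2015-05-15,Chips,1300
2015-05-15,Beans,1300
"""

df = pd.read_csv(io.StringIO(DATA2), header=None, names=["date", "item", "count"])

# Build the length key on the fly instead of storing it as a column.
# Renaming the Series gives the second index level the name "item_len".
item_len = df["item"].str.len().rename("item_len")

# Mix a column label and a Series as grouping keys; select "count"
# explicitly so the string column "item" is not aggregated.
result = df.groupby(["date", item_len])["count"].sum()
print(result)
```

`groupby` accepts a mix of column labels and Series aligned to the frame's index, so the derived key never has to be written back to `df`. The sums should match the Example 2/3 output in the notebook.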