Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save EricG-Personal/7f7ab6addd6d9bf6f1f522c332417497 to your computer and use it in GitHub Desktop.
Save EricG-Personal/7f7ab6addd6d9bf6f1f522c332417497 to your computer and use it in GitHub Desktop.
A better way to transform the dataframe?
I have a dataframe that looks like:
```
Jan Feb Mar \
actual budgeted difference actual budgeted difference actual
Income 889 6716 1687 2437 7137 3640 10288
Expenses 2478 5548 12623 2190 12882 9232 9947
Difference 10661 657 7365 -147 10289 7044 3639
```
My focus is on the Difference row and the 'actual' and 'budgeted' columns across all of the months.
and I need to transform it to look like:
```
amount months category
0 657.0 2018-01-31 budgeted
1 8438.0 2018-01-31 actual
2 10946.0 2018-02-28 budgeted
3 38318.0 2018-02-28 actual...
...
```
My code to do this is:
```
import pandas as pd
import string
import altair as alt
from random import randint
#
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]
summary = []
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
index = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )
summaryDF = pd.DataFrame( summary, index = index, columns = columns )
print( summaryDF )
#
# From this point, I am trying to transform the summaryDF into something
# I can use in a different context...
#
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )
idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual.index = budgetMonths
#
# The reason for this is that June is not over with yet, so, there can be no 'actual' amount for it or later months.
#
actual.iloc[ 5 ] = None
actual.iloc[ 6 ] = None
actual.iloc[ 7 ] = None
actual.iloc[ 8 ] = None
actual.iloc[ 9 ] = None
actual.iloc[ 10 ] = None
actual.iloc[ 11 ] = None
actual = actual.cumsum()
budgeted.index = budgetMonths
budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })
df3 = (pd.concat([budgetedDF, actualDF])
.sort_index()
.reset_index()
).drop( 'index', 1 )
print( df3 )
```
The transformation starts after the 'print( summaryDF )' line.
As I have only been playing with Pandas for a couple of days, my question is, is there a better way to do the transformation?
If it helps and as background, this transformation produces a dataframe that make it easy to use altair to graph with the line:
```
budgetChart = alt.Chart( df3 ).configure_view( width = 900 ).mark_line().encode(
y = alt.Y( 'amount:Q', axis = alt.Axis( format = '$,f', title = "Cumulative Amount") ),
x = alt.X( 'month(months):T', axis = alt.Axis( title = "Month") ),
color = 'category'
)
```
@Dr-Irv
Copy link

Dr-Irv commented Jun 25, 2018

Just use

df2 = pd.melt(summaryDF.reset_index(), id_vars=['type'])
df2[df2['type'] == 'Difference']

The first statement will give you a complete "pivot" of your data. The second statement then just gives you the numbers in the Differences row. You can see pictures of the various transformations at: http://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment