Created
June 24, 2018 01:19
-
-
Save EricG-Personal/7f7ab6addd6d9bf6f1f522c332417497 to your computer and use it in GitHub Desktop.
A better way to transform the dataframe?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I have a dataframe that looks like: | |
``` | |
Jan Feb Mar \ | |
actual budgeted difference actual budgeted difference actual | |
Income 889 6716 1687 2437 7137 3640 10288 | |
Expenses 2478 5548 12623 2190 12882 9232 9947 | |
Difference 10661 657 7365 -147 10289 7044 3639 | |
``` | |
My focus is on the Difference row and the 'actual' and 'budgeted' columns across all of the months. | |
and I need to transform it to look like: | |
``` | |
amount months category | |
0 657.0 2018-01-31 budgeted | |
1 8438.0 2018-01-31 actual | |
2 10946.0 2018-02-28 budgeted | |
3 38318.0 2018-02-28 actual... | |
... | |
``` | |
My code to do this is: | |
``` | |
import pandas as pd | |
import string | |
import altair as alt | |
from random import randint | |
# | |
# This is the general form of my 'real' dataframe. It is not subject to change. | |
# | |
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ] | |
monthyAmounts = [ "actual", "budgeted", "difference" ] | |
summary = [] | |
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] ) | |
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] ) | |
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] ) | |
index = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' ) | |
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] ) | |
summaryDF = pd.DataFrame( summary, index = index, columns = columns ) | |
print( summaryDF ) | |
# | |
# From this point, I am trying to transform the summaryDF into something | |
# I can use in a different context... | |
# | |
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' ) | |
idx = pd.IndexSlice | |
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum() | |
actual.index = budgetMonths | |
# | |
# The reason for this is that June is not over with yet, so, there can be no 'actual' amount for it or later months. | |
# | |
actual.iloc[ 5 ] = None | |
actual.iloc[ 6 ] = None | |
actual.iloc[ 7 ] = None | |
actual.iloc[ 8 ] = None | |
actual.iloc[ 9 ] = None | |
actual.iloc[ 10 ] = None | |
actual.iloc[ 11 ] = None | |
actual = actual.cumsum() | |
budgeted.index = budgetMonths | |
budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' }) | |
actualDF = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' }) | |
df3 = (pd.concat([budgetedDF, actualDF]) | |
.sort_index() | |
.reset_index() | |
).drop( 'index', 1 ) | |
print( df3 ) | |
``` | |
The transformation starts after the 'print( summaryDF )' line. | |
As I have only been playing with Pandas for a couple of days, my question is, is there a better way to do the transformation? | |
If it helps and as background, this transformation produces a dataframe that make it easy to use altair to graph with the line: | |
``` | |
budgetChart = alt.Chart( df3 ).configure_view( width = 900 ).mark_line().encode( | |
y = alt.Y( 'amount:Q', axis = alt.Axis( format = '$,f', title = "Cumulative Amount") ), | |
x = alt.X( 'month(months):T', axis = alt.Axis( title = "Month") ), | |
color = 'category' | |
) | |
``` | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Just use
The first statement will give you a complete "pivot" of your data. The second statement then just gives you the numbers in the
Differences
row. You can see pictures of the various transformations at: http://pandas.pydata.org/Pandas_Cheat_Sheet.pdf