Created March 9, 2016 15:21
Preparation script to create data for a Sankey diagram in Tableau. The input data is row-based, and 'country' and 'sector' are two dimensions of the data.
# Preparation script to create data for a Sankey diagram in Tableau.
# The input data is row-based, and 'country' and 'sector' are two dimensions of the data.
# See https://public.tableau.com/profile/samuelleach#!/vizhome/SankeyTemplate/Dashboard1

import pandas as pd

infile = 'all_loans.csv'
outfile = 'all_loans_sankey.csv'
sankey_columns = ['country', 'sector']

print('Reading ' + infile)
df = pd.read_csv(infile, low_memory=False)

# Aggregate the remaining columns for each (country, sector) pair.
print('Performing groupby operations')
bygroup_treatment = df.groupby(sankey_columns)
df = bygroup_treatment.sum()

# Tag the aggregated rows as 'Dummy'.
print('Adding RowType column')
df['RowType'] = 'Dummy'

# Duplicate the aggregated rows and tag the copy as 'Real'.
print('Copying dataframe')
df2 = df.copy()
df2['RowType'] = 'Real'

# Stack both copies so each (country, sector) pair appears twice.
print('Concatenating dataframes')
result = pd.concat([df, df2])

print('Writing data frame to ' + outfile)
result.to_csv(outfile)