@tomatau
Created July 14, 2015 20:57
tf-ds-u2-lesson2.2
#!/usr/bin/python
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
# read the data
loansData = pd.read_csv(
    'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv'
)
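# clean it: drop every row that has a missing value
loansData.dropna(inplace=True)
# column to plot; the notes at the end compare it with 'Amount.Funded.By.Investors'
col = 'Amount.Requested'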
# generate a box plot and show it
loansData.boxplot(column=col, return_type='axes')
plt.show()
# generate a histogram and show it
loansData.hist(column=col)
plt.show()
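# create a new figure for the QQ plot
plt.figure()
# QQ plot against the normal distribution; probplot draws onto plt and also
# returns the plotted quantiles plus a least-squares fit (slope, intercept, r)
graph = stats.probplot(
    loansData[col], dist="norm", plot=plt
)
plt.show()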
# Amount.Requested and Amount.Funded.By.Investors both look fairly normal
# and have roughly the same median.
# Amount.Funded.By.Investors has more of a positive tail and more spread
# (from 0 to 35000), but its mode is higher and its inner quartiles sit
# closer to the median, especially the upper quartile, which may imply
# that its mean is lower?
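The notes above are read off the plots by eye. A minimal sketch for checking them numerically, assuming both columns are numeric after `dropna()`: the helper names `describe_shape` and `compare_columns` are hypothetical, not part of the original script. It reports mean, median, quartiles, skewness and the `r` value from `scipy.stats.probplot`'s straight-line fit, so the mean-versus-median question can be answered directly.

```python
import pandas as pd
import scipy.stats as stats


def describe_shape(series):
    """Summarize location, spread and skew of a numeric column (NaNs dropped)."""
    s = pd.Series(series).dropna()
    # quartiles via pandas' default linear interpolation
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    # without plot=, probplot returns ((theoretical q, ordered values), (slope, intercept, r));
    # r close to 1 means the QQ points lie near a straight line, i.e. roughly normal
    _, (_, _, r) = stats.probplot(s, dist="norm")
    return {
        "mean": s.mean(),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skew": stats.skew(s),  # > 0 means a longer right (positive) tail
        "qq_r": r,
    }


def compare_columns(df, cols):
    """Hypothetical helper: one row per column so distributions read side by side."""
    return pd.DataFrame({c: describe_shape(df[c]) for c in cols}).T
```

After the cleaning step, `compare_columns(loansData, ['Amount.Requested', 'Amount.Funded.By.Investors'])` would give one row per column. A positive `skew` together with a mean above the median is the usual numeric sign of a positive tail.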