@matmoody
Created April 22, 2016 18:15
import collections

import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
# Use collections.Counter() on the number of open credit lines in the loans data
# to count how many observations fall on each value
# Load reduced version of Lending Club dataset
loansData = pd.read_csv("https://github.com/Thinkful-Ed/curric-data-001-data-sets/raw/master/loans/loansData.csv")
# Drop null rows
loansData.dropna(inplace=True)
freq = collections.Counter(loansData['Open.CREDIT.Lines'])
plt.figure()
plt.bar(list(freq.keys()), list(freq.values()), width=1)
plt.show()
# Number of distinct open-credit-line values observed
print(len(freq))
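# Chi-squared test (scipy.stats.chisquare) to verify answer.
# With no f_exp argument, the observed counts are tested against equal expected frequencies.
chi, p = stats.chisquare(list(freq.values()))
print(chi)
print(p)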
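
A minimal sketch of what the chi-squared call above computes, assuming the default behaviour of `stats.chisquare` (equal expected frequencies when `f_exp` is omitted). The `values` list is hypothetical toy data standing in for `loansData['Open.CREDIT.Lines']`, not figures from the Lending Club dataset.

```python
from collections import Counter

from scipy import stats

# Hypothetical toy data standing in for loansData['Open.CREDIT.Lines']
values = [2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5]
freq = Counter(values)
observed = list(freq.values())

# Under the default null hypothesis every category has the same expected count
expected = sum(observed) / len(observed)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_manual = sum((o - expected) ** 2 / expected for o in observed)

# scipy computes the same statistic and adds the p-value
chi, p = stats.chisquare(observed)

print(chi_manual)
print(chi)
print(p)
```

A small p-value would suggest the counts are unlikely under equal frequencies; the hand-computed statistic should match the one returned by scipy.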