@kuchaale
Forked from calippo/eblow.py
Created March 16, 2018 09:53
[scikit-learn/sklearn, pandas] Plot percent of variance explained for KMeans (Elbow Method)
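import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from sklearn.cluster import KMeans
import numpy as np
from scipy.spatial.distance import cdist, pdist


def eblow(df, n):
    """Plot the between-cluster sum of squares for KMeans with k = 1 .. n-1."""
    X = df.values
    ks = range(1, n)
    # Fit one KMeans model per candidate number of clusters
    kMeansVar = [KMeans(n_clusters=k).fit(X) for k in ks]
    centroids = [km.cluster_centers_ for km in kMeansVar]
    # Distance from every point to every centroid, then to the nearest one
    k_euclid = [cdist(X, cent) for cent in centroids]
    dist = [np.min(ke, axis=1) for ke in k_euclid]
    # Within-cluster sum of squares; an array, so the subtraction below works
    wcss = np.array([np.sum(d**2) for d in dist])
    # Total sum of squares, computed from the pairwise distances
    tss = np.sum(pdist(X)**2) / X.shape[0]
    # Between-cluster sum of squares for each k
    bss = tss - wcss
    plt.plot(ks, bss)
    plt.xlabel("number of clusters k")
    plt.ylabel("between-cluster sum of squares")
    plt.show()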
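
The gist title mentions percent of variance explained, while the function plots the raw between-cluster sum of squares. A minimal sketch of the percentage version follows. It assumes that the `inertia_` attribute of a fitted `KMeans` can stand in for the within-cluster sum of squares and that the total sum of squares can be taken around the global mean. The function name `explained_variance_percent`, the `random_state` and `n_init` settings, and the toy data are illustrative and not part of the original gist.

```python
import numpy as np
from sklearn.cluster import KMeans


def explained_variance_percent(X, max_k, random_state=0):
    """Return (ks, percent of variance explained) for k = 1 .. max_k-1."""
    X = np.asarray(X, dtype=float)
    # Total sum of squares: squared distances of all points to the global mean
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    ks = np.arange(1, max_k)
    # inertia_ is the within-cluster sum of squares of each fitted model
    wcss = np.array([
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in ks
    ])
    # Between-cluster share of the total, expressed as a percentage
    return ks, 100.0 * (tss - wcss) / tss


# Hypothetical toy data: three well-separated 2-D blobs
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)]
)
ks, pct = explained_variance_percent(X_demo, 6)
```

Plotting `pct` against `ks` (for example with `plt.plot(ks, pct)`) gives a curve bounded by 100, which makes the elbow easier to compare across datasets than the unnormalized sum of squares.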