@calippo
Last active November 11, 2019 13:21
[scikit-learn/sklearn, pandas] Plot percent of variance explained for KMeans (Elbow Method)
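import pandas as pd  # df is expected to be a pandas DataFrame
import matplotlib.pyplot as plt
import seaborn  # optional: only affects plot styling
from sklearn.cluster import KMeans
import numpy as np
from scipy.spatial.distance import cdist, pdist


def elbow(df, n):
    """Plot the percent of variance explained by KMeans for k = 1 .. n-1."""
    X = df.values
    ks = range(1, n)
    kMeansVar = [KMeans(n_clusters=k).fit(X) for k in ks]
    centroids = [model.cluster_centers_ for model in kMeansVar]
    # Distance from every point to every centroid, then to its nearest one.
    k_euclid = [cdist(X, cent) for cent in centroids]
    dist = [np.min(ke, axis=1) for ke in k_euclid]
    # Within-cluster sum of squares for each k.
    wcss = np.array([np.sum(d ** 2) for d in dist])
    # Total sum of squares, computed from the pairwise distances.
    tss = np.sum(pdist(X) ** 2) / X.shape[0]
    # Between-cluster sum of squares, plotted as a percentage of the total.
    bss = tss - wcss
    plt.plot(ks, bss / tss * 100)
    plt.xlabel("Number of clusters (k)")
    plt.ylabel("Percent of variance explained")
    plt.show()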
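A minimal sketch of the same quantity computed without `cdist`/`pdist`, assuming the fitted model's `inertia_` attribute (the within-cluster sum of squares) is an acceptable substitute for the manual distance computation. The helper name `explained_variance_percent`, the `n_init` and `random_state` settings, and the synthetic three-blob data are illustrative assumptions, not part of the gist above.

```python
import numpy as np
from sklearn.cluster import KMeans


def explained_variance_percent(X, max_k, random_state=0):
    """Return the percent of variance explained for k = 1 .. max_k."""
    X = np.asarray(X, dtype=float)
    # Total sum of squares: squared distances of all points to the global mean.
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    percents = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # inertia_ is the within-cluster sum of squares of the fitted model.
        percents.append(100.0 * (tss - model.inertia_) / tss)
    return percents


# Hypothetical data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
percents = explained_variance_percent(X, max_k=6)
```

The returned list can be passed to `plt.plot(range(1, 7), percents)`; on data like this the curve should flatten sharply after k = 3, which is the "elbow".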
@ericbf

ericbf commented Apr 24, 2018

Did you purposely spell this eblow instead of elbow?

@calippo
Author

calippo commented Jul 19, 2018

@ericbf nope, didn't notice :). Thanks
