@calippo
Last active November 11, 2019 13:21
[scikit-learn/sklearn, pandas] Plot percent of variance explained for KMeans (Elbow Method)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from sklearn.cluster import KMeans
import numpy as np
from scipy.spatial.distance import cdist, pdist
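def elbow(df, n):
    # Fit one KMeans model for each k in 1..n-1.
    kMeansVar = [KMeans(n_clusters=k).fit(df.values) for k in range(1, n)]
    centroids = [X.cluster_centers_ for X in kMeansVar]
    # Distance from every point to every centroid, then to the nearest one.
    k_euclid = [cdist(df.values, cent) for cent in centroids]
    dist = [np.min(ke, axis=1) for ke in k_euclid]
    # Within-cluster sum of squares (WCSS) for each k.
    wcss = np.array([np.sum(d**2) for d in dist])
    # Total sum of squares via pairwise distances, then between-cluster sum of squares.
    tss = np.sum(pdist(df.values)**2) / df.values.shape[0]
    bss = tss - wcss
    # Plot against the actual k values rather than list indices.
    plt.plot(range(1, n), bss)
    plt.show()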
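
The title mentions percent of variance explained, while the function above plots the raw between-cluster sum of squares. A minimal sketch of the percentage version follows. It assumes scikit-learn's `inertia_` attribute as the WCSS, and the helper name `explained_variance_percent`, the `random_state`/`n_init` settings, and the synthetic data are illustrative choices, not part of the original gist.

```python
import numpy as np
from sklearn.cluster import KMeans


def explained_variance_percent(X, max_k, random_state=0):
    """Return the percent of variance explained for k = 1..max_k (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Total sum of squares: squared distance of every point to the global mean.
    tss = np.sum((X - X.mean(axis=0)) ** 2)
    percents = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # inertia_ is the within-cluster sum of squares of the fitted model.
        percents.append(100.0 * (tss - model.inertia_) / tss)
    return np.array(percents)


# Synthetic data for illustration only: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])
pct = explained_variance_percent(X_demo, max_k=6)
```

Passing `range(1, 7)` and `pct` to `plt.plot` gives the elbow curve on a 0 to 100 scale. With a DataFrame, `df.values` can be passed as `X`.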
@cgrinaldi

cgrinaldi commented May 16, 2016


I think you are missing `from scipy.spatial.distance import cdist`. And thanks for posting!

@cheniel

cheniel commented May 17, 2016


Missing `pdist` as well:
`from scipy.spatial.distance import cdist, pdist`

@cheniel

cheniel commented May 18, 2016


@calippo

calippo commented May 3, 2017

Author

Thanks! Including it.

@divyamounika


The `score` function of `KMeans` (`from sklearn.cluster import KMeans`) gives the same graph pattern.
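
A small sketch of why the pattern matches, assuming scikit-learn's documented behaviour that `KMeans.score` returns the negative of the k-means objective on the data it is given. The data and parameter values here are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# score() on the training data is the negated within-cluster sum of squares,
# so plotting it against k rises the same way as tss - wcss, shifted by tss.
score = model.score(X_demo)
wcss = model.inertia_
```

Since `tss` does not depend on k, `score` and `tss - wcss` differ only by a constant offset, which is why the elbow appears at the same k.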

@ericbf

ericbf commented Apr 24, 2018


Did you purposely spell this eblow instead of elbow?

@calippo

calippo commented Jul 19, 2018

Author

@ericbf nope, didn't notice :). Thanks
