load data
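import numpy as np
from sklearn.datasets import fetch_20newsgroups

# Load a few categories from the 20 newsgroups dataset
# (subset='all' below combines the training and test splits)
categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]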
# Uncomment the following to do the analysis on all the categories
# categories = None
print("Loading 20 newsgroups dataset for categories:")
print(categories)
dataset = fetch_20newsgroups(subset='all', categories=categories,
                             shuffle=True, random_state=42)
print("%d documents" % len(dataset.data))
print("%d categories" % len(dataset.target_names))
print()
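# Integer class index for each document
labels = dataset.target
# Number of distinct classes actually present in the labels
true_k = np.unique(labels).shape[0]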