@LouisdeBruijn
Last active March 11, 2020 15:25
def main():
    # Load the corpus and hold out 20% of it as a test set
    X, Y = read_corpus(args.input, args.binary)
    Xtrain, Xtest, Ytrain, Ytest = shuffle_split(X, Y, 0.8)

    # Class priors are computed over the full label set
    prior_prob = prior_probabilities(Y)
    info('Prior probabilities per class: {0}'.format(prior_prob))

    classifier = feature_union(count=True, tfidf=True, textstats=True)
    classifier.fit(Xtrain, Ytrain)  # fit the classifier on the training set
    Yguess = classifier.predict(Xtest)  # predict the labels on the test set
    posterior_prob = classifier.predict_proba(Xtest)  # calculate posterior probabilities

    # Log a per-document results table, truncated to 10 rows but showing all columns
    df = tabular_results(Xtest, Ytest, Yguess, prior_prob, posterior_prob)
    with pd.option_context('display.max_rows', 10, 'display.max_columns', None):
        debug(df)

    # Compare the trained classifier against a baseline on the same test set
    baseline = baseline_classifier(Xtest, Ytest)
    class_report("Baseline classifier", Ytest, baseline, show_matrix=False)
    class_report("Naive Bayes classifier", Ytest, Yguess, show_matrix=True)
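
The helpers called in `main()` are not part of this snippet. As a rough illustration only, here is one way `shuffle_split` and `prior_probabilities` could behave, assuming that `X` and `Y` are parallel lists, that the third argument is the training fraction, and that priors are plain relative label frequencies. The bodies and the `seed` parameter are assumptions made for this sketch, not the author's implementation.

```python
import random
from collections import Counter


def shuffle_split(X, Y, ratio, seed=0):
    """Hypothetical stand-in: shuffle (x, y) pairs together, then split at `ratio`."""
    pairs = list(zip(X, Y))
    random.Random(seed).shuffle(pairs)  # seeded so the split is reproducible
    cut = int(ratio * len(pairs))
    train, test = pairs[:cut], pairs[cut:]
    Xtrain = [x for x, _ in train]
    Ytrain = [y for _, y in train]
    Xtest = [x for x, _ in test]
    Ytest = [y for _, y in test]
    # Same return order as the call in main(): Xtrain, Xtest, Ytrain, Ytest
    return Xtrain, Xtest, Ytrain, Ytest


def prior_probabilities(Y):
    """Hypothetical stand-in: relative frequency of each class label."""
    counts = Counter(Y)
    total = len(Y)
    return {label: n / total for label, n in counts.items()}
```

Shuffling the pairs rather than the two lists separately keeps every document aligned with its label, which the later `fit` and `predict` calls rely on.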