@Steboss89
Last active September 18, 2022 16:42
Run a naive Bayes classifier
import pandas as pd
# vectorize words
from sklearn.feature_extraction.text import CountVectorizer
# naive bayes
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import auc, roc_curve
# train test split
from sklearn.model_selection import train_test_split
# MAIN
# preprocess steps
# ...
# create train and validation splits
X_train, X_valid, y_train, y_valid = train_test_split(
    tweets_df['clean_text3'], target_df['sentiment'], train_size=0.75
)
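# count vectorizer
vectorizer = CountVectorizer()
# learn the vocabulary on the training text and turn it into a count matrix
# (fit() alone returns the vectorizer itself, not the transformed data)
X_train = vectorizer.fit_transform(X_train)
# reuse the training vocabulary for the validation text
X_valid = vectorizer.transform(X_valid)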
# model
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
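# run predictions (hard class labels)
y_pred = classifier.predict(X_valid)
# compute metrics: the ROC curve needs scores rather than hard labels, so use
# the probability of the positive class (assumes binary sentiment labels)
y_score = classifier.predict_proba(X_valid)[:, 1]
fpr, tpr, thresholds = roc_curve(y_valid, y_score)
roc_auc = auc(fpr, tpr)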
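
The script above depends on preprocessing steps that are not shown, so it cannot be run as it stands. The following is a minimal self-contained sketch of the same pipeline. The `texts` and `labels` lists are hypothetical toy data standing in for `tweets_df['clean_text3']` and the sentiment column, and the `stratify` and `random_state` arguments are assumptions added only so that the tiny validation split contains both classes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data (not from the original gist): 1 = positive, 0 = negative
texts = [
    "great movie loved it",
    "what a wonderful day",
    "really happy with this",
    "best experience ever",
    "terrible service hated it",
    "what an awful day",
    "really sad about this",
    "worst experience ever",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Stratify so both classes appear in the validation split (assumption for toy data)
X_train, X_valid, y_train, y_valid = train_test_split(
    texts, labels, train_size=0.75, stratify=labels, random_state=0
)

# Learn the vocabulary on the training text only, then reuse it for validation
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_valid_vec = vectorizer.transform(X_valid)

# Fit the multinomial naive Bayes model on the count matrix
classifier = MultinomialNB()
classifier.fit(X_train_vec, y_train)

# Score the validation set with the probability of the positive class
y_score = classifier.predict_proba(X_valid_vec)[:, 1]
fpr, tpr, thresholds = roc_curve(y_valid, y_score)
roc_auc = auc(fpr, tpr)
print(f"validation ROC AUC: {roc_auc:.3f}")
```

With so few samples the AUC value itself carries no meaning; the sketch only shows how the pieces fit together.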