@rtindru
Created February 10, 2022 08:22
Step 1: Build a sentiment classification model with pandas and scikit-learn
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline
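
%%bash
# Fetch and unpack the Sentiment140 data once; skip if the zip is already present.
if [ ! -f ./trainingandtestdata.zip ]; then
    wget -q http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
    unzip -n trainingandtestdata.zip   # -n: never overwrite existing files
fi
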
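The `%%bash` cell magic only runs inside Jupyter/IPython. For a plain Python script, a minimal standard-library sketch of the same step could look like this. `extract_no_overwrite` and `ensure_dataset` are hypothetical helper names, and the sketch assumes the URL from the cell above is still reachable. Unlike the bash cell, it re-runs extraction on every call, which is harmless because files already on disk are skipped, just as `unzip -n` skips them.

```python
import os
import urllib.request
import zipfile

# URL and file name taken from the %%bash cell above.
DATA_URL = "http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip"
ZIP_NAME = "trainingandtestdata.zip"


def extract_no_overwrite(zip_path, dest_dir="."):
    """Extract every file in zip_path into dest_dir, skipping files that
    already exist (the same effect as `unzip -n`). Returns the extracted names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            target = os.path.join(dest_dir, member)
            if member.endswith("/") or os.path.exists(target):
                continue  # skip directory entries and files already on disk
            zf.extract(member, dest_dir)
            extracted.append(member)
    return extracted


def ensure_dataset(dest_dir=".", url=DATA_URL, zip_name=ZIP_NAME):
    """Download the zip only if it is missing, then unpack it without overwriting."""
    zip_path = os.path.join(dest_dir, zip_name)
    if not os.path.isfile(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    return extract_no_overwrite(zip_path, dest_dir)
```

Calling `ensure_dataset()` from the notebook's working directory would leave the two CSV files next to the zip, where the `read_csv` calls below expect them.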
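# The Sentiment140 CSVs have no header row, so column names are assigned explicitly.
# polarity: 0 = negative, 2 = neutral, 4 = positive
columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']
# Read as ISO-8859-1 (Latin-1) rather than pandas' default UTF-8.
dftrain = pd.read_csv('training.1600000.processed.noemoticon.csv',
                      header=None,
                      encoding='ISO-8859-1')
dftest = pd.read_csv('testdata.manual.2009.06.14.csv',
                     header=None,
                     encoding='ISO-8859-1')
dftrain.columns = columns
dftest.columns = columns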
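The loading step can be checked without the 1.6M-row file. This sketch uses three made-up rows in the same headerless, fully quoted layout, encoded as Latin-1 bytes so that the `encoding` argument actually matters, and passes `names=columns` to `read_csv` as an alternative to assigning `.columns` afterwards. The rows are hypothetical; only the column names and polarity codes come from the notebook and the Sentiment140 format.

```python
import io

import pandas as pd

columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']

# Hypothetical rows in the Sentiment140 layout (fully quoted, no header line),
# encoded as Latin-1 bytes.
raw = io.BytesIO((
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","this is awful"\n'
    '"2","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","just a tweet"\n'
    '"4","3","Mon Apr 06 22:19:53 PDT 2009","NO_QUERY","user_c","the café was great"\n'
).encode('ISO-8859-1'))

# names= labels the columns at read time; with header=None this is equivalent
# to assigning df.columns afterwards, as the notebook does.
df = pd.read_csv(raw, header=None, names=columns, encoding='ISO-8859-1')

# Keep only negative (0) and positive (4) tweets, as done for the test set.
binary = df[df.polarity != 2]
print(binary.polarity.tolist())  # [0, 4]
print(df.text.iloc[2])           # the café was great
```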
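# Bag-of-words counts feeding a logistic regression classifier.
# min_df=100 keeps only tokens that occur in at least 100 tweets;
# stop_words='english' drops scikit-learn's built-in English stop-word list.
sentiment_lr = Pipeline([
    ('count_vect', CountVectorizer(min_df=100,
                                   stop_words='english')),
    ('lr', LogisticRegression())])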
sentiment_lr.fit(dftrain.text, dftrain.polarity)
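# The training set contains only polarities 0 and 4, so the model never
# predicts 2; evaluate on the non-neutral test tweets only.
mask = dftest.polarity != 2
Xtest, ytest = dftest.text[mask], dftest.polarity[mask]
ypred = sentiment_lr.predict(Xtest)
print(classification_report(ytest, ypred))
print(ypred.shape)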
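The notebook imports `roc_auc_score` and `roc_curve` but does not use them in this step. A sketch of how they could be applied to a fitted pipeline, shown on a made-up, mirrored toy corpus so it runs without the dataset. `predict_proba(...)[:, 1]` is the probability of `classes_[1]`, which is polarity 4 for labels {0, 4}; converting the labels to 0/1 avoids having to pass `pos_label=4` to `roc_curve`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline

# Made-up toy data using Sentiment140's 0 (negative) / 4 (positive) labels.
texts = [
    "great day love it", "love this great song", "awesome great fun",
    "awful day hate it", "hate this awful song", "terrible awful mess",
]
labels = np.array([4, 4, 4, 0, 0, 0])

# Same pipeline shape as sentiment_lr, with min_df lowered for the tiny corpus.
model = Pipeline([
    ('count_vect', CountVectorizer(min_df=1, stop_words='english')),
    ('lr', LogisticRegression()),
])
model.fit(texts, labels)

# model.classes_ is [0, 4], so column 1 holds the probability of polarity 4.
scores = model.predict_proba(texts)[:, 1]

# Treat polarity 4 as the positive class explicitly.
y_true = (labels == 4).astype(int)
auc = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(f"ROC AUC: {auc:.3f}")
```

With the real data, `sentiment_lr.predict_proba(Xtest)[:, 1]` and `(ytest == 4).astype(int)` would take the place of `scores` and `y_true`. Scoring on the training texts, as here, only demonstrates the API and says nothing about generalization.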