@ettorerizza
Last active January 16, 2020 09:40
# Source: https://pythonprogramminglanguage.com/logistic-regression-spam-filter/
# Dataset: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Import from the public module; sklearn.linear_model.logistic is a private path removed in newer releases.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
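# Tab-separated UCI file: one "label<TAB>message" pair per line.
# Adjust the path to wherever you extracted SMSSpamCollection.
df = pd.read_csv('SMSSpamCollection', delimiter='\t', header=None, names=['label', 'message'])
# Hold out 25% of the messages (the default test_size) for testing; random_state makes the split reproducible.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(df['message'], df['label'], random_state=0)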
# Learn the TF-IDF vocabulary and weights from the training messages only, then fit the classifier.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
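# Classify two new messages with the already-fitted vectorizer (transform, not fit_transform).
new_messages = ['URGENT! Win a prize!', 'Hello, how are you?']
X_new = vectorizer.transform(new_messages)
predictions = classifier.predict(X_new)
print(predictions)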
"""
> ['spam', 'ham']
"""