@davidmezzetti
Created October 20, 2021 15:01
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
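
# NOTE: `ds` is not defined in this snippet. It is assumed to be a dataset
# object (e.g. a Hugging Face DatasetDict) with "train" and "validation"
# splits, each exposing "text" and "label" columns.

# Train the model: TF-IDF features fed into a logistic regression classifier
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('lr', LogisticRegression(max_iter=250))
])
pipeline.fit(ds["train"]["text"], ds["train"]["label"])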

# Determine accuracy on validation set
predictions = pipeline.predict(ds["validation"]["text"])
labels = ds["validation"]["label"]

# Count predictions that match the true label
correct = sum(int(prediction == label) for prediction, label in zip(predictions, labels))
print("Accuracy =", correct / len(labels))