@rammanokar
Created December 3, 2022 17:19
ticket classification using BERT
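import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertForSequenceClassification
# load the labeled helpdesk ticket dataset (expects "text" and "class" columns)
df = pd.read_csv("helpdesk_tickets.csv")
# map class names to integer ids, since the model expects integer labels
classes = sorted(df["class"].unique())
df["label"] = df["class"].map({c: i for i, c in enumerate(classes)})
# split the data into train (64%), validation (16%), and test (20%) sets
train, test = train_test_split(df, test_size=0.2, random_state=42)
train, val = train_test_split(train, test_size=0.2, random_state=42)
# load the tokenizer and a BERT model with one output per class
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=len(classes))
def batches(frame, size=16):  # yield (tokenized inputs, label tensor) pairs
    for i in range(0, len(frame), size):
        chunk = frame.iloc[i:i + size]
        enc = tokenizer(chunk["text"].tolist(), padding=True, truncation=True, return_tensors="pt")
        yield enc, torch.tensor(chunk["label"].tolist())
def accuracy(frame):  # fraction of tickets whose predicted class matches the label
    model.eval()
    with torch.no_grad():
        hits = sum((model(**enc).logits.argmax(dim=-1) == y).sum().item() for enc, y in batches(frame))
    return hits / len(frame)
# fine-tune the BERT model for 3 epochs, reporting validation accuracy after each
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
for epoch in range(3):
    model.train()
    for enc, y in batches(train.sample(frac=1)):  # reshuffle the training set each epoch
        optimizer.zero_grad()
        model(**enc, labels=y).loss.backward()
        optimizer.step()
    print("Epoch", epoch + 1, "validation accuracy:", accuracy(val))
# evaluate the model on the test set
print("Accuracy:", accuracy(test))
# make predictions on new helpdesk tickets (the model is still in eval mode here)
enc = tokenizer(["Can't access my account", "How do I change my password?"], padding=True, return_tensors="pt")
predictions = [classes[i] for i in model(**enc).logits.argmax(dim=-1).tolist()]
print("Predicted classes:", predictions)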