@wesslen
Last active July 11, 2023 22:34
Python script that processes txt transcript files and outputs .jsonl for Prodigy annotations
AGENT hello this is steve can i ask what's your name
CUSTOMER my name is harry
AGENT thanks harry. how can i help you
AGENT hello this is john. how may i assist you
CUSTOMER i have a problem with my order
AGENT i'm sorry to hear that. please provide me your order details
CUSTOMER my order number is 12345
AGENT thank you. let me check that for you
CUSTOMER alright, thank you
AGENT you're welcome. please wait a moment
AGENT i have checked your order. it will be delivered tomorrow
CUSTOMER that's great. thank you for your help
AGENT you're welcome. is there anything else i can assist you with?
CUSTOMER no, that's all. have a great day
AGENT thank you, you too. goodbye
AGENT Welcome to our support line. How can I assist you today?
CUSTOMER Hi, I'm having trouble accessing my account.
AGENT I'm sorry to hear that. Can you please provide me with your username?
CUSTOMER My username is johndoe123.
AGENT Thank you, John. Let me check your account.
AGENT It seems that your account is temporarily locked for security reasons. Have you tried resetting your password?
CUSTOMER No, I haven't. How can I reset my password?
AGENT I will guide you through the password reset process. Please bear with me.
AGENT First, visit our website and click on the "Forgot Password" link.
CUSTOMER Okay, I'm on the website. I found the "Forgot Password" link.
AGENT Great. Click on that link, and it will take you to the password reset page.
CUSTOMER Done. I'm on the password reset page now.
AGENT On the password reset page, enter your email address associated with your account and click "Submit."
CUSTOMER I entered my email and clicked "Submit."
AGENT Perfect. Now, check your email inbox for a password reset link.
CUSTOMER I received the email. It says there's a link to reset my password.
AGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.
CUSTOMER I clicked the link, and it opened a new page to enter a new password.
AGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click "Save" or "Reset Password."
CUSTOMER I entered my new password and clicked "Save."
AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.
CUSTOMER Thank you so much for your help! I appreciate it.
AGENT You're welcome, John. Is there anything else I can assist you with?
CUSTOMER No, that's all. Have a great day!
AGENT Thank you, John. Have a great day too. Goodbye!
CUSTOMER Goodbye!
AGENT Goodbye!
AGENT Welcome to our support line. How can I assist you today?
CUSTOMER Hi, I'm having trouble accessing my account.
AGENT I'm sorry to hear that. Can you please provide me with your username?
CUSTOMER I'm sorry, but I can't remember my username either.
AGENT Not a problem. Let me guide you through the process of resetting both your username and password.
AGENT To reset your username, please visit our website and click on the "Forgot Username" link.
CUSTOMER Okay, I'm on the website. I found the "Forgot Username" link.
AGENT Great. Click on that link, and it will take you to the username recovery page.
CUSTOMER Done. I'm on the username recovery page now.
AGENT On the username recovery page, enter your email address associated with your account and click "Submit."
CUSTOMER I entered my email and clicked "Submit."
AGENT Perfect. We've sent an email to your registered email address with instructions on how to recover your username.
CUSTOMER Alright, I received the email. It says there's a link to recover my username.
AGENT Click on that link in the email. It will redirect you to a page where you can recover your username.
CUSTOMER I clicked the link, and it opened a new page to recover my username.
AGENT Excellent. Follow the instructions on that page to recover your username.
CUSTOMER I successfully recovered my username. It's johnsmith123.
AGENT That's great news, John! Now that you have your username, let's proceed to reset your password.
AGENT To reset your password, please visit our website again and click on the "Forgot Password" link.
CUSTOMER Alright, I'm on the website again. I found the "Forgot Password" link.
AGENT Great. Click on that link, and it will take you to the password reset page.
CUSTOMER Done. I'm on the password reset page now.
AGENT On the password reset page, enter your username and click "Submit."
CUSTOMER I entered my username and clicked "Submit."
AGENT Perfect. Now, check your email inbox for a password reset link.
CUSTOMER I received the email. It says there's a link to reset my password.
AGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.
CUSTOMER I clicked the link, and it opened a new page to enter a new password.
AGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click "Save" or "Reset Password."
CUSTOMER I entered my new password and clicked "Save."
AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.
CUSTOMER Thank you so much for your help! I appreciate it.
AGENT You're welcome, John. Is there anything else I can assist you with?
CUSTOMER Actually, I have another question. What credit card is on file for my billing?
AGENT According to our records, we have your billing information saved with an American Express credit card.
CUSTOMER Oh, I realized that's not the correct card. Can you update it to my Visa card instead?
AGENT Certainly, John. I will update your billing information to your Visa card. Please provide me with the card details.
CUSTOMER The Visa card number is 1234 5678 9012 3456, and the expiration date is 05/25.
AGENT Thank you for providing the updated information, John. I have successfully updated your billing details to your Visa card.
CUSTOMER Thank you for your assistance. That's all I needed help with.
AGENT You're welcome, John. If you have any further questions, feel free to reach out. Have a great day!
CUSTOMER Thank you. You too! Goodbye!
AGENT Goodbye, John!
{"text": "AGENT hello this is steve can i ask what's your name\nCUSTOMER my name is harry\nAGENT thanks harry. how can i help you", "meta": {"call": "chat1.txt"}}
{"text": "AGENT Welcome to our support line. How can I assist you today?\nCUSTOMER Hi, I'm having trouble accessing my account.\nAGENT I'm sorry to hear that. Can you please provide me with your username?\nCUSTOMER My username is johndoe123.\nAGENT Thank you, John. Let me check your account.", "meta": {"call": "chat3.txt - part 1"}}
{"text": "AGENT It seems that your account is temporarily locked for security reasons. Have you tried resetting your password?\nCUSTOMER No, I haven't. How can I reset my password?\nAGENT I will guide you through the password reset process. Please bear with me.\nAGENT First, visit our website and click on the \"Forgot Password\" link.\nCUSTOMER Okay, I'm on the website. I found the \"Forgot Password\" link.", "meta": {"call": "chat3.txt - part 2"}}
{"text": "AGENT Great. Click on that link, and it will take you to the password reset page.\nCUSTOMER Done. I'm on the password reset page now.\nAGENT On the password reset page, enter your email address associated with your account and click \"Submit.\"\nCUSTOMER I entered my email and clicked \"Submit.\"\nAGENT Perfect. Now, check your email inbox for a password reset link.", "meta": {"call": "chat3.txt - part 3"}}
{"text": "CUSTOMER I received the email. It says there's a link to reset my password.\nAGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.\nCUSTOMER I clicked the link, and it opened a new page to enter a new password.\nAGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click \"Save\" or \"Reset Password.\"\nCUSTOMER I entered my new password and clicked \"Save.\"", "meta": {"call": "chat3.txt - part 4"}}
{"text": "AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.\nCUSTOMER Thank you so much for your help! I appreciate it.\nAGENT You're welcome, John. Is there anything else I can assist you with?\nCUSTOMER No, that's all. Have a great day!\nAGENT Thank you, John. Have a great day too. Goodbye!", "meta": {"call": "chat3.txt - part 5"}}
{"text": "CUSTOMER Goodbye!\nAGENT Goodbye!", "meta": {"call": "chat3.txt - part 6"}}
{"text": "AGENT hello this is john. how may i assist you\nCUSTOMER i have a problem with my order\nAGENT i'm sorry to hear that. please provide me your order details\nCUSTOMER my order number is 12345\nAGENT thank you. let me check that for you", "meta": {"call": "chat2.txt - part 1"}}
{"text": "CUSTOMER alright, thank you\nAGENT you're welcome. please wait a moment\nAGENT i have checked your order. it will be delivered tomorrow\nCUSTOMER that's great. thank you for your help\nAGENT you're welcome. is there anything else i can assist you with?", "meta": {"call": "chat2.txt - part 2"}}
{"text": "CUSTOMER no, that's all. have a great day\nAGENT thank you, you too. goodbye", "meta": {"call": "chat2.txt - part 3"}}
{"text": "AGENT Welcome to our support line. How can I assist you today?\nCUSTOMER Hi, I'm having trouble accessing my account.\nAGENT I'm sorry to hear that. Can you please provide me with your username?\nCUSTOMER I'm sorry, but I can't remember my username either.\nAGENT Not a problem. Let me guide you through the process of resetting both your username and password.", "meta": {"call": "chat4.txt - part 1"}}
{"text": "AGENT To reset your username, please visit our website and click on the \"Forgot Username\" link.\nCUSTOMER Okay, I'm on the website. I found the \"Forgot Username\" link.\nAGENT Great. Click on that link, and it will take you to the username recovery page.\nCUSTOMER Done. I'm on the username recovery page now.\nAGENT On the username recovery page, enter your email address associated with your account and click \"Submit.\"", "meta": {"call": "chat4.txt - part 2"}}
{"text": "CUSTOMER I entered my email and clicked \"Submit.\"\nAGENT Perfect. We've sent an email to your registered email address with instructions on how to recover your username.\nCUSTOMER Alright, I received the email. It says there's a link to recover my username.\nAGENT Click on that link in the email. It will redirect you to a page where you can recover your username.\nCUSTOMER I clicked the link, and it opened a new page to recover my username.", "meta": {"call": "chat4.txt - part 3"}}
{"text": "AGENT Excellent. Follow the instructions on that page to recover your username.\nCUSTOMER I successfully recovered my username. It's johnsmith123.\nAGENT That's great news, John! Now that you have your username, let's proceed to reset your password.\nAGENT To reset your password, please visit our website again and click on the \"Forgot Password\" link.\nCUSTOMER Alright, I'm on the website again. I found the \"Forgot Password\" link.", "meta": {"call": "chat4.txt - part 4"}}
{"text": "AGENT Great. Click on that link, and it will take you to the password reset page.\nCUSTOMER Done. I'm on the password reset page now.\nAGENT On the password reset page, enter your username and click \"Submit.\"\nCUSTOMER I entered my username and clicked \"Submit.\"\nAGENT Perfect. Now, check your email inbox for a password reset link.", "meta": {"call": "chat4.txt - part 5"}}
{"text": "CUSTOMER I received the email. It says there's a link to reset my password.\nAGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.\nCUSTOMER I clicked the link, and it opened a new page to enter a new password.\nAGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click \"Save\" or \"Reset Password.\"\nCUSTOMER I entered my new password and clicked \"Save.\"", "meta": {"call": "chat4.txt - part 6"}}
{"text": "AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.\nCUSTOMER Thank you so much for your help! I appreciate it.\nAGENT You're welcome, John. Is there anything else I can assist you with?\nCUSTOMER Actually, I have another question. What credit card is on file for my billing?\nAGENT According to our records, we have your billing information saved with an American Express credit card.", "meta": {"call": "chat4.txt - part 7"}}
{"text": "CUSTOMER Oh, I realized that's not the correct card. Can you update it to my Visa card instead?\nAGENT Certainly, John. I will update your billing information to your Visa card. Please provide me with the card details.\nCUSTOMER The Visa card number is 1234 5678 9012 3456, and the expiration date is 05/25.\nAGENT Thank you for providing the updated information, John. I have successfully updated your billing details to your Visa card.\nCUSTOMER Thank you for your assistance. That's all I needed help with.", "meta": {"call": "chat4.txt - part 8"}}
{"text": "AGENT You're welcome, John. If you have any further questions, feel free to reach out. Have a great day!\nCUSTOMER Thank you. You too! Goodbye!\nAGENT Goodbye, John!", "meta": {"call": "chat4.txt - part 9"}}
import json
import os

import typer

app = typer.Typer()


@app.callback()
def main():
    """Convert .txt transcripts to JSONL for Prodigy."""
    # An explicit callback keeps `process-folder` as a named subcommand;
    # with a single command and no callback, Typer drops the command name.


def process_transcripts(directory_path: str):
    output = []

    # Note: os.listdir returns files in arbitrary order
    for file_name in os.listdir(directory_path):
        if not file_name.endswith(".txt"):
            continue

        # Read the non-empty lines (utterances) of the transcript
        with open(os.path.join(directory_path, file_name), "r") as file:
            transcript = [line.strip() for line in file if line.strip()]

        if len(transcript) <= 5:
            # Short transcript: keep it as a single entry
            output.append({
                "text": "\n".join(transcript),
                "meta": {"call": file_name},
            })
        else:
            # Split into multiple entries if more than 5 utterances
            chunks = [transcript[i:i + 5] for i in range(0, len(transcript), 5)]
            for part, chunk in enumerate(chunks, start=1):
                output.append({
                    "text": "\n".join(chunk),
                    "meta": {"call": f"{file_name} - part {part}"},
                })

    # Write one JSON object per line (JSONL)
    with open("output.jsonl", "w") as output_file:
        for entry in output:
            output_file.write(json.dumps(entry))
            output_file.write("\n")


@app.command()
def process_folder(folder_path: str):
    process_transcripts(folder_path)


if __name__ == "__main__":
    app()
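The chunking step of the script can also be pulled out into a small pure function, which makes it easier to test without touching the file system. This is only a sketch: the name `chunk_transcript` and the `size` parameter are hypothetical and not part of the original script, and it assumes that a transcript with exactly `size` utterances should stay a single, unlabeled entry.

```python
from typing import Dict, List


def chunk_transcript(lines: List[str], name: str, size: int = 5) -> List[Dict]:
    """Split utterances into entries of at most `size` lines each.

    `chunk_transcript` and `size` are illustrative names, not part of
    the original script.
    """
    if len(lines) <= size:
        # Short transcripts stay whole and keep the bare file name.
        return [{"text": "\n".join(lines), "meta": {"call": name}}]
    # Longer transcripts are cut into consecutive chunks labeled "part N".
    return [
        {
            "text": "\n".join(lines[start:start + size]),
            "meta": {"call": f"{name} - part {part}"},
        }
        for part, start in enumerate(range(0, len(lines), size), start=1)
    ]
```

Calling `chunk_transcript(lines, "chat2.txt")` on the 12 utterances of chat2.txt would yield three entries labeled part 1 through part 3, matching the sample output above.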
wesslen commented Jul 11, 2023

To run:
python transcript_processor.py process-folder /path/to/transcripts/folder

Example:

  • Assume .txt files are in a folder named data
  • Run: python transcript_processor.py process-folder data
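After a run, it can be worth checking that output.jsonl parses cleanly before loading it into Prodigy. The sketch below reads the file back line by line and confirms that each entry has a "text" field, which Prodigy's text-based recipes expect. The helper name `load_jsonl` is hypothetical, and the check is a minimal sanity test rather than a full validation of Prodigy's task format.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_jsonl(path: str) -> List[Dict]:
    """Read a JSONL file and return its entries (hypothetical helper)."""
    entries = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)  # raises ValueError on malformed JSON
            if "text" not in entry:
                raise ValueError(f"line {number} has no 'text' field")
            entries.append(entry)
    return entries
```

For example, `len(load_jsonl("output.jsonl"))` gives the number of annotation tasks the script produced.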
