Last active
July 11, 2023 22:34
Python script that processes txt transcript files and outputs .jsonl for Prodigy annotations
chat1.txt
AGENT hello this is steve can i ask what's your name
CUSTOMER my name is harry
AGENT thanks harry. how can i help you
chat2.txt
AGENT hello this is john. how may i assist you
CUSTOMER i have a problem with my order
AGENT i'm sorry to hear that. please provide me your order details
CUSTOMER my order number is 12345
AGENT thank you. let me check that for you
CUSTOMER alright, thank you
AGENT you're welcome. please wait a moment
AGENT i have checked your order. it will be delivered tomorrow
CUSTOMER that's great. thank you for your help
AGENT you're welcome. is there anything else i can assist you with?
CUSTOMER no, that's all. have a great day
AGENT thank you, you too. goodbye
chat3.txt
AGENT Welcome to our support line. How can I assist you today?
CUSTOMER Hi, I'm having trouble accessing my account.
AGENT I'm sorry to hear that. Can you please provide me with your username?
CUSTOMER My username is johndoe123.
AGENT Thank you, John. Let me check your account.
AGENT It seems that your account is temporarily locked for security reasons. Have you tried resetting your password?
CUSTOMER No, I haven't. How can I reset my password?
AGENT I will guide you through the password reset process. Please bear with me.
AGENT First, visit our website and click on the "Forgot Password" link.
CUSTOMER Okay, I'm on the website. I found the "Forgot Password" link.
AGENT Great. Click on that link, and it will take you to the password reset page.
CUSTOMER Done. I'm on the password reset page now.
AGENT On the password reset page, enter your email address associated with your account and click "Submit."
CUSTOMER I entered my email and clicked "Submit."
AGENT Perfect. Now, check your email inbox for a password reset link.
CUSTOMER I received the email. It says there's a link to reset my password.
AGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.
CUSTOMER I clicked the link, and it opened a new page to enter a new password.
AGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click "Save" or "Reset Password."
CUSTOMER I entered my new password and clicked "Save."
AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.
CUSTOMER Thank you so much for your help! I appreciate it.
AGENT You're welcome, John. Is there anything else I can assist you with?
CUSTOMER No, that's all. Have a great day!
AGENT Thank you, John. Have a great day too. Goodbye!
CUSTOMER Goodbye!
AGENT Goodbye!
chat4.txt
AGENT Welcome to our support line. How can I assist you today?
CUSTOMER Hi, I'm having trouble accessing my account.
AGENT I'm sorry to hear that. Can you please provide me with your username?
CUSTOMER I'm sorry, but I can't remember my username either.
AGENT Not a problem. Let me guide you through the process of resetting both your username and password.
AGENT To reset your username, please visit our website and click on the "Forgot Username" link.
CUSTOMER Okay, I'm on the website. I found the "Forgot Username" link.
AGENT Great. Click on that link, and it will take you to the username recovery page.
CUSTOMER Done. I'm on the username recovery page now.
AGENT On the username recovery page, enter your email address associated with your account and click "Submit."
CUSTOMER I entered my email and clicked "Submit."
AGENT Perfect. We've sent an email to your registered email address with instructions on how to recover your username.
CUSTOMER Alright, I received the email. It says there's a link to recover my username.
AGENT Click on that link in the email. It will redirect you to a page where you can recover your username.
CUSTOMER I clicked the link, and it opened a new page to recover my username.
AGENT Excellent. Follow the instructions on that page to recover your username.
CUSTOMER I successfully recovered my username. It's johnsmith123.
AGENT That's great news, John! Now that you have your username, let's proceed to reset your password.
AGENT To reset your password, please visit our website again and click on the "Forgot Password" link.
CUSTOMER Alright, I'm on the website again. I found the "Forgot Password" link.
AGENT Great. Click on that link, and it will take you to the password reset page.
CUSTOMER Done. I'm on the password reset page now.
AGENT On the password reset page, enter your username and click "Submit."
CUSTOMER I entered my username and clicked "Submit."
AGENT Perfect. Now, check your email inbox for a password reset link.
CUSTOMER I received the email. It says there's a link to reset my password.
AGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.
CUSTOMER I clicked the link, and it opened a new page to enter a new password.
AGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click "Save" or "Reset Password."
CUSTOMER I entered my new password and clicked "Save."
AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.
CUSTOMER Thank you so much for your help! I appreciate it.
AGENT You're welcome, John. Is there anything else I can assist you with?
CUSTOMER Actually, I have another question. What credit card is on file for my billing?
AGENT According to our records, we have your billing information saved with an American Express credit card.
CUSTOMER Oh, I realized that's not the correct card. Can you update it to my Visa card instead?
AGENT Certainly, John. I will update your billing information to your Visa card. Please provide me with the card details.
CUSTOMER The Visa card number is 1234 5678 9012 3456, and the expiration date is 05/25.
AGENT Thank you for providing the updated information, John. I have successfully updated your billing details to your Visa card.
CUSTOMER Thank you for your assistance. That's all I needed help with.
AGENT You're welcome, John. If you have any further questions, feel free to reach out. Have a great day!
CUSTOMER Thank you. You too! Goodbye!
AGENT Goodbye, John!
output.jsonl
{"text": "AGENT hello this is steve can i ask what's your name\nCUSTOMER my name is harry\nAGENT thanks harry. how can i help you", "meta": {"call": "chat1.txt"}}
{"text": "AGENT Welcome to our support line. How can I assist you today?\nCUSTOMER Hi, I'm having trouble accessing my account.\nAGENT I'm sorry to hear that. Can you please provide me with your username?\nCUSTOMER My username is johndoe123.\nAGENT Thank you, John. Let me check your account.", "meta": {"call": "chat3.txt - part 1"}}
{"text": "AGENT It seems that your account is temporarily locked for security reasons. Have you tried resetting your password?\nCUSTOMER No, I haven't. How can I reset my password?\nAGENT I will guide you through the password reset process. Please bear with me.\nAGENT First, visit our website and click on the \"Forgot Password\" link.\nCUSTOMER Okay, I'm on the website. I found the \"Forgot Password\" link.", "meta": {"call": "chat3.txt - part 2"}}
{"text": "AGENT Great. Click on that link, and it will take you to the password reset page.\nCUSTOMER Done. I'm on the password reset page now.\nAGENT On the password reset page, enter your email address associated with your account and click \"Submit.\"\nCUSTOMER I entered my email and clicked \"Submit.\"\nAGENT Perfect. Now, check your email inbox for a password reset link.", "meta": {"call": "chat3.txt - part 3"}}
{"text": "CUSTOMER I received the email. It says there's a link to reset my password.\nAGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.\nCUSTOMER I clicked the link, and it opened a new page to enter a new password.\nAGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click \"Save\" or \"Reset Password.\"\nCUSTOMER I entered my new password and clicked \"Save.\"", "meta": {"call": "chat3.txt - part 4"}}
{"text": "AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.\nCUSTOMER Thank you so much for your help! I appreciate it.\nAGENT You're welcome, John. Is there anything else I can assist you with?\nCUSTOMER No, that's all. Have a great day!\nAGENT Thank you, John. Have a great day too. Goodbye!", "meta": {"call": "chat3.txt - part 5"}}
{"text": "CUSTOMER Goodbye!\nAGENT Goodbye!", "meta": {"call": "chat3.txt - part 6"}}
{"text": "AGENT hello this is john. how may i assist you\nCUSTOMER i have a problem with my order\nAGENT i'm sorry to hear that. please provide me your order details\nCUSTOMER my order number is 12345\nAGENT thank you. let me check that for you", "meta": {"call": "chat2.txt - part 1"}}
{"text": "CUSTOMER alright, thank you\nAGENT you're welcome. please wait a moment\nAGENT i have checked your order. it will be delivered tomorrow\nCUSTOMER that's great. thank you for your help\nAGENT you're welcome. is there anything else i can assist you with?", "meta": {"call": "chat2.txt - part 2"}}
{"text": "CUSTOMER no, that's all. have a great day\nAGENT thank you, you too. goodbye", "meta": {"call": "chat2.txt - part 3"}}
{"text": "AGENT Welcome to our support line. How can I assist you today?\nCUSTOMER Hi, I'm having trouble accessing my account.\nAGENT I'm sorry to hear that. Can you please provide me with your username?\nCUSTOMER I'm sorry, but I can't remember my username either.\nAGENT Not a problem. Let me guide you through the process of resetting both your username and password.", "meta": {"call": "chat4.txt - part 1"}}
{"text": "AGENT To reset your username, please visit our website and click on the \"Forgot Username\" link.\nCUSTOMER Okay, I'm on the website. I found the \"Forgot Username\" link.\nAGENT Great. Click on that link, and it will take you to the username recovery page.\nCUSTOMER Done. I'm on the username recovery page now.\nAGENT On the username recovery page, enter your email address associated with your account and click \"Submit.\"", "meta": {"call": "chat4.txt - part 2"}}
{"text": "CUSTOMER I entered my email and clicked \"Submit.\"\nAGENT Perfect. We've sent an email to your registered email address with instructions on how to recover your username.\nCUSTOMER Alright, I received the email. It says there's a link to recover my username.\nAGENT Click on that link in the email. It will redirect you to a page where you can recover your username.\nCUSTOMER I clicked the link, and it opened a new page to recover my username.", "meta": {"call": "chat4.txt - part 3"}}
{"text": "AGENT Excellent. Follow the instructions on that page to recover your username.\nCUSTOMER I successfully recovered my username. It's johnsmith123.\nAGENT That's great news, John! Now that you have your username, let's proceed to reset your password.\nAGENT To reset your password, please visit our website again and click on the \"Forgot Password\" link.\nCUSTOMER Alright, I'm on the website again. I found the \"Forgot Password\" link.", "meta": {"call": "chat4.txt - part 4"}}
{"text": "AGENT Great. Click on that link, and it will take you to the password reset page.\nCUSTOMER Done. I'm on the password reset page now.\nAGENT On the password reset page, enter your username and click \"Submit.\"\nCUSTOMER I entered my username and clicked \"Submit.\"\nAGENT Perfect. Now, check your email inbox for a password reset link.", "meta": {"call": "chat4.txt - part 5"}}
{"text": "CUSTOMER I received the email. It says there's a link to reset my password.\nAGENT Click on that link in the email. It will redirect you to a page where you can enter your new password.\nCUSTOMER I clicked the link, and it opened a new page to enter a new password.\nAGENT Excellent. Choose a strong, unique password and enter it in the provided fields. Then click \"Save\" or \"Reset Password.\"\nCUSTOMER I entered my new password and clicked \"Save.\"", "meta": {"call": "chat4.txt - part 6"}}
{"text": "AGENT Wonderful. Your password has been successfully reset. You should now be able to access your account.\nCUSTOMER Thank you so much for your help! I appreciate it.\nAGENT You're welcome, John. Is there anything else I can assist you with?\nCUSTOMER Actually, I have another question. What credit card is on file for my billing?\nAGENT According to our records, we have your billing information saved with an American Express credit card.", "meta": {"call": "chat4.txt - part 7"}}
{"text": "CUSTOMER Oh, I realized that's not the correct card. Can you update it to my Visa card instead?\nAGENT Certainly, John. I will update your billing information to your Visa card. Please provide me with the card details.\nCUSTOMER The Visa card number is 1234 5678 9012 3456, and the expiration date is 05/25.\nAGENT Thank you for providing the updated information, John. I have successfully updated your billing details to your Visa card.\nCUSTOMER Thank you for your assistance. That's all I needed help with.", "meta": {"call": "chat4.txt - part 8"}}
{"text": "AGENT You're welcome, John. If you have any further questions, feel free to reach out. Have a great day!\nCUSTOMER Thank you. You too! Goodbye!\nAGENT Goodbye, John!", "meta": {"call": "chat4.txt - part 9"}}
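Each line of the output is one JSON object with a `text` field (utterances joined by `\n`) and a `meta.call` label. A minimal sketch of reading such records back for a quick sanity check; the helper name `read_jsonl` and the shortened sample records are illustrative assumptions, not part of the gist:

```python
import json

# Two sample records in the same shape as the output above (shortened for illustration)
sample_jsonl = "\n".join([
    json.dumps({"text": "AGENT hello\nCUSTOMER hi", "meta": {"call": "chat1.txt"}}),
    json.dumps({"text": "CUSTOMER Goodbye!\nAGENT Goodbye!", "meta": {"call": "chat3.txt - part 6"}}),
])


def read_jsonl(raw: str) -> list:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


records = read_jsonl(sample_jsonl)
for record in records:
    # Each utterance is one line inside the "text" field
    utterances = record["text"].split("\n")
    print(record["meta"]["call"], len(utterances))
```

Because `json.dumps` escapes the newlines inside `text`, every record stays on a single physical line, which is what makes line-by-line parsing safe here.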
transcript_processor.py
import json
import os

import typer

app = typer.Typer()


# With only one command, Typer would run it directly and reject the
# "process-folder" name; a callback keeps it as an explicit subcommand.
@app.callback()
def main():
    """Process txt transcript files and output .jsonl for Prodigy annotations."""


def process_transcripts(directory_path: str):
    output = []
    for file_name in os.listdir(directory_path):
        if file_name.endswith(".txt"):
            # Collect the non-empty lines (utterances) of this transcript
            transcript = []
            with open(os.path.join(directory_path, file_name), "r") as file:
                for line in file:
                    line = line.strip()
                    if line:
                        transcript.append(line)
            if len(transcript) < 5:
                # Short transcripts become a single entry
                json_entry = {
                    "text": "\n".join(transcript),
                    "meta": {
                        "call": file_name
                    }
                }
                output.append(json_entry)
            else:
                # Split into numbered parts of up to 5 utterances each
                utterances = len(transcript)
                chunks = [transcript[i:i+5] for i in range(0, utterances, 5)]
                file_counter = 1
                for chunk in chunks:
                    json_entry = {
                        "text": "\n".join(chunk),
                        "meta": {
                            "call": f"{file_name} - part {file_counter}"
                        }
                    }
                    output.append(json_entry)
                    file_counter += 1
    # Write one JSON object per line to output.jsonl in the working directory
    with open("output.jsonl", "w") as output_file:
        for entry in output:
            output_file.write(json.dumps(entry))
            output_file.write("\n")


@app.command()
def process_folder(folder_path: str):
    process_transcripts(folder_path)


if __name__ == "__main__":
    app()
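The splitting rule in the script can be isolated as a pure function, which makes it easier to check without touching the file system. This is a sketch under that assumption; `chunk_transcript` and its `size` parameter are hypothetical names introduced here, and the logic mirrors the script's `< 5` test:

```python
from typing import Dict, List


def chunk_transcript(lines: List[str], file_name: str, size: int = 5) -> List[Dict]:
    """Fewer than `size` lines give one entry; otherwise numbered parts of up to `size` lines."""
    if len(lines) < size:
        return [{"text": "\n".join(lines), "meta": {"call": file_name}}]
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    return [
        {"text": "\n".join(chunk), "meta": {"call": f"{file_name} - part {part}"}}
        for part, chunk in enumerate(chunks, start=1)
    ]


# A 3-line transcript stays whole; a 12-line one is split into parts of 5, 5 and 2 lines
short = chunk_transcript(["AGENT hi", "CUSTOMER hello", "AGENT bye"], "chat1.txt")
long = chunk_transcript([f"AGENT line {n}" for n in range(12)], "chat2.txt")
print([entry["meta"]["call"] for entry in short + long])
```

Note that a transcript with exactly five lines takes the second branch, so it yields a single entry labelled `part 1`.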
To run:
    python transcript_processor.py process-folder /path/to/transcripts/folder
Example: if the .txt files are in a folder named data:
    python transcript_processor.py process-folder data
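The sample output lists chat3.txt before chat2.txt because `os.listdir` returns names in arbitrary order. If a stable order matters for annotation, one option is to sort the file names before processing. The sketch below is an assumption about how that could look, not the gist's behavior; `list_transcripts` is a hypothetical helper:

```python
import os
import tempfile


def list_transcripts(directory_path: str) -> list:
    """Return the .txt file names in directory_path in sorted order."""
    return sorted(name for name in os.listdir(directory_path) if name.endswith(".txt"))


# Build a throwaway folder so the example runs anywhere
with tempfile.TemporaryDirectory() as folder:
    for name in ["chat3.txt", "chat1.txt", "notes.md", "chat2.txt"]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("AGENT hello\n")
    names = list_transcripts(folder)
    print(names)
```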