Last active: January 11, 2024 13:21
Gist by thomasantony (c2d866d1cb3fec3c532b13ce695c9438)
Convert saved HTML transcripts from ChatGPT to Markdown
# Save the transcripts using the "Save Page WE" Chrome extension.
# This script was generated by ChatGPT.
import sys

from bs4 import BeautifulSoup

# Check that an HTML file was provided as a command-line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.", file=sys.stderr)
    sys.exit(1)

# Read the HTML file (assumed to be UTF-8 encoded)
html_file = sys.argv[1]
with open(html_file, 'r', encoding='utf-8') as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all elements with a class that starts with 'ConversationItem__ConversationItemWrapper-'.
# The rest of the class name is auto-generated, so only the prefix is matched. This prefix
# comes from the ChatGPT page markup the script was written for and may not match newer pages.
conversation_elements = soup.find_all(
    class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-')
)

# Output the conversation as Markdown quotes.
# Messages are assumed to alternate between user and assistant, starting with the user.
for i, element in enumerate(conversation_elements):
    speaker = "User" if i % 2 == 0 else "Assistant"
    print(speaker)
    for line in element.get_text().split('\n'):
        print(f"> {line}")
    print()
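The output step does not depend on BeautifulSoup, so it can be pulled out into a pure function that is easy to test without a saved page. This is a minimal sketch with a hypothetical name, `to_markdown_quotes`; it keeps the script's assumption that messages alternate between user and assistant and should produce the same text the output loop prints.

```python
from typing import List


def to_markdown_quotes(messages: List[str]) -> str:
    """Render messages as Markdown block quotes, labelling alternate speakers.

    Hypothetical helper: it reproduces the script's output loop but returns
    a string instead of printing, so it can be tested directly.
    """
    out: List[str] = []
    for i, text in enumerate(messages):
        # Assumes messages alternate, starting with the user (as the script does)
        out.append("User" if i % 2 == 0 else "Assistant")
        # Prefix every line of the message with "> " to form a block quote
        out.extend(f"> {line}" for line in text.split("\n"))
        # Blank line between messages, matching the script's bare print()
        out.append("")
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    print(to_markdown_quotes(["Hi\nthere", "Hello!"]), end="")
```

In the script, the output loop could then become `print(to_markdown_quotes([el.get_text() for el in conversation_elements]), end="")`.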
Parse the HTML using BeautifulSoup
https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/
To parse HTML with BeautifulSoup, pass the document to the BeautifulSoup constructor. The syntax is:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_string, 'html.parser')
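where:
html_string is the HTML to be parsed (a string or an open file object)
'html.parser' selects Python's built-in HTML parser; 'lxml' or 'html5lib' can be used instead if those packages are installed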
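As a minimal sketch, here is the constructor applied to a small inline string instead of a saved file, combined with the class-prefix filter used by the transcript script. The markup and the `abc12` class suffix are made up for illustration; real saved ChatGPT pages will look different.

```python
from bs4 import BeautifulSoup

# Tiny inline stand-in for a saved transcript (illustrative markup, not real ChatGPT HTML)
html_string = """
<div class="ConversationItem__ConversationItemWrapper-abc12 dark">Hello?</div>
<div class="ConversationItem__ConversationItemWrapper-abc12">Hi! How can I help?</div>
<div class="footer">Not a message</div>
"""

# 'html.parser' is Python's built-in parser, so no extra package is required
soup = BeautifulSoup(html_string, "html.parser")

PREFIX = "ConversationItem__ConversationItemWrapper-"

# A function passed as class_ is checked against the tag's class values;
# it receives None for tags without a class, hence the `c and` guard.
items = soup.find_all(class_=lambda c: c and c.startswith(PREFIX))
texts = [item.get_text(strip=True) for item in items]
print(texts)  # ['Hello?', 'Hi! How can I help?']
```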
Once you have created a BeautifulSoup object, you can use it to access the different elements of the HTML document. For example, to get the title of the document, you can use:
title = soup.title.string
To get all of the links in the document, you can use:
links = soup.find_all('a')
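A short sketch of both lookups on an inline document (the URLs and link text are illustrative). It guards against a missing `<title>`, since `soup.title` is `None` when the page has none, and reads attributes with `.get('href')` because not every `<a>` tag carries an `href`.

```python
from bs4 import BeautifulSoup

html_string = (
    "<html><head><title>Links demo</title></head><body>"
    '<a href="https://example.com/a">First</a>'
    '<a href="/b">Second</a>'
    "<a>No href</a>"
    "</body></html>"
)
soup = BeautifulSoup(html_string, "html.parser")

# soup.title is the first <title> tag (or None); .string is its text content
title = soup.title.string if soup.title else None

# find_all('a') returns every <a> tag; .get('href') returns None if the attribute is missing
links = soup.find_all("a")
hrefs = [a.get("href") for a in links]

# href=True restricts the search to <a> tags that actually have an href attribute
hrefs_only = [a["href"] for a in soup.find_all("a", href=True)]

print(title)       # Links demo
print(hrefs)       # ['https://example.com/a', '/b', None]
print(hrefs_only)  # ['https://example.com/a', '/b']
```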
For more information, please see the BeautifulSoup documentation.