Last active
January 11, 2024 13:21
Convert saved HTML transcripts from ChatGPT to Markdown
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, 'r') as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all the elements with the 'ConversationItem__ConversationItemWrapper-' class
conversation_elements = soup.find_all(class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-'))

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split('\n')
    if i % 2 == 0:
        speaker = "User"
    else:
        speaker = "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
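The output step of the script can be pulled into a small helper, which makes the quote format easy to check without a saved HTML file. This is a minimal sketch: `format_message` is a hypothetical name that does not appear in the original script, and it keeps the script's assumption that messages strictly alternate, starting with the user.

```python
def format_message(index: int, text: str) -> str:
    """Render one message as a speaker label followed by a Markdown blockquote.

    Hypothetical helper, not part of the original script. Even indexes are
    treated as the user and odd indexes as the assistant, as in the script.
    """
    speaker = "User" if index % 2 == 0 else "Assistant"
    # Prefix every line so multi-line messages stay inside one blockquote.
    quoted = [f"> {line}" for line in text.split("\n")]
    return "\n".join([speaker, *quoted]) + "\n"


# print() adds the blank line that separates messages in the script's output.
print(format_message(0, "Hello\nHow are you?"))
print(format_message(1, "Fine, thanks."))
```

If the saved page ever contains a system message or a regenerated reply, the even/odd rule will mislabel speakers, so the alternation assumption is worth checking against the actual transcript.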
Parse the HTML using BeautifulSoup
https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/
To parse HTML with BeautifulSoup, call the BeautifulSoup() constructor. The syntax is:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_string, 'html.parser')
where:
html_string is the HTML string to be parsed
'html.parser' is the parser to use
Once you have created a BeautifulSoup object, you can use it to access the different elements of the HTML document. For example, to get the title of the document, you can use:
title = soup.title.string
To get all of the links in the document, you can use:
links = soup.find_all('a')
For more information, please see the BeautifulSoup documentation.
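The calls above can be combined into one runnable sketch. The sample HTML here is made up for illustration; the guard on `soup.title` and the use of `.get("href")` are additions to handle a missing title or attribute.

```python
from bs4 import BeautifulSoup

# Made-up sample document, used only for illustration.
html_string = """
<html>
  <head><title>Sample transcript</title></head>
  <body>
    <a href="https://example.com/one">First</a>
    <a href="https://example.com/two">Second</a>
    <a>No href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_string, "html.parser")

# soup.title is None when the document has no <title>, so guard the access.
title = soup.title.string if soup.title else None

# find_all returns Tag objects; .get("href") gives None if the attribute is missing.
hrefs = [a.get("href") for a in soup.find_all("a")]

print(title)
print(hrefs)
```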
Never mind my original message below. The extension "Chat GPT Prompt Genius" does a good job of this by adding a "Share & Export" button to the sidebar of conversations in ChatGPT. It's available for Chrome and Firefox.
Original message
I found the following class string to be the common element across messages. Matching on it extracts the conversation, but further processing (for code blocks and the like) would be needed to fully Markdown-ify the output.
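As a rough idea of what that further processing could look like for code blocks, here is a minimal sketch. It assumes code blocks are stored as `<pre>` elements in the saved page, which I have not verified against a real transcript; `message_to_markdown` and `FENCE` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Three backticks, built this way so the example contains no literal fence.
FENCE = "`" * 3


def message_to_markdown(message_html: str) -> str:
    """Return the message text with each <pre> block wrapped in a code fence.

    Assumption: code blocks appear as <pre> elements in the saved page.
    """
    soup = BeautifulSoup(message_html, "html.parser")
    # find_all returns a list, so replacing nodes while looping is safe.
    for pre in soup.find_all("pre"):
        code = pre.get_text().strip("\n")
        pre.replace_with(f"\n{FENCE}\n{code}\n{FENCE}\n")
    return soup.get_text().strip()


sample = '<div><p>Try this:</p><pre><code>print("hi")\n</code></pre><p>Done.</p></div>'
print(message_to_markdown(sample))
```

Any extra markup inside the `<pre>` element (a language label or a copy button, for example) would end up inside the fence with this approach and would need to be removed first.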
Here's the updated implementation that worked at the time of this comment.
```Python
# Source: https://gist.github.com/thomasantony/c2d866d1cb3fec3c532b13ce695c9438
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, "r") as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Find all the elements corresponding to a message in the conversation
conversation_elements = soup.find_all(
    class_=lambda c: c
    and c.startswith("min-h-[20px] flex flex-col items-start gap-4 whitespace-pre-wrap")
)

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split("\n")
    speaker = "User" if i % 2 == 0 else "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
```