```Python
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, 'r') as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all the elements with the 'ConversationItem__ConversationItemWrapper-' class
conversation_elements = soup.find_all(class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-'))

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split('\n')
    if i % 2 == 0:
        speaker = "User"
    else:
        speaker = "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
```
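To see how the class-prefix matching works in isolation, here's a minimal standalone snippet against a synthetic HTML string (the class-name suffixes like `abc123` are made up for illustration; the real styled-components hashes differed):

```Python
from bs4 import BeautifulSoup

# Synthetic HTML mimicking the old ChatGPT DOM (hash suffixes are illustrative)
html = """
<div class="ConversationItem__ConversationItemWrapper-abc123">Hello!</div>
<div class="ConversationItem__ConversationItemWrapper-def456">Hi, how can I help?</div>
<div class="Sidebar-xyz">unrelated content</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The lambda matches any element whose class starts with the wrapper prefix;
# the `c and` guard skips elements that have no class attribute at all
elements = soup.find_all(
    class_=lambda c: c and c.startswith("ConversationItem__ConversationItemWrapper-")
)
print([el.get_text() for el in elements])  # → ['Hello!', 'Hi, how can I help?']
```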
This was useful back when OpenAI didn't let us save the chat history, so I used to save the page as HTML and then run it through this script to convert it. Now it's much easier, and there are many Chrome extensions that do a better job.
Nevermind my original message below, the extension "Chat GPT Prompt Genius" does a good job of this by adding a "Share & Export" button to the sidebar of conversations in ChatGPT. It's available in Chrome and Firefox.
<details>
<summary>Original message</summary>
Here's a hacky attempt to update this for the latest HTML output by running "Save Page WE" on ChatGPT conversations. The `'ConversationItem__ConversationItemWrapper-'` class no longer shows up in the HTML output, so this snippet no longer works.
```Python
# Find all the elements with the 'ConversationItem__ConversationItemWrapper-' class
conversation_elements = soup.find_all(class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-'))
```

I found the common element to be the following, which extracts conversations, but further processing would be necessary to Markdown-ify them, like code blocks and such.

```Python
# Find all the elements corresponding to a message in the conversation
conversation_elements = soup.find_all(
    class_=lambda c: c
    and c.startswith("min-h-[20px] flex flex-col items-start gap-4 whitespace-pre-wrap")
)
```
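As a sketch of the "further processing" needed to Markdown-ify the messages, code blocks inside a message could be converted to fenced Markdown before quoting. This is a rough illustration only, not tested against the live ChatGPT DOM: the assumption that code blocks sit inside `<pre>` tags may not match the actual markup.

```Python
from bs4 import BeautifulSoup

def message_to_markdown(element):
    """Convert a message element to text, fencing any <pre> blocks.

    Assumes code blocks are wrapped in <pre> tags, which is an
    assumption about the DOM, not a verified fact.
    """
    fence = "`" * 3
    # Replace each <pre> block in the parse tree with a fenced equivalent
    for pre in element.find_all("pre"):
        pre.replace_with("\n" + fence + "\n" + pre.get_text() + "\n" + fence + "\n")
    return element.get_text()

# Tiny usage example with hypothetical message markup
html = '<div>Use this:<pre>print("hi")</pre>Done.</div>'
soup = BeautifulSoup(html, "html.parser")
markdown = message_to_markdown(soup.div)
print(markdown)
```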
Here's the updated implementation that worked at the time of this comment.
```Python
# Source: https://gist.github.com/thomasantony/c2d866d1cb3fec3c532b13ce695c9438
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, "r") as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Find all the elements corresponding to a message in the conversation
conversation_elements = soup.find_all(
    class_=lambda c: c
    and c.startswith("min-h-[20px] flex flex-col items-start gap-4 whitespace-pre-wrap")
)

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split("\n")
    speaker = "User" if i % 2 == 0 else "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
```
</details>
Parse the HTML using BeautifulSoup

https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/

To parse HTML using BeautifulSoup, you can use the `BeautifulSoup()` function. The syntax is:

```Python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string, 'html.parser')
```

where:

- `html_string` is the HTML string to be parsed
- `'html.parser'` is the parser to use

Once you have created a `BeautifulSoup` object, you can use it to access the different elements of the HTML document. For example, to get the title of the document, you can use:

```Python
title = soup.title.string
```

To get all of the links in the document, you can use:

```Python
links = soup.find_all('a')
```

For more information, please see the BeautifulSoup documentation.
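The calls above can be combined into one small self-contained example (the page content here is invented for illustration):

```Python
from bs4 import BeautifulSoup

# A tiny sample document (made up for this example)
html_string = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <a href="https://example.com/a">First</a>
    <a href="https://example.com/b">Second</a>
  </body>
</html>
"""
soup = BeautifulSoup(html_string, "html.parser")

title = soup.title.string          # text of the <title> tag
links = soup.find_all("a")         # list of all <a> Tag objects
hrefs = [a["href"] for a in links]
print(title, hrefs)  # → Example Page ['https://example.com/a', 'https://example.com/b']
```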
I just said "give me the Markdown code, not the result" and it gave me what I wanted.