@wassname
Created December 26, 2024 00:46
Format a Reddit thread into Markdown suitable for an LLM
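```python
# from https://github.dev/JosefAlbers/rd2md
import re
import textwrap

import humanize
import pandas as pd
import praw
from tqdm import tqdm


def format_flair(obj):
    """Return the author's flair as italic Markdown (with a leading space), or ""."""
    if obj.author_flair_text is not None:
        return f" *{obj.author_flair_text}*"
    return ""


def format_comment(comment, depth=0, upvote_threshold=2, t0=None):
    """Render a comment and its replies as nested blockquotes (one '>' per level).

    A comment scoring below `upvote_threshold` is skipped together with all of
    its replies. `t0` is the post's creation time (pandas Timestamp), used for
    the "(N hours later)" label; pass None to omit the label.
    """
    if comment.score < upvote_threshold:
        return ""
    ts = pd.to_datetime(comment.created_utc, unit="s")
    if t0 is not None:
        delta = f" ({humanize.naturaldelta((ts - t0).total_seconds())} later)"
    else:
        delta = ""
    indent = ">" * (depth + 1) + " "
    author_line = f"{indent}**u/{comment.author}** [{comment.score:+}]{format_flair(comment)}{delta}\n"
    dedented_body = textwrap.dedent(comment.body)
    indented_body = textwrap.indent(dedented_body, indent)
    # textwrap.indent leaves blank lines unprefixed, which would end the quote;
    # collapse each run of them into one quote-marker line so paragraphs stay inside.
    indented_body = re.sub(r"\n\n+", f"\n{indent}\n", indented_body.strip())
    comment_block = f"{indent}\n{indented_body}\n\n"
    formatted = author_line + comment_block
    for reply in comment.replies:
        formatted += format_comment(reply, depth + 1, upvote_threshold, t0)
    return formatted


def submission_to_markdown(post, comment_score_threshold=0, verbose=False):
    """Render a PRAW Submission and its full comment tree as one Markdown string."""
    post_content = [f"## {post.title}\n\n"]
    # Computed unconditionally: it is also the reference time for comment deltas.
    ts = pd.to_datetime(post.created_utc, unit="s")
    if verbose:
        post_content.append(f"* Author: u/{post.author}{format_flair(post)}\n")
        post_content.append(f"* URL: {post.url}\n")
        post_content.append(f"* Score: {post.score}\n")
        post_content.append(f"* Created: {ts.isoformat()}\n\n")
    post_content.append("### Post:\n\n")
    if post.is_self:
        # Demote the post's own headings by three levels so they nest under "### Post:".
        content = re.sub(r"^#", "####", post.selftext, flags=re.MULTILINE)
        post_content.append(f"{content}\n\n")
    else:
        post_content.append(f"[Link to content]({post.url})\n\n")
    post_content.append("### Comments:\n\n")
    post.comments.replace_more(limit=None)  # expand every "load more comments" stub
    for comment in post.comments:
        post_content.append(format_comment(comment, upvote_threshold=comment_score_threshold, t0=ts))
    post_content.append("---\n\n")
    return "".join(post_content)


# Driver. The original leaves `sub` undefined; the credentials and subreddit
# name below are hypothetical placeholders.
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET",
                     user_agent="rd2md-example")
sub = reddit.subreddit("YOUR_SUBREDDIT")
submissions = []
submissions += list(sub.search("Monday Request and Recommendation Thread"))
markdown_docs = []
for submission in tqdm(submissions):
    md = submission_to_markdown(submission, -100, verbose=True)
    markdown_docs.append(md)
```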
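To see the nesting and pruning rules without network access or Reddit credentials, here is a minimal self-contained sketch of the same idea. It is not the gist's code: `render` and `fake` are hypothetical names, and `SimpleNamespace` objects stand in for PRAW comments (only `.author`, `.score`, `.body` and `.replies` are assumed).

```python
import re
import textwrap
from types import SimpleNamespace


def render(comment, depth=0, min_score=2):
    """Render a comment and its replies as nested Markdown blockquotes."""
    if comment.score < min_score:
        return ""  # prune this comment *and* its entire reply subtree
    indent = ">" * (depth + 1) + " "
    quote_gap = indent.rstrip()  # a bare ">" / ">>" line is a blank line inside the quote
    body = textwrap.indent(textwrap.dedent(comment.body).strip(), indent)
    # textwrap.indent skips blank lines, which would end the blockquote;
    # replace each run of them with one bare quote marker instead.
    body = re.sub(r"\n\n+", f"\n{quote_gap}\n", body)
    out = f"{indent}**u/{comment.author}** [{comment.score:+}]\n{quote_gap}\n{body}\n\n"
    for reply in comment.replies:
        out += render(reply, depth + 1, min_score)
    return out


def fake(author, score, body, replies=()):
    """Hypothetical stand-in exposing the four attributes render() reads."""
    return SimpleNamespace(author=author, score=score, body=body, replies=list(replies))


thread = fake("alice", 6, "First paragraph.\n\nSecond paragraph.", [
    fake("bob", 3, "Thanks!"),
    fake("spam", -4, "Downvoted.", [fake("carol", 9, "Never shown.")]),
])
print(render(thread))
```

A low score prunes the whole subtree, so a well-received reply under a downvoted comment (carol above) is dropped as well. Passing a very negative threshold, as the driver loop does with -100, keeps nearly everything.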

Example output

[RT] [FF] [WIP] The Mind-Tailed Fox

Post:

Link to content

Comments:

u/Charlie___ [+6] (20 hours later)

Have only read the first chapter so far, but I quite like the writing. I think this is a really elegant way of doing the brains-for-chakra trade-off.

One plot problem I'm worried about is that if the kyuubi really is released if Naruto is assassinated, it will try to make that happen. And, being smart, it will either succeed or I'm worried there's a plot hole.

Instead, maybe there's some better conditions of the binding - like maybe the kyuubi is only released a few percent of the time if Naruto is assassinated, but is always released if Naruto dies of old age (not that it would tell anyone else that fact).

u/arenavanera [+3] (23 hours later)

Thanks!

I'm also worried about plot problems caused by the Kyuubi being too smart. You'd expect a superintelligence that's trying to trick a small child into getting himself killed by people he knows to succeed almost immediately. There are some complications -- Naruto is guarded, anyone powerful enough to defeat his guards is more likely to be smart and also to know about the Kyuubi, if the Kyuubi tips his hand too early and the Hokage knows he's actively trying to escape it would get much harder, etc. etc. -- but I eventually decided that I did need to tone the Kyuubi's power level down a bit to have the conflict be plausible.

A structural problem related to that is that it's hard to write a superintelligence that's weakened in several crucial ways but doesn't want to share what they are, because if you just write that directly it comes across as a superintelligence that's acting irrationally or inconsistently. I'm planning for Naruto to discover more about the details once he understands how seals work, but if I were to go back and write it again I would put the excuse for him learning the actual rules the Kyuubi is operating under much much earlier in the story so that it's clearer what's going on.

(Another related problem I've had trouble with is that it's hard to write good dialogue between Naruto and the Kyuubi, because good dialogue usually involves conflicting personalities, and it's hard to see why the Kyuubi wouldn't just interact with Naruto in whatever way made Naruto happiest and most pliable. You can handwave some of this away because human psychology is bonkers and it might actually be correct for the Kyuubi to occasionally create conflict with Naruto for manipulative reasons, but I'm still struggling with it.)

u/Sailor_Vulcan [+6] Champion of Justice and Reason (a day later)

Why not just go with that? Kyubi wants Naruto's trust for some reason, and is trying to manipulate him. As for why Kyubi doesn't just try to get naruto to kill himself to release him, introduce costs to the kyubi for release. Like, he gains some of the knowledge and skills of his hosts while he's inhabiting them so he doesn't necessarily want to be released right away. And if they die too suddenly or violently he'll go into an irrational frenzied state and start mindlessly rampaging after being forcibly ripped from the mind of someone who was dying horribly. Instead of making him a superintelligence just give him the power to steal the knowledge and skills of his hosts.
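The "(20 hours later)" style labels above measure each comment's age relative to the post. The gist delegates the wording to `humanize.naturaldelta`; the sketch below is a rough stdlib-only stand-in (the name `relative_label` and its thresholds are assumptions) that only makes the arithmetic explicit. It truncates to the largest whole unit and will not reproduce humanize's phrasing exactly, e.g. it says "1 day later" where humanize says "a day".

```python
def relative_label(post_ts: float, comment_ts: float) -> str:
    """Coarse 'N <unit>(s) later' label from two Unix timestamps in seconds.

    Hypothetical stand-in for humanize.naturaldelta with assumed thresholds.
    """
    seconds = max(0, int(comment_ts - post_ts))  # clamp clock skew to zero
    for unit, size in (("day", 86_400), ("hour", 3_600), ("minute", 60)):
        if seconds >= size:
            n = seconds // size
            return f"{n} {unit}{'' if n == 1 else 's'} later"
    return f"{seconds} second{'' if seconds == 1 else 's'} later"


# A reply posted 20 h 15 min after the post:
print(relative_label(1_700_000_000, 1_700_000_000 + 20 * 3600 + 15 * 60))  # 20 hours later
```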
