Skip to content

Instantly share code, notes, and snippets.

@Uzay-G
Last active January 7, 2024 17:23
Show Gist options
  • Save Uzay-G/a1b437182d01e0dbf490ff1e253ecb3b to your computer and use it in GitHub Desktop.
Save Uzay-G/a1b437182d01e0dbf490ff1e253ecb3b to your computer and use it in GitHub Desktop.
Auto generate Anki flashcards based on a directory of markdown
from pathlib import Path
import openai
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter
import yaml
# program that reads MD files, calls OpenAI API with GPT 4 to generate flashcards for a knowledge base ANKI deck
config = yaml.safe_load(open("ankify.yml"))
directory = Path(config["base_dir"])
encoding = tiktoken.encoding_for_model("gpt-4")
openai.api_key = < YOUR OPENAI KEY >
# read in the file
gen_flashcard_prompt = lambda chunk: [
{
"role": "system",
"content": """You are generating Anki flashcards for a user to retain knowledge from their knowledge base. The user text will be a part of their knowledge base notes, and you need to take key concepts and questions for them to remember. Be selective and only take important information. Be concise and clear.
Instructions to create a deck of flashcards:
- Keep the flashcards simple, clear, and focused on the most important information.
- Use simple and direct language to make the cards easy to read and understand.
- don't include obvious knowledge or irrelevant formulas. Focus on recalling conceptual links or ideas, and put in [] random comments from the notes that seem to enhance the flashcard idea, so that I can see more info when I do my flashcards
- Focus on intuitions and non trivial ideas rather than just facts or simple definitions. this is important.
Example text:
notion of work: how much effort had to be given what we know about the configuration, to create it
for electrostatics, this means taking every pair of units of charge and considering how much the electrostatic force pushes back against them being where they are, from infinity
+- force along path from infinity
want to develop better intuition on the relationship between work and energy #todo
electric potential of a charge can be viewed as how the introduction of the charge changes the total work needed to create the system
Total Energy = PE + KE
can obtain the same with unit energy density of e0 / 2 * |E|^2
being curl free amounts to being 0 around closed loop, how does this relate to the intuition of curl as measuring rotation, and allows us to create potential, #todo
mechanism/framing of electric potential gives us a simpler primitiive to get the E-field and study other prpoperties of the electric config with
in conducting materials charge elements move freely and will rearrange to minimize PE
think more about this minimization principle #todo
linked to work: the charges slide naturally towards low-energy minima
Earnshaw's theorem: it's impossible to build a stable config w/ only coulomb forces #todo ask about this
uniqueness theorem:
fixing the E-field or net charge on each conductor for a set of conductors is enough to determine E-field uniquely
The format of the flashcards is `prompt; "answer"` for each new line.
Flashcards (keep in mind format):
work in E&M; "How much effort to create a given configuration, in electrostatics, this means taking into account repulsive forces
electric potential of a charge; "how much introduction of charge changes total work needed to create system"
curl free; "being 0 around closed loop, [relate to intuition of curl as measuring rotation]"
why do we use electric potential rather than E-field; "simpler primitive to get E-field and study other properties of electric config with"
minimization principle in conducting materials; "charges slide naturally towards low-energy minima, creating elegant and useful properties"
Earnshaw's theorem; "impossible to build a stable config w/ only coulomb forces"
Actual text:
""",
},
{"role": "user", "content": chunk + "\n\nFlashcards (keep in mind format):\n"},
]
splitter = RecursiveCharacterTextSplitter(
chunk_size=2000,
length_function=lambda t: len(encoding.encode(t)),
separators=["\n\n", "\n", " ", ""],
chunk_overlap=2,
)
# for each file, generate a flashcard
write_to = "ankify.txt"
for subdir in config["include"]:
for path in (directory / subdir).rglob("*.md"):
with open(path, "r") as f:
file = f.read()
# call GPT 4 Chat
chunks = splitter.split_text(file)
print("Processing " + str(path), f"{len(chunks)} chunks")
for chunk in chunks:
response = openai.ChatCompletion.create(
model="gpt-4",
messages=gen_flashcard_prompt(chunk),
temperature=0.9,
)
with open(write_to, "a") as f:
text = response.choices[0].message.content
modified = "\n".join(
[line + f" Source: {path.name}" for line in text.split("\n")]
)
if (
not "isn't enough information" in text
and not "no relevant information" in text
):
print("Writing to " + str(path))
f.write(modified + "\n\n")
base_dir: directory with your markdown files
include: // sub directories you want to include, just put / if you want to include everything, and don't put any other list items
- dir1
- dir2
@Uzay-G
Copy link
Author

Uzay-G commented Aug 20, 2023

This is a very quick script I wrote to automatically transform my knowledge base into Anki flashcards.
Reads your config file in ankify.yml (from where you are running the script), operates on a directory of markdown files. Note that this can be expensive GPT-4 wise, depending on the size of your data, eg it cost me ~$15 to run this on like ~200k words. Note that you need to run pip install langchain openai tiktoken.

I did this 2 days ago and it's been nice/useful for me, getting flashcards from subjects like econ to mechanics to abstract algebra to phlosophy all in a row, and being able to actually engage with my notes on a deeper level. Let me know if you have comments/questions or find it useful here.

Once you run the script (monitor cost if you want), then your output will be in ankify.txt, and you can click import from file to add it to your desk.

@Uzay-G
Copy link
Author

Uzay-G commented Aug 20, 2023

Note that you need to set openai.api_key to your OA api key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment