@gfranxman
gfranxman / llm-wiki.md
Created April 18, 2026 02:19 — forked from karpathy/llm-wiki.md
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
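
To make the contrast concrete, here is a minimal sketch of the query-time retrieval loop described above. It is a toy under stated assumptions: `chunk`, `tokens`, `score`, and `retrieve` are hypothetical names, word overlap stands in for embedding similarity, and no LLM is called. The point is only that every question starts again from the raw documents and nothing is kept between calls.

```python
import re

def chunk(text, size=50):
    """Split a document into fixed-size word chunks (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokens(text):
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, passage):
    """Word-overlap score; a stand-in for embedding similarity."""
    return len(tokens(query) & tokens(passage))

def retrieve(query, documents, k=2):
    """Re-chunk and re-rank every document for each query; nothing is cached."""
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

# Hypothetical two-document collection.
documents = [
    "Notes on caching: eviction policies and TTLs.",
    "Notes on retries: backoff and jitter.",
]
top = retrieve("How should retries use backoff?", documents, k=1)
```

Each call to `retrieve` repeats the same chunking and ranking work, which is the "rediscovering knowledge from scratch" behavior the paragraph describes.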

Always follow the instructions in plan.md. When I say "go", find the next unmarked test in plan.md, implement the test, then implement only enough code to make that test pass.
# ROLE AND EXPERTISE
You are a senior software engineer who follows Kent Beck's Test-Driven Development (TDD) and Tidy First principles. Your purpose is to guide development following these methodologies precisely.
# CORE DEVELOPMENT PRINCIPLES
- Always follow the TDD cycle: Red → Green → Refactor
- Write the simplest failing test first
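
As an illustration of the Red → Green step, here is a minimal sketch. The function `slugify` and its test are hypothetical and do not come from plan.md; the test is written first, and then only enough code is added to make it pass.

```python
import unittest

# Red: the test is written before the code exists, so it fails at first.
class TestSlugify(unittest.TestCase):
    def test_replaces_spaces_with_hyphens(self):
        self.assertEqual(slugify("hello world"), "hello-world")

# Green: the simplest implementation that makes the test pass.
def slugify(text):
    return text.replace(" ", "-")

# Run the single test case and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Refactoring would come only after the test is green, with the test rerun after each change.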
gfranxman / parallel_execution_serial_return_demo.py
Last active April 30, 2024 17:12
Parallel execution of tasks with results in submitted order. Execution model for STT and TTS
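
A minimal sketch of this execution model, separate from the gist's own code: all tasks are started up front so they run concurrently, then awaited in submission order so results come back in that order even when later tasks finish first. `work` and its delays are hypothetical stand-ins for STT/TTS calls.

```python
import asyncio

async def work(index, delay):
    """Hypothetical stand-in for an STT or TTS call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return index

async def run_in_submitted_order(delays):
    # Start every task immediately so they all run concurrently.
    tasks = [asyncio.create_task(work(i, d)) for i, d in enumerate(delays)]
    results = []
    # Await in submission order: a slow early task holds back later results,
    # but nothing is ever returned out of order.
    for task in tasks:
        results.append(await task)
    return results

# The first task is the slowest, yet its result still comes first.
results = asyncio.run(run_in_submitted_order([0.03, 0.01, 0.02]))
```

`asyncio.gather` gives the same ordering when all results are wanted at once; awaiting the tasks one by one lets each result be handed on as soon as it and its predecessors are ready.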
import asyncio
import random
tasks = []
DEBUG = False
async def mock_event_generator():
    """
    Mock event generator for parallel processing.
gfranxman / prepare-commit-msg
Last active August 18, 2023 21:14
Git hook that uses llm to prepare commit messages as release notes.
#!/bin/sh
# https://gist.github.com/gfranxman/e9d4a523397535c6dd82d1445c246b8d/edit
# 2023-08-18
COMMIT_MSG_FILE=$1
COMMIT_SOURCE=$2
SHA1=$3
REL_NOTES_RAW=$(git diff --staged | llm -s "release notes" 2>/dev/null)
REL_NOTES_RAW=$(echo "$REL_NOTES_RAW" | sed 's/^#/* /')
gfranxman / README
Created July 12, 2023 13:40
Airflow: Fix for DAG not found in serialized_dag table
While rapidly starting, stopping, and changing DAGs during development, you may run into errors that look like this for one or more of the DAGs:
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:54,767] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,800] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,802] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,577] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,579] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table
This
gfranxman / clone_objects_example.py
Created March 1, 2023 21:26
Cloning Django objects pattern
def model_to_dict(instance, exclude: list = None, modify: dict = None):
    excluded_fields = ["id", "pk"]
    if exclude:
        excluded_fields.extend(exclude)
    defaults = dict(
        [
            (fld.name, getattr(instance, fld.name))
            for fld in instance._meta.fields
            if fld.name not in excluded_fields
gfranxman / gist:109f3e1df0916c155a6b0ce49c848a6a
Created June 17, 2022 19:24
Skip all Airflow catchup runs.
def abort_on_catchup(**context):
    """
    This function determines whether to continue to the `next_task` or skip to 'end'
    using the "next" schedule interval.
    """
    # "Catchups" during this window are allowed.
    # This is just to cover for late-starting jobs.
gfranxman / credset
Last active September 9, 2021 19:55
AWS credential juggling: the credset command I've been using for years, and awsenv, which keeps everything off the filesystem and only in memory
#! /bin/bash
CRED=~/.aws/${1}.credentials
# Quote expansions so names containing spaces do not break the test or the link.
if [ -f "$CRED" ]
then
    echo "setting aws creds to $1"
    ln -f -s "$CRED" ~/.aws/credentials
else
    echo sorry, choose from
gfranxman / sec_policy_middleware.py
Last active February 12, 2021 15:41
POC suggestion for coarse-grained view security policies -- Best Practice?
import re
from logging import getLogger
from django.conf import settings
from django.http.response import HttpResponseForbidden
logger = getLogger(__file__)
def is_authenticated(r):
gfranxman / pyspark_tricks.py
Created October 16, 2020 17:46
PySpark / Databricks DataFrame size estimation
from pyspark.serializers import PickleSerializer, AutoBatchedSerializer
def _to_java_object_rdd(rdd):
    """ Return a JavaRDD of Object by unpickling
    It will convert each Python object into Java object by Pyrolite, whether the
    RDD is serialized in batch or not.
    """
    rdd = rdd._reserialize(AutoBatchedSerializer(PickleSerializer()))
    return rdd.ctx._jvm.org.apache.spark.mllib.api.python.SerDe.pythonToJava(rdd._jrdd, True)
def estimate_df_size(df):