@bytearchive
bytearchive / llm-wiki.md
Created April 10, 2026 01:58 — forked from karpathy/llm-wiki.md
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copied and pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
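
To make the "rediscovering from scratch" point concrete, here is a minimal sketch of retrieval at query time. It is a toy under stated assumptions, not how any of the named products work: the word-overlap scoring, the function names `tokenize`, `retrieve`, and `build_prompt`, and the sample `CHUNKS` are all hypothetical, and a real system would typically use embeddings instead of word counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, chunks, k=2):
    """Return up to k chunks that share the most word occurrences with the query."""
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, chunk))
    # Highest score first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score > 0]

def build_prompt(query, chunks):
    """Assemble the text that would be sent to the LLM for one question."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical document chunks, for illustration only.
CHUNKS = [
    "Squid can cache Debian packages for Docker builds.",
    "GOPATH and GOROOT are environment variables used by Go.",
    "A lexer splits source text into tokens using regular expressions.",
]

if __name__ == "__main__":
    print(build_prompt("How does a lexer produce tokens?", CHUNKS))
```

Note that `retrieve` runs again in full for every question and nothing it finds is written back anywhere, which is the lack of accumulation described above.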

@bytearchive
bytearchive / squid-deb-proxy_on_docker.md
Created February 3, 2020 04:56 — forked from dergachev/squid-deb-proxy_on_docker.md
Caching debian package installation with docker

TLDR: I now add the following snippet to all my Dockerfiles:

# If host is running squid-deb-proxy on port 8000, populate /etc/apt/apt.conf.d/30proxy
# By default, squid-deb-proxy 403s unknown sources, so apt shouldn't proxy ppa.launchpad.net
RUN route -n | awk '/^0.0.0.0/ {print $2}' > /tmp/host_ip.txt
RUN echo "HEAD /" | nc `cat /tmp/host_ip.txt` 8000 | grep squid-deb-proxy \
  && (echo "Acquire::http::Proxy \"http://$(cat /tmp/host_ip.txt):8000\";" > /etc/apt/apt.conf.d/30proxy) \
  && (echo "Acquire::http::Proxy::ppa.launchpad.net DIRECT;" >> /etc/apt/apt.conf.d/30proxy) \
  || echo "No squid-deb-proxy detected on docker host"
@bytearchive
bytearchive / cheatsheet-elasticsearch.md
Created September 30, 2019 08:31 — forked from ruanbekker/cheatsheet-elasticsearch.md
Elasticsearch Cheatsheet: Example API usage of Elasticsearch with curl
@bytearchive
bytearchive / gist:e0fdc8f1445a86b989ed9b271f055150
Created July 27, 2019 01:38 — forked from eliben/gist:5797351
Generic regex-based lexer in Python
#-------------------------------------------------------------------------------
# lexer.py
#
# A generic regex-based Lexer/tokenizer tool.
# See the if __main__ section in the bottom for an example.
#
# Eli Bendersky ([email protected])
# This code is in the public domain
# Last modified: August 2010
#-------------------------------------------------------------------------------

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
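
The snippet itself is not shown in this preview, so the following is only a sketch of one way to do the conversion described, assuming `pd.to_numeric` is the call the note refers to. The sample Series values are made up for illustration.

```python
import pandas as pd

# A Series of numeric strings converts cleanly.
s = pd.Series(["1", "2.5", "3"])
numeric = pd.to_numeric(s)  # dtype becomes float64

# With the default errors="raise", a non-numeric value raises ValueError.
bad = pd.Series(["1", "abc"])
try:
    pd.to_numeric(bad)
    raised = False
except ValueError:
    raised = True

# errors="coerce" turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(bad, errors="coerce")
```

Using `errors="coerce"` is a design choice: it keeps the pipeline running but silently introduces NaN values, so it is worth checking `coerced.isna().sum()` afterwards.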

@bytearchive
bytearchive / .bashrc
Created April 1, 2018 19:30 — forked from vsouza/.bashrc
Golang setup in Mac OSX with HomeBrew. Set `GOPATH` and `GOROOT` variables in zshell, fish or bash.
# Set variables in .bashrc file
# don't forget to change your path correctly!
export GOPATH=$HOME/golang
export GOROOT=/usr/local/opt/go/libexec
export PATH=$PATH:$GOPATH/bin
export PATH=$PATH:$GOROOT/bin
@bytearchive
bytearchive / amazonctl.py
Created February 28, 2018 20:59 — forked from jtpaasch/amazonctl.py
A collection of functions commonly used to do AWS stuff.
# -*- coding: utf-8 -*-
"""A simple tool to document how to control AWS resources.
AWS AUTHENTICATION
-------------------
In order to run any of the code below, you need a profile with AWS credentials
set up on your computer. It's very easy to do this. Google how to configure
your profile with boto3, or visit the docs:
@bytearchive
bytearchive / db_bind_sharding.py
Created September 22, 2017 07:17 — forked from ziplus4/db_bind_sharding.py
flask, sqlalchemy sample : sharding
# -*- coding:utf8 -*-
import re
from flask import Flask
from flask_sqlalchemy import SQLAlchemy as BaseSQLAlchemy
from flask_sqlalchemy import _SignallingSession as BaseSignallingSession
from flask_sqlalchemy import orm, partial, get_state
from datetime import datetime