Allan Jie allanj

@allanj
allanj / llm-wiki.md
Created May 5, 2026 00:28 — forked from karpathy/llm-wiki.md
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copied and pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
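
To make the query-time behaviour described above concrete, here is a minimal sketch of the retrieve-then-generate loop. It is not taken from the gist: the word-overlap scoring, the `retrieve` and `build_prompt` helpers, and the sample chunks are all hypothetical stand-ins, and a real system would use embeddings and an actual LLM call.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, k=2):
    """Rank chunks by word overlap with the query and return the top k.

    Hypothetical scoring: real RAG systems typically use vector similarity.
    """
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the prompt an LLM would receive; the LLM call itself is omitted."""
    context = retrieve(query, chunks, k)
    return "Context:\n" + "\n".join(context) + "\nQuestion: " + query


# Hypothetical document chunks, for illustration only.
chunks = [
    "Cats sleep most of the day.",
    "RAG retrieves relevant chunks at query time.",
    "Bootstrap resampling draws samples with replacement.",
]

# Every call starts from the raw chunks again: nothing is stored between questions.
top = retrieve("What does RAG do at query time?", chunks, k=1)
```

Note that `retrieve` keeps no state, so asking the same question twice repeats the same search over the same chunks, which is the lack of accumulation the paragraph describes.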

@allanj
allanj / inspirational_quotes_app.py
Created July 23, 2025 16:03 — forked from CharlyWargnier/inspirational_quotes_app.py
A Streamlit app for navigating through inspirational quotes with "Next" and "Previous" buttons.
import streamlit as st
if 'count' not in st.session_state:
    st.session_state.count = 0
if 'quotes' not in st.session_state:
    st.session_state.quotes = [
        "Life is what happens when you're busy making other plans. — John Lennon",
        "Get busy living or get busy dying. — Stephen King",
        "You only live once, but if you do it right, once is enough. — Mae West",
@allanj
allanj / prompt.py
Created February 18, 2025 10:14
python prompt for few-shot prompting
PYTHON_PROMPT = '''Question: after resting they decided to go for a swim . the depth of the water is 5 times ron 's height . dean is 11 feet shorter than ron . if ron stands at 12 feet how deep was the water ?
# solution in Python:
def solution():
    """after resting they decided to go for a swim . if the depth of the water is 10 times dean 's height and he stands at 6 feet how deep was the water ?"""
    dean_height = 6
    water_depth = 10 * dean_height
    result = water_depth
    return result
@allanj
allanj / gsm8k_convert_prompt.py
Created February 18, 2025 07:45
ReFT_GSM8K_convert_prompt
MATH_PROMPT = '''Question: Riku has 25 times more stickers than Kristoff. If Kristoff has 85 stickers, how many stickers does Riku have?
Answer:
If Kristoff has 85 stickers, Riku has 25 * 85 stickers = <<85*25=2125>>2125 more stickers.
The total number of stickers Riku has is 2125 stickers + 85 stickers = <<2125+85=2210>>2210 stickers.
# solution in Python:
def solution():
    """Riku has 25 times more stickers than Kristoff. If Kristoff has 85 stickers, how many stickers does Riku have?"""
    ratio = 25
@allanj
allanj / client.py
Created December 23, 2024 04:05 — forked from youkaichao/client.py
client.py
import openai
import asyncio
async def get_choice_completion(prompt, choices):
    # Initialize an asynchronous OpenAI client
    async with openai.AsyncClient(base_url="http://127.0.0.1:8000/v1", api_key="abc") as client:
        choice_probs = {}
        # Calculate logprobs for each prompt + choice sequence
        for choice in choices:
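
The preview above is cut off before it shows how `choice_probs` is filled in. As a sketch only, and assuming each choice ends up with one total log-probability, one common way to turn those scores into a distribution over the choices is a softmax. The `normalize_choice_logprobs` helper and the example numbers are hypothetical, not part of the gist.

```python
import math


def normalize_choice_logprobs(choice_logprobs):
    """Softmax over per-choice total log-probabilities (hypothetical helper).

    Subtracting the maximum first keeps math.exp from underflowing
    when the log-probabilities are very negative.
    """
    highest = max(choice_logprobs.values())
    weights = {c: math.exp(lp - highest) for c, lp in choice_logprobs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Made-up scores: the raw probabilities 0.2 and 0.6 renormalize to 0.25 and 0.75.
probs = normalize_choice_logprobs({"A": math.log(0.2), "B": math.log(0.6)})
```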
@allanj
allanj / example.jsonl
Created April 7, 2024 07:21
Example Reflexion Json (CodeLLaMA-7b-instruct)
{"name": "HumanEval_79_decimal_to_binary", "language": "py", "prompt": "def decimal_to_binary(decimal: int) -> str:\n \"\"\"You will be given a number in decimal form and your task is to convert it to\n binary format. The function should return a string, with each character representing a binary\n number. Each character in the string will be '0' or '1'.\n\n There will be an extra couple of characters 'db' at the beginning and at the end of the string.\n The extra characters are there to help with the format.\n\n Examples:\n >>> decimal_to_binary(15)\n 'db1111db'\n >>> decimal_to_binary(32)\n 'db100000db'\n \"\"\"\n", "doctests": "transform", "original": "/home/arjun/repos/nuprl/MultiPL-E/datasets/../datasets/originals-with-cleaned-doctests/HumanEval_79_decimal_to_binary.py", "prompt_terminology": "reworded", "stop_tokens": ["\ndef", "\n#", "\nif", "\nclass"], "entry_point": "decimal_to_binary", "test": "def check(candidate):\n assert candidate(0) == 'db0db'\n assert cand
@allanj
allanj / bootstrap.py
Created February 7, 2024 09:10
Bootstrapping t-test
"""
This is a simple example showing how to calculate the p-value for the difference in accuracy between two models
Bootstrapping t-test
"""
import random
random.seed(42)
# assume we have a test set of 1000 samples
# we just create dummy results for the demo
groundtruth = [random.choice(['A', 'B', 'C']) for _ in range(1000)]
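
The preview ends before the test itself. The sketch below is not the gist's code: it shows one common variant, paired bootstrap resampling, in which the test set is resampled with replacement and the p-value is estimated as the fraction of resamples where model A fails to beat model B. The function name, parameters, and defaults are assumptions made for illustration.

```python
import random


def paired_bootstrap_p_value(gold, pred_a, pred_b, n_resamples=1000, seed=0):
    """Estimate how often model A does NOT beat model B under resampling.

    A small return value suggests that A's accuracy advantage is unlikely
    to be an artifact of which test examples happened to be drawn.
    """
    rng = random.Random(seed)
    n = len(gold)
    # Per-example correctness, computed once and reused in every resample.
    correct_a = [int(p == g) for p, g in zip(pred_a, gold)]
    correct_b = [int(p == g) for p, g in zip(pred_b, gold)]
    not_better = 0
    for _ in range(n_resamples):
        # Draw n example indices with replacement, paired across both models.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_resamples


# Dummy data: model A is always right and model B is always wrong.
gold = ["A"] * 200
always_right = list(gold)
always_wrong = ["B"] * 200
p_clear_win = paired_bootstrap_p_value(gold, always_right, always_wrong)
p_identical = paired_bootstrap_p_value(gold, always_right, always_right)
```

Resampling the same indices for both models keeps the comparison paired, so examples that are hard for both models do not inflate the variance of the difference.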
@allanj
allanj / demo_sft_with_accelerate.py
Last active January 11, 2024 13:05
demo_sft_script
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, PreTrainedTokenizerFast, set_seed, AutoModelForCausalLM, AutoConfig
from tqdm import tqdm
import argparse
import torch
import torch.nn as nn
import logging
from typing import Dict, Tuple
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.logging import get_logger
@allanj
allanj / command.md
Created August 15, 2022 02:36
Useful Commands in Linux

Kill processes whose command line contains a certain string

For example, to kill the process whose command line contains python3 -u experiment_main.py:

kill $(ps aux | grep '[p]ython3 -u experiment_main.py' | awk '{print $2}')
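
The `[p]` in the pattern is what keeps `grep` from matching its own entry in the `ps` output: the regex matches the literal text `python3 ...`, but the grep process's command line contains `[p]ython3 ...`, which does not contain that text. The sketch below reproduces the idea with Python's `re` module, which treats this particular pattern the same way; the `ps` lines and the `matching_pids` helper are made up for illustration.

```python
import re

# Same pattern as in the shell command above.
PATTERN = re.compile(r"[p]ython3 -u experiment_main.py")

# Hypothetical, simplified `ps aux` lines: user, PID, then the command line.
ps_lines = [
    "alice 4242 python3 -u experiment_main.py --seed 1",
    "alice 4250 grep [p]ython3 -u experiment_main.py",
    "alice 4300 vim notes.txt",
]


def matching_pids(lines, pattern=PATTERN):
    """Return the second whitespace-separated field (the PID) of each matching line."""
    return [line.split()[1] for line in lines if pattern.search(line)]


pids = matching_pids(ps_lines)  # only the real python3 process is selected
```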

Hadoop: list files by date

hdfs dfs -ls / | sort -k6,7
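
In the usual `hdfs dfs -ls` layout, fields 6 and 7 are the modification date and time, so `sort -k6,7` orders entries chronologically. The following sketch approximates that ordering in Python. The listing lines are invented sample data, and `sort_by_date` is a hypothetical helper; real `sort` also applies locale and blank-handling rules that this ignores.

```python
# Invented sample output: permissions, replication, owner, group, size, date, time, path.
listing = [
    "Found 3 items",
    "drwxr-xr-x   - hdfs supergroup          0 2022-08-14 09:30 /tmp",
    "drwxr-xr-x   - hdfs supergroup          0 2021-03-02 17:05 /user",
    "-rw-r--r--   3 hdfs supergroup       1024 2022-08-14 08:15 /data.csv",
]


def sort_by_date(lines):
    """Sort listing entries by (date, time), i.e. fields 6 and 7.

    Lines with fewer than 8 fields, such as the 'Found N items' header, are dropped.
    ISO-style dates and 24-hour times sort correctly as plain strings.
    """
    entries = [line for line in lines if len(line.split()) >= 8]
    return sorted(entries, key=lambda line: tuple(line.split()[5:7]))


ordered_paths = [line.split()[-1] for line in sort_by_date(listing)]
```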
@allanj
allanj / fairseq_gen.py
Created July 8, 2021 04:22
Fairseq Generation
import torch
from fairseq.models.bart import BARTModel
bart = BARTModel.from_pretrained(
    'model_files/bart-large-model',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data/cloze_replace_all-bin'
)
bart.cuda()