Hamel Husain hamelsmu

@hamelsmu
hamelsmu / flashcard-ideas.md
Last active November 9, 2025 20:31
Eval Flashcard Ideas

Here is the final, consolidated set of 68 flashcard ideas.

I merged the two sets as requested, which yielded 78 cards in total. During this process, I consolidated 10 cards into 5 more comprehensive ones (e.g., merging "persona" testing into "tone/style," and adding code examples to the "choice of evaluator" card) and pruned 6 redundant cards (e.g., duplicates on "how to start" and "evals vs. QA").

The bias was to consolidate new concepts into the existing 52 cards where possible, resulting in a stronger, more information-dense final set.

{
  "flashcards": [
@hamelsmu
hamelsmu / background-task-sse.py
Created October 21, 2025 15:05
Air Background Task
"""
Minimal Air Framework Demo with Background Tasks and Server-Sent Events (SSE)
"""
import asyncio
import random
from typing import Dict
import air
app = air.Air()
tasks: Dict[int, dict] = {}
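The preview stops at the task registry. Below is a minimal sketch of how the rest might be wired, assuming Air exposes FastAPI-style routing (air.Air builds on FastAPI); the route paths, run_job helper, and payload shape are illustrative, not the gist's actual code.

# Hypothetical continuation sketch. Assumes FastAPI-style decorators on the
# Air app; run_job, the routes, and the payload shape are illustrative.
import json
from starlette.responses import StreamingResponse

async def run_job(task_id: int) -> None:
    # Simulate a long-running job, publishing progress into shared state.
    for pct in range(0, 101, 20):
        tasks[task_id] = {"progress": pct}
        await asyncio.sleep(random.uniform(0.1, 0.5))

@app.get("/start/{task_id}")
async def start(task_id: int):
    # Launch the job without blocking the request.
    asyncio.create_task(run_job(task_id))
    return {"status": "started"}

@app.get("/events/{task_id}")
async def events(task_id: int):
    # Stream progress in SSE wire format: one "data: <json>\n\n" per message.
    async def gen():
        while True:
            state = tasks.get(task_id, {"progress": 0})
            yield f"data: {json.dumps(state)}\n\n"
            if state.get("progress", 0) >= 100:
                break
            await asyncio.sleep(0.25)
    return StreamingResponse(gen(), media_type="text/event-stream")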
@hamelsmu
hamelsmu / phase_1:_ground_your_evals_in_reality,_with_error_analysis.md
Created September 14, 2025 16:53
Phase 1: Ground your evals in reality, with error analysis

Hamel Husain and Shreya Shankar's online course, AI Evals for Engineers & PMs, is the highest-grossing course on Maven and consistently draws sizable student groups from all the major AI labs. This is because they teach something crucial: how to build evaluations that actually improve your product, not just generate vanity dashboards.

@hamelsmu
hamelsmu / chapters.py
Created July 6, 2025 03:16
YouTube Chapter Generator - Generate summaries and timestamps for YouTube videos using Gemini API
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "httpx",
#     "typer",
#     "rich",
# ]
# ///
"""

Generating Synthetic Data for LLM Evaluation

import json
import os
from getpass import getpass
from io import StringIO

import openai
import opentelemetry
import pandas as pd
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor

Summary

  1. Use your application extensively to build intuition about failure modes
  2. Define 3-4 dimensions based on observed or anticipated failures
  3. Create structured tuples covering your priority failure scenarios
  4. Generate natural language queries from each tuple using a separate LLM call (see the sketch after this list)
  5. Scale to more examples across your most important failure hypotheses (we suggest at least ~100)
  6. Test and iterate on the most critical failure modes first, and generate more until you reach theoretical saturation
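
A minimal sketch of steps 3-4, assuming the OpenAI client shown in the imports above; the dimension values, model name, and prompt are invented for illustration:

# Sketch of steps 3-4: structured tuples -> natural-language queries.
# Dimension values, model, and prompt are illustrative assumptions.
import itertools
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Steps 2-3: define dimensions and enumerate tuples covering failure scenarios.
personas = ["new user", "power user"]
intents = ["refund request", "billing dispute"]
phrasings = ["terse", "rambling"]
tuples = list(itertools.product(personas, intents, phrasings))

def tuple_to_query(persona: str, intent: str, phrasing: str) -> str:
    # Step 4: expand one tuple into a realistic natural-language query.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": (
            f"Write one realistic customer message. Persona: {persona}. "
            f"Intent: {intent}. Style: {phrasing}. Return only the message."
        )}],
    )
    return resp.choices[0].message.content

# Step 5: scale across more dimension values to reach ~100+ queries.
queries = [tuple_to_query(*t) for t in tuples]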

Question: Should I avoid using RAG for my AI application after reading that "RAG is dead" for coding agents?

Many developers are confused about when and how to use RAG after reading articles claiming "RAG is dead." Understanding what RAG actually means versus the narrow marketing definitions will help you make better architectural decisions for your AI applications.

Answer: The viral article claiming RAG is dead specifically argues against using naive vector database retrieval for autonomous coding agents, not RAG as a whole. This is a crucial distinction that many developers miss due to misleading marketing.

RAG simply means Retrieval-Augmented Generation - using retrieval to provide relevant context that improves your model's output. The core principle remains essential: your LLM needs the right context to generate accurate answers. The question isn't whether to use retrieval, but how to retrieve effectively.
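
To make the principle concrete, here is a minimal sketch of the retrieve-then-generate loop; the toy keyword retriever, corpus, and model name are placeholders, and any retrieval method (BM25, embeddings, even grep) slots into retrieve():

# Minimal RAG loop: retrieve context, then generate with it in the prompt.
# The keyword retriever, corpus, and model are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
docs = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
]

def retrieve(query: str, k: int = 1) -> list:
    # Toy retriever: rank documents by word overlap with the query.
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return resp.choices[0].message.content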

For coding


from fasthtml.common import *
import csv
import io
from datetime import datetime

# Add TailwindCSS via CDN
tw_styles = Script(src="https://cdn.tailwindcss.com")

# Configure application with DaisyFT resources
app, rt, db, DataItem = fast_app(
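The call above is cut off in the preview. A hypothetical completion, following the common FastHTML pattern where fast_app() returns the app, router, a SQLite-backed table, and its row dataclass; the file name and field names are guesses for illustration:

# Hypothetical completion; the database file and field names are illustrative.
app, rt, db, DataItem = fast_app(
    "data/items.db",       # SQLite file backing the table
    hdrs=(tw_styles,),     # inject the TailwindCSS script into every page
    id=int, name=str, created=str, pk="id",
)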
@hamelsmu
hamelsmu / fine-tuning.md
Last active July 3, 2025 15:28
From OpenAI Deep Research, in response to https://x.com/simonw/status/1895301139819860202

Success Stories of Fine-Tuning LLMs Across Industries

Below is a summary of diverse use cases where companies fine-tuned large language models (LLMs) to solve business challenges that previous methods struggled with. Each case highlights the challenge, the fine-tuning approach, and the key results achieved.

Summary of Fine-Tuning Success Cases

| Use Case | Key Results | Source Link |
| --- | --- | --- |
| Wealth Management Assistant (Finance) | 98% advisor adoption; document access up from 20% to 80% | OpenAI & Morgan Stanley |
| Insurance Claims AI (Insurance) | 30% accuracy improvement vs. generic LLMs | [Insurance News (EXL)](https://www.insurancenews.c |