Cameron Pfiffer (cpfiffer)

cpfiffer / cleanlab-wrong.py
Last active April 17, 2025 23:58
An example of using Cleanlab to detect a slightly incorrect data extraction value.
"""
NOTE: This example is designed to produce mildly incorrect output --
see the definition of `IncomeData`, whose `__init__` hard-codes `earnings_per_share` to 2.33.
To make the example behave normally, simply remove the `__init__` method or the
`self.earnings_per_share = 2.33` line.
"""
import warnings
warnings.filterwarnings('ignore')
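A hedged sketch of how the rest of this gist plausibly fits together (the `IncomeData` fields, the report text, and the Cleanlab client calls below are my assumptions, not copied from the gist): extract structured income data, then ask Cleanlab's Trustworthy Language Model how well the extracted value is supported by the source text.

from pydantic import BaseModel
from cleanlab_studio import Studio  # assumed entry point; the gist may use a different Cleanlab client

class IncomeData(BaseModel):
    revenue: float
    earnings_per_share: float

    def __init__(self, **data):
        super().__init__(**data)
        # Deliberately wrong, per the note above: pins a value the source text does not support.
        self.earnings_per_share = 2.33

report = "Q3 revenue was $4.2B and earnings per share came in at $2.21."
extracted = IncomeData(revenue=4.2e9, earnings_per_share=2.21)  # __init__ silently overwrites EPS

studio = Studio("<your-api-key>")
tlm = studio.TLM()
score = tlm.get_trustworthiness_score(
    f"From this report, what is earnings per share?\n\n{report}",
    str(extracted.earnings_per_share),
)
print(score)  # a low trustworthiness score flags the fudged 2.33
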
cpfiffer / cleanlab-legal.py
Created April 17, 2025 23:49
Using Cleanlab to check data extraction fidelity.
emotions:
  - connection_to_content:
      relationship: CAUSES
      strength: 0.5
      emotionType: enthusiasm
      text: There is a sense of excitement and energy surrounding the launch of Comind,
        an open-source project aimed at developing a cognitive layer for ATProto. Mr.
        Dr. Cameron Pfiffer expresses a passionate invitation for contributors, showcasing
        enthusiasm for the project's potential in building a network of AI agents.
  - connection_to_content:
cpfiffer / recursion.py
Last active March 25, 2025 20:38
Recursive knowledge concept map using Outlines
# Setup instructions:
# pip install 'outlines[transformers]'
import outlines
from transformers import AutoTokenizer
import json
# MODEL_STRING = "HuggingFaceTB/SmolLM2-135M-Instruct" # Small model
# MODEL_STRING = "HuggingFaceTB/SmolLM2-1.7B-Instruct" # Larger but kind of boring
MODEL_STRING = "NousResearch/Hermes-3-Llama-3.1-8B"
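A hedged sketch of one way this gist could continue (the `Concept`/`SubConcept` schema and the prompt are my assumptions; the gist's actual recursive schema may differ, and this sketch flattens the recursion to two levels to keep the JSON schema simple):

from pydantic import BaseModel

class SubConcept(BaseModel):
    name: str
    description: str

class Concept(BaseModel):
    name: str
    description: str
    subconcepts: list[SubConcept]

model = outlines.models.transformers(MODEL_STRING, device="cuda")
generator = outlines.generate.json(model, Concept)

tokenizer = AutoTokenizer.from_pretrained(MODEL_STRING)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Break the concept of entropy into sub-concepts."}],
    tokenize=False,
    add_generation_prompt=True,
)
concept_map = generator(prompt)
print(json.dumps(concept_map.model_dump(), indent=2))
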
cpfiffer / text_to_sql.py
Created February 7, 2025 20:03
Ultra-simple text-to-SQL with Outlines, using a small subset of the SQL grammar.
import outlines
import os
from pydantic import BaseModel, Field
from transformers import AutoTokenizer
model_str = 'Qwen/Qwen2.5-7B-Instruct-1M'
model = outlines.models.transformers(
    model_str,
    device='cuda',
)
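A hedged continuation sketch (the regex, table, and prompt are my assumptions; given the `BaseModel`/`Field` imports, the gist itself may constrain a Pydantic model instead): restrict generation to a tiny SELECT-only slice of SQL with a regular expression.

sql_pattern = (
    r"SELECT (\*|[a-z_]+(, [a-z_]+)*) "
    r"FROM [a-z_]+"
    r"( WHERE [a-z_]+ (=|>|<) ('[^']*'|[0-9]+))?;"
)
generator = outlines.generate.regex(model, sql_pattern)

tokenizer = AutoTokenizer.from_pretrained(model_str)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Table users(id, name, age). Find the names of users older than 30."}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generator(prompt))  # e.g. SELECT name FROM users WHERE age > 30;
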
cpfiffer / r1-structured.py
Last active April 18, 2025 10:36
Using Outlines to get structured output from R1
import time
from typing import Literal
import outlines
import re
import torch
from transformers import AutoTokenizer
from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema
from pydantic import BaseModel
UnderstandingResponse(
    understanding=Understanding(
        chain_of_thought=[
            "The user profile belongs to akhilrao.bsky.social, with a display name of 'grand theft eigenvalue 🔆'. Their description indicates that they are an economist and that it's a personal account, including their pronouns as 'he/him'.",
            "Their first post is about the Mobius YIMBY arc, a topic related to economics and possibly linked to geographical development or real estate markets, based on the acronym 'YIMBY' (Yes In My Back Yard), usually associated with urban development. The term 'Mobius', referencing a strip with no boundaries, might metaphorically suggest infinite or unexplored opportunities or issues in the economic landscape.",
            "The second post comments on the affordability and proliferation of LEO (Low Earth Orbit) satellites, critiquing the notion that because they are inexpensive, it's acceptable to deploy them en masse without concern for their potential environmental and spatial clut
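A hedged sketch of the core trick implied by the imports and the output above (the regex bounds and field names are partly my assumptions): build a JSON-matching regex from the Pydantic schema, then prepend a bounded <think>...</think> block so R1 reasons before it emits the structured object.

class Understanding(BaseModel):
    chain_of_thought: list[str]

class UnderstandingResponse(BaseModel):
    understanding: Understanding

schema_str = convert_json_schema_to_str(UnderstandingResponse)
json_regex = build_regex_from_schema(schema_str)
r1_regex = r"<think>[^<]{200,2000}</think>\n" + json_regex  # force reasoning first, then JSON

model = outlines.models.transformers("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", device="cuda")
generator = outlines.generate.regex(model, r1_regex)

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize this Bluesky profile and its first two posts."}],
    tokenize=False,
    add_generation_prompt=True,
)
result = generator(prompt)
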
cpfiffer / thinking-cap.py
Created January 22, 2025 18:11
Limit the number of characters DeepSeek R1 can use for thinking.
import outlines
from transformers import AutoTokenizer
model_string = 'deepseek-ai/DeepSeek-R1-Distill-Qwen-7B'
# model_string = 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B' # For small machines
model = outlines.models.transformers(
    model_string,
    device='cuda',  # also 'cpu', 'mps', 'auto'
)
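A hedged sketch of the capping step (the regex and limits are my assumptions, not the gist's exact values): bound the <think> section with a character-range quantifier, then allow a short visible answer.

thinking_regex = r"<think>.{0,200}</think>"  # at most ~200 characters of reasoning
answer_regex = r"[^<]{1,500}"                # then a short final answer
generator = outlines.generate.regex(model, thinking_regex + answer_regex)

tokenizer = AutoTokenizer.from_pretrained(model_string)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 17 * 24?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generator(prompt, max_tokens=400))
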
cpfiffer / highlights.py
Created January 15, 2025 21:17
Extract highlights from a piece of content.
import outlines
from pydantic import BaseModel
from transformers import AutoTokenizer
from rich import print
model_string = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
model = outlines.models.transformers(model_string)
tokenizer = AutoTokenizer.from_pretrained(model_string)
class Highlights(BaseModel):
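    # Hedged completion of the truncated preview -- the field below is my guess, not the gist's exact schema.
    highlights: list[str]

generator = outlines.generate.json(model, Highlights)
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "Extract the most quotable sentences from the user's text."},
        {"role": "user", "content": "Structured generation makes model output reliable. It also composes well with validation."},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(generator(prompt).highlights)
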
cpfiffer / stop-sign.py
Created December 4, 2024 21:59
Reading road signs
"""
pip install outlines torch==2.4.0 transformers accelerate pillow rich
sudo apt-get install poppler-utils
"""
from enum import Enum
from PIL import Image
import outlines
import torch
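A hedged sketch of where this preview is likely headed (the checkpoint, prompt format, and label set are my assumptions; the gist may use a different vision model entirely): load a vision-language model through Outlines' 0.x transformers_vision helper and constrain the answer to a fixed set of sign types.

from transformers import LlavaForConditionalGeneration

class RoadSign(str, Enum):
    stop = "stop"
    yield_sign = "yield"
    speed_limit = "speed limit"
    no_entry = "no entry"

model = outlines.models.transformers_vision(
    "llava-hf/llava-1.5-7b-hf",
    model_class=LlavaForConditionalGeneration,
    device="cuda",
)
generator = outlines.generate.choice(model, [sign.value for sign in RoadSign])

image = Image.open("road_sign.png")  # hypothetical local file
prompt = "USER: <image>\nWhat road sign is shown? ASSISTANT:"
print(generator(prompt, [image]))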