Cameron Pfiffer cpfiffer

@cpfiffer
cpfiffer / text_to_sql.py
Created February 7, 2025 20:03
Ultra-simple text-to-SQL with Outlines, using a small subset of the SQL grammar.
import outlines
import os
from pydantic import BaseModel, Field
from transformers import AutoTokenizer
model_str = 'Qwen/Qwen2.5-7B-Instruct-1M'
model = outlines.models.transformers(
    model_str,
    device='cuda'
)
@cpfiffer
cpfiffer / r1-structured.py
Last active February 8, 2025 02:54
Using Outlines to get structured output from R1
import time
from typing import Literal
import outlines
import re
import torch
from transformers import AutoTokenizer
from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema
from pydantic import BaseModel
UnderstandingResponse(
    understanding=Understanding(
        chain_of_thought=[
            "The user profile belongs to akhilrao.bsky.social, with a display name of 'grand theft eigenvalue 🔆'. Their description indicates that they are an economist and that it's a personal account, including their pronouns as 'he/him'.",
            "Their first post is about the Mobius YIMBY arc, a topic related to economics and possibly linked to geographical development or real estate markets, based on the acronym 'YIMBY' (Yes In My Back Yard), usually associated with urban development. The term 'Mobius', referencing a strip with no boundaries, might metaphorically suggest infinite or unexplored opportunities or issues in the economic landscape.",
            "The second post comments on the affordability and proliferation of LEO (Low Earth Orbit) satellites, critiquing the notion that because they are inexpensive, it's acceptable to deploy them en masse without concern for their potential environmental and spatial clut
@cpfiffer
cpfiffer / thinking-cap.py
Created January 22, 2025 18:11
Limit the number of characters DeepSeek R1 can use for thinking.
import outlines
from transformers import AutoTokenizer
model_string = 'deepseek-ai/DeepSeek-R1-Distill-Qwen-7B'
# model_string = 'deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B' # For small machines
model = outlines.models.transformers(
    model_string,
    device='cuda',  # also 'cpu', 'mps', 'auto'
)
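The preview above stops before the part that enforces the limit. One way a character cap on the thinking block could be expressed is a regular expression with bounded repetition, which a regex-constrained generator could then follow. The sketch below uses only the standard library; `capped_thinking_pattern` is a hypothetical helper named for this example, and the gist's actual pattern may differ.

```python
import re


def capped_thinking_pattern(max_chars: int) -> str:
    """Build a regex that allows at most `max_chars` characters of thinking.

    Assumption for this sketch: the thinking text never contains '<',
    which keeps the closing </think> tag unambiguous.
    """
    return rf"<think>[^<]{{0,{max_chars}}}</think>"


pattern = capped_thinking_pattern(20)

# A short thinking block fits within the 20-character budget.
fits = re.match(pattern, "<think>2 + 2 is 4</think>The answer is 4.")

# A 40-character thinking block exceeds the budget, so nothing matches.
too_long = re.match(pattern, "<think>" + "hmm " * 10 + "</think>The answer is 4.")

print(fits is not None)
print(too_long is not None)
```

Under constrained generation, a bounded quantifier like this means the model has to emit `</think>` once the budget is used up, rather than being cut off mid-thought after the fact.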
@cpfiffer
cpfiffer / highlights.py
Created January 15, 2025 21:17
Extract highlights from a piece of content.
import outlines
from pydantic import BaseModel
from transformers import AutoTokenizer
from rich import print
model_string = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
model = outlines.models.transformers(model_string)
tokenizer = AutoTokenizer.from_pretrained(model_string)
class Highlights(BaseModel):
@cpfiffer
cpfiffer / stop-sign.py
Created December 4, 2024 21:59
Reading road signs
"""
pip install outlines torch==2.4.0 transformers accelerate pillow rich
sudo apt-get install poppler-utils
"""
from enum import Enum
from PIL import Image
import outlines
import torch
@cpfiffer
cpfiffer / text-to-sql.md
Created November 15, 2024 17:41
text to sql with outlines

Text to SQL

!!! note
    This example was adapted from Morgan Giraud on our Discord. You can find their twitter here. Thank you Morgan!

Outlines provides experimental support for context-free grammars (CFGs) for text generation. Future versions will provide more comprehensive support for structured outputs.

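SQL's syntax can be described by a context-free grammar: each rule for building a query applies regardless of what surrounds it, so the structure of a valid query can be specified independently of the particular table names, columns, and values it contains.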
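To make the idea concrete, here is a minimal sketch of a recursive-descent recognizer for a toy SQL subset, written with the standard library only. The grammar, the `Parser` class, and `is_valid` are hypothetical names invented for this illustration; this is not Outlines code and not the grammar used in this tutorial. The nested parentheses in `WHERE` are the part that needs a context-free grammar rather than a plain regular expression.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}


def tokenize(text: str) -> list[str]:
    # Each word or number is one token; any other non-space character is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


class Parser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Upper-case the token so keywords are case-insensitive.
        return self.tokens[self.pos].upper() if self.pos < len(self.tokens) else None

    def expect(self, literal: str) -> None:
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r} at token {self.pos}")
        self.pos += 1

    def name(self) -> None:
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not re.fullmatch(r"[A-Za-z_]\w*", tok):
            raise SyntaxError(f"expected a name at token {self.pos}")
        self.pos += 1

    def value(self) -> None:
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
        else:
            self.name()

    def query(self) -> None:
        # query: "SELECT" columns "FROM" NAME ["WHERE" condition]
        self.expect("SELECT")
        self.columns()
        self.expect("FROM")
        self.name()
        if self.peek() == "WHERE":
            self.pos += 1
            self.condition()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token at position {self.pos}")

    def columns(self) -> None:
        # columns: "*" | NAME ("," NAME)*
        if self.peek() == "*":
            self.pos += 1
            return
        self.name()
        while self.peek() == ",":
            self.pos += 1
            self.name()

    def condition(self) -> None:
        # condition: term (("AND" | "OR") term)*
        self.term()
        while self.peek() in ("AND", "OR"):
            self.pos += 1
            self.term()

    def term(self) -> None:
        # term: "(" condition ")" | NAME "=" value
        if self.peek() == "(":
            self.pos += 1
            self.condition()  # recursion handles arbitrarily deep nesting
            self.expect(")")
        else:
            self.name()
            self.expect("=")
            self.value()


def is_valid(sql: str) -> bool:
    try:
        Parser(tokenize(sql)).query()
        return True
    except SyntaxError:
        return False


print(is_valid("SELECT name, age FROM users WHERE (age = 30 OR (age = 40 AND name = bob))"))  # True
print(is_valid("SELECT FROM users"))  # False
```

Grammar-constrained generation uses the same kind of rules in the other direction: instead of checking a finished string, it restricts each next token to those the grammar allows at that point.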

@cpfiffer
cpfiffer / pdf-to-structure.py
Last active January 22, 2025 18:14
Get structured output from PDFs. Goes through a PDF one page at a time -- it is not currently built for multiple pages, but could be extended as needed.
"""
pip install outlines torch==2.4.0 transformers accelerate typing-extensions pillow pdf2image rich requests
may need to install tkinter: https://stackoverflow.com/questions/25905540/importerror-no-module-named-tkinter
sudo apt-get install poppler-utils
"""
from enum import Enum
from io import BytesIO
@cpfiffer
cpfiffer / text-to-sql.py
Created October 17, 2024 18:06
Outlines code to generate text-to-sql from a context-free grammar
import outlines
sql_grammar = """
start: set_expr -> final
set_expr: query_expr
        | set_expr "UNION"i ["DISTINCT"i] set_expr -> union_distinct
        | set_expr "UNION"i "ALL"i set_expr -> union_all
        | set_expr "INTERSECT"i ["DISTINCT"i] set_expr -> intersect_distinct
        | set_expr "EXCEPT"i ["DISTINCT"i] set_expr -> except_distinct
@cpfiffer
cpfiffer / coinflip.py
Created September 10, 2024 21:11
Coin flipping with outlines
import outlines
from transformers import BitsAndBytesConfig
# Load the model
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    model_kwargs={
        'quantization_config': BitsAndBytesConfig(
            # Load the model in 4-bit mode
            load_in_4bit=True,