
DSPy Grug Translator

End-to-end example from the LearnByBuilding.ai tutorial, implementing a "Grug speak" translator using the DSPy framework.

Overview

This project demonstrates how to build a translation system that converts plain English text into "Grug speak" (caveman-style language) using DSPy for prompt optimization and evaluation.

Files

  • dspy_grug_translator.py - Main script with CLI interface
  • utils_dspy_grug.py - Utility functions and classes
  • requirements.txt - Python dependencies
  • README.md - This file

Setup

Dependencies

Install required packages:

pip install dspy-ai requests beautifulsoup4 openai python-dotenv numpy

Environment Variables

Create a .env file with your OpenAI API key:

OPENAI_API_KEY=your_api_key_here
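
The main script calls load_dotenv() at startup, so the key only needs to live in .env (or the shell environment). A minimal sanity check, hypothetical and not part of the gist:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"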

Usage

Basic Usage

python dspy_grug_translator.py
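
For quick manual testing, the script also has an interactive mode that reads lines from stdin and prints Grug translations until you type quit:

python dspy_grug_translator.py --interactive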

CLI Parameters

python dspy_grug_translator.py \
    --provider openai \
    --num-posts 100 \
    --train-ratio 0.8 \
    --num-threads 1 \
    --seed 42

Available Parameters

  • --provider: LLM provider, openai or ollama (default: openai)
  • --ollama-model: Ollama model name when --provider ollama is used (default: llama3.2)
  • --eval-provider: Separate LLM provider for evaluation, openai or ollama (optional)
  • --eval-ollama-model: Ollama model used for evaluation (default: llama3.2)
  • --num-posts: Number of Reddit posts to fetch (default: 100)
  • --train-ratio: Fraction of data used for training (default: 0.8)
  • --num-threads: Number of threads for optimization (default: 1)
  • --seed: Random seed for reproducibility (default: 42)
  • --skip-optimization: Skip the optimization step and use the base program
  • --interactive: Run in interactive mode for manual testing

How It Works

  1. Data Collection: Fetches post titles from r/explainlikeimfive via Reddit's public JSON endpoint
  2. Dataset Creation: Splits the titles into sentences and wraps them as DSPy examples
  3. Data Splitting: Shuffles the examples and splits them into train and test sets
  4. Optimization: Uses BootstrapFewShot to optimize the ChainOfThought translator (a minimal sketch of that module follows this list)
  5. Evaluation: Scores the optimized program with the combined similarity and readability metric
  6. Interactive Mode: With --interactive, translates text typed at the prompt instead of running the full pipeline
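
The translator itself is small: a DSPy signature wrapped in a ChainOfThought module, defined in utils_dspy_grug.py and shown here as a minimal sketch (it assumes an LM has already been configured via dspy.configure):

import dspy

class GrugTranslate(dspy.Signature):
    """Translate regular text into Grug speak (caveman language)"""
    text = dspy.InputField(desc="Text to translate to Grug speak")
    translation = dspy.OutputField(desc="Text translated to Grug speak")

translate_grug = dspy.ChainOfThought(GrugTranslate)
result = translate_grug(text="You should not construct complex systems.")
print(result.translation)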

Metrics

  • Similarity Metric: Uses GPT-4 to assess semantic similarity between translations
  • ARI Metric: Automated Readability Index (ARI) keeps translations at a caveman-appropriate reading level; the metric passes when ARI ≤ 7.01 (see the worked example below)
  • Combined Metric: Both metrics must pass for overall success
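
The readability check uses the standard Automated Readability Index formula implemented in utils_dspy_grug.py, ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43, with newlines counted as sentence breaks because Grug rarely punctuates. A worked example on a made-up Grug sentence:

# "grug no make big thing." has 19 non-whitespace characters, 5 words, 1 sentence terminator
ari = 4.71 * (19 / 5) + 0.5 * (5 / 1) - 21.43
print(round(ari, 2))  # -1.03, comfortably under the 7.01 pass threshold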

Safety Notes

⚠️ Important Considerations:

  • This tool uses OpenAI APIs, which incur costs
  • The data-collection step fetches posts from Reddit (r/explainlikeimfive) over the network
  • Generated content should be reviewed before use
  • Model outputs may vary between runs
  • Keep your API keys secure and never commit them to version control

Architecture

The codebase is organized into:

  • Main Script: CLI interface and orchestration
  • Utils Module: Reusable components (metrics, message building, translation)
  • DSPy Integration: Signature definitions and module implementations

Example Output

Input: "You should not construct complex systems."
Output: "grug no make big complicated thing. big thing bad. make grug brain hurt."

Troubleshooting

  • Ensure your OpenAI API key is valid and has sufficient credits
  • Check internet connectivity for web scraping
  • Verify all dependencies are installed correctly
  • Review error messages for specific issues

License

This is an educational example from the LearnByBuilding.ai tutorial.

#!/usr/bin/env python3
"""
DSPy Grug Translator: end-to-end example from LearnByBuilding.ai tutorial
Complete implementation of translating text to "Grug speak" using the DSPy framework
Updated with CLI interface and modular utilities
"""
import argparse
import logging
import os
import random
import re
import time
from functools import wraps
from random import shuffle
from typing import Optional

import dspy
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

# Import utilities from our utils module
from utils_dspy_grug import (
    translate_grug,
    overall_metric,
    similarity_metric,
    ari_metric,
)

# Load environment variables
load_dotenv()

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


# Retry decorator: re-runs the wrapped function up to max_attempts times with a fixed delay
def retry(max_attempts=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts - 1:
                        raise
                    logger.warning(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay} seconds...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator


# Function to split data for train/test
def split_data(data, train_ratio=0.8, seed=42):
    """
    Split data into training and test sets
    """
    random.seed(seed)
    shuffled_data = data.copy()
    shuffle(shuffled_data)
    split_index = int(len(shuffled_data) * train_ratio)
    train_data = shuffled_data[:split_index]
    test_data = shuffled_data[split_index:]
    return train_data, test_data


@retry(max_attempts=3)
def fetch_reddit_posts(num_posts=50):
    """
    Fetch Reddit posts from r/explainlikeimfive
    """
    logger.info(f"Fetching {num_posts} Reddit posts...")
    url = "https://www.reddit.com/r/explainlikeimfive/hot/.json"
    headers = {'User-Agent': 'DSPy Grug Translator 1.0'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    data = response.json()
    posts = []
    for post_data in data['data']['children'][:num_posts]:
        post = post_data['data']
        title = post['title']
        if title and len(title) > 20 and not title.startswith('['):
            # Split by sentence-ending punctuation or newlines
            sentences = [s.strip() for s in re.split(r'[.!?\n]', title) if s.strip()]
            for sentence in sentences:
                if len(sentence) > 10:
                    posts.append(sentence)
                    if len(posts) >= num_posts:
                        break
        if len(posts) >= num_posts:
            break
    logger.info(f"Successfully fetched {len(posts)} posts")
    return posts


def configure_dspy(provider, model_name=None, eval_provider=None, eval_model_name=None):
    """
    Configure DSPy with the specified provider and model
    """
    logger.info(f"Configuring DSPy with provider: {provider}")
    if provider == "openai":
        lm = dspy.OpenAI(model="gpt-3.5-turbo")
    elif provider == "ollama":
        if not model_name:
            model_name = "llama3.2"
        lm = dspy.Ollama(model=model_name)
    else:
        raise ValueError(f"Unsupported provider: {provider}")

    # Configure evaluation provider if specified
    eval_lm = None
    if eval_provider:
        logger.info(f"Configuring evaluation provider: {eval_provider}")
        if eval_provider == "openai":
            eval_lm = dspy.OpenAI(model="gpt-3.5-turbo")
        elif eval_provider == "ollama":
            if not eval_model_name:
                eval_model_name = "llama3.2"
            eval_lm = dspy.Ollama(model=eval_model_name)
        else:
            logger.warning(f"Unsupported eval provider: {eval_provider}, using main provider")

    dspy.configure(lm=lm)
    return lm, eval_lm


@retry(max_attempts=3)
def run_optimization(train_data, test_data, num_threads=1):
    """
    Run DSPy optimization on the training data
    """
    logger.info("Starting optimization process...")
    # Create optimizer (BootstrapFewShot does not take a num_threads argument;
    # the CLI flag is kept for compatibility but not passed through)
    teleprompter = dspy.BootstrapFewShot(
        metric=overall_metric,
        max_bootstrapped_demos=4,
        max_labeled_demos=16
    )
    # Optimize
    optimized_program = teleprompter.compile(
        translate_grug,
        trainset=train_data
    )
    logger.info("Optimization completed")
    return optimized_program


@retry(max_attempts=3)
def evaluate_program(program, test_data, eval_lm=None):
    """
    Evaluate the program on test data
    """
    logger.info("Starting evaluation...")
    # If we have a separate evaluation LM, configure it temporarily
    original_lm = None
    if eval_lm:
        original_lm = dspy.settings.lm
        dspy.configure(lm=eval_lm)
    try:
        evaluator = dspy.Evaluate(
            devset=test_data,
            metric=overall_metric,
            num_threads=1,
            display_progress=True
        )
        result = evaluator(program)
        logger.info(f"Evaluation completed with score: {result}")
        return result
    finally:
        # Restore original LM if we changed it
        if original_lm:
            dspy.configure(lm=original_lm)


def main():
    parser = argparse.ArgumentParser(description="DSPy Grug Translator with CLI interface")
    # Provider configuration
    parser.add_argument("--provider", choices=["openai", "ollama"], default="openai",
                        help="LLM provider to use (default: openai)")
    parser.add_argument("--ollama-model", default="llama3.2",
                        help="Ollama model name (default: llama3.2)")
    # Evaluation provider configuration
    parser.add_argument("--eval-provider", choices=["openai", "ollama"],
                        help="Separate LLM provider for evaluation (optional)")
    parser.add_argument("--eval-ollama-model", default="llama3.2",
                        help="Ollama model for evaluation (default: llama3.2)")
    # Data and processing options
    parser.add_argument("--num-posts", type=int, default=100,
                        help="Number of Reddit posts to fetch (default: 100)")
    parser.add_argument("--train-ratio", type=float, default=0.8,
                        help="Ratio of data to use for training (default: 0.8)")
    parser.add_argument("--num-threads", type=int, default=1,
                        help="Number of threads for optimization (default: 1)")
    # Random seed
    parser.add_argument("--seed", type=int, default=42,
                        help="Random seed for reproducibility (default: 42)")
    # Skip optimization flag
    parser.add_argument("--skip-optimization", action="store_true",
                        help="Skip the optimization step")
    # Interactive mode
    parser.add_argument("--interactive", action="store_true",
                        help="Run in interactive mode for manual testing")
    args = parser.parse_args()

    # Set random seed
    random.seed(args.seed)

    try:
        # Configure DSPy
        lm, eval_lm = configure_dspy(
            provider=args.provider,
            model_name=args.ollama_model if args.provider == "ollama" else None,
            eval_provider=args.eval_provider,
            eval_model_name=args.eval_ollama_model if args.eval_provider == "ollama" else None
        )

        if args.interactive:
            logger.info("Starting interactive mode...")
            print("\nInteractive Grug Translator Mode")
            print("Type 'quit' to exit\n")
            while True:
                text = input("Enter text to translate to Grug speak: ").strip()
                if text.lower() == 'quit':
                    break
                if text:
                    try:
                        result = translate_grug(text=text)
                        print(f"Grug speak: {result.translation}\n")
                    except Exception as e:
                        logger.error(f"Translation failed: {e}")
            return

        # Fetch data
        reddit_posts = fetch_reddit_posts(args.num_posts)
        if len(reddit_posts) < 10:
            logger.error("Not enough data fetched. Exiting.")
            return

        # Create training examples
        examples = [dspy.Example(text=post).with_inputs('text') for post in reddit_posts]

        # Split data
        train_data, test_data = split_data(examples, args.train_ratio, args.seed)
        logger.info(f"Split data: {len(train_data)} training, {len(test_data)} test examples")

        if not args.skip_optimization:
            # Run optimization
            optimized_program = run_optimization(train_data, test_data, args.num_threads)
        else:
            logger.info("Skipping optimization, using base program")
            optimized_program = translate_grug

        # Evaluate
        if test_data:
            evaluation_score = evaluate_program(optimized_program, test_data, eval_lm)
            logger.info(f"Final evaluation score: {evaluation_score}")

        # Test with a sample
        sample_text = "Why do we need to sleep every night?"
        logger.info(f"Testing with sample: '{sample_text}'")
        result = optimized_program(text=sample_text)
        print(f"\nOriginal: {sample_text}")
        print(f"Grug speak: {result.translation}")
        logger.info("Process completed successfully!")
    except KeyboardInterrupt:
        logger.info("Process interrupted by user")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        raise


if __name__ == "__main__":
    main()
# DSPy Grug Translator Dependencies
# Core DSPy framework for prompt engineering and optimization
dspy-ai>=2.4.0
# OpenAI API client for GPT models
openai>=1.0.0
# Web scraping and HTTP requests
requests>=2.28.0
beautifulsoup4>=4.11.0
# Environment variable management
python-dotenv>=0.19.0
# Scientific computing and data manipulation
numpy>=1.24.0
# Optional: For enhanced functionality
# scipy>=1.9.0 # Scientific computing (uncomment if needed)
# pandas>=1.5.0 # Data analysis (uncomment if needed)
# matplotlib>=3.6.0 # Plotting (uncomment if needed)
# Development dependencies (optional)
# pytest>=7.0.0 # Testing framework
# black>=22.0.0 # Code formatting
# flake8>=5.0.0 # Linting
#!/usr/bin/env python3
"""
Utility functions and classes for DSPy Grug Translator
Factored out components: metrics, BuildMessages, and translate_grug
"""
import re
from functools import cache

import dspy
from openai import OpenAI

# Initialize OpenAI client for utilities
client = OpenAI()


# Helper class for building messages
class BuildMessages:
    def __init__(self, system_prompt, user_prompt):
        self.system_prompt = system_prompt
        self.user_prompt = user_prompt

    def render(self, **kwargs):
        sys = self.system_prompt.format(**kwargs)
        user = self.user_prompt.format(**kwargs)
        return [
            {"role": "system", "content": sys},
            {"role": "user", "content": user},
        ]


# DSPy Signature for Grug Translation
class GrugTranslate(dspy.Signature):
    """Translate regular text into Grug speak (caveman language)"""
    text = dspy.InputField(desc="Text to translate to Grug speak")
    translation = dspy.OutputField(desc="Text translated to Grug speak")


# Main translate_grug DSPy module
translate_grug = dspy.ChainOfThought(GrugTranslate)


# Function to translate Grug text to plain English (for dataset creation)
@cache
def translate_grug_to_english(grug_text, model_name="gpt-3.5-turbo"):
    prompt = BuildMessages(
        "You are an expert in deciphering strange text. The user will provide text written by someone named Grug and you will provide the translation.",
        """Translate the following text into plain english: '{text}'.
        Do not respond with any other text. Only provide that text. Now take a deep breath and begin."""
    )
    result = client.chat.completions.create(
        messages=prompt.render(text=grug_text),
        model=model_name
    )
    return result.choices[0].message.content
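
# Example use (not wired into the CLI flow): turn a scraped Grug sentence into plain
# English when building gold-standard pairs, e.g.
#   plain = translate_grug_to_english("grug no like complexity. complexity bad.")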


# Automated Readability Index calculation
def automated_readability_index(text):
    characters = len(re.sub(r'\s+', '', text))  # Count characters (ignoring whitespace)
    words = len(text.split())  # Count words by splitting the text
    # Count sentences by finding period, exclamation, question mark, or newline
    # (the newline is included because Grug doesn't reliably use punctuation)
    sentences = len(re.findall(r'[.!?\n]', text))
    if words == 0 or sentences == 0:  # Prevent division by zero
        return 0
    # Calculate the Automated Readability Index (ARI)
    ari = (4.71 * (characters / words)) + (0.5 * (words / sentences)) - 21.43
    return round(ari, 2)


# AI Assessment Signature
class AssessBasedOnQuestion(dspy.Signature):
    """Given the assessed text provide a yes or no to the assessment question."""
    assessed_text = dspy.InputField(format=str)
    assessment_question = dspy.InputField(format=str)
    assessment_answer = dspy.OutputField(desc="Yes or No")
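

# NOTE: the metrics below read truth.grug_text and pred.grug_text, so the examples fed to
# the optimizer/evaluator need a gold `grug_text` field (and the predictor's output field
# must be named to match); examples built with only a `text` input will raise AttributeError.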
# Similarity metric using AI feedback
def similarity_metric(truth, pred, trace=None, eval_model="gpt-4-turbo"):
    gpt4T = dspy.OpenAI(model=eval_model, max_tokens=500)
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text
    similarity_question = f"""Does the assessed text have the same meaning as the gold_standard text provided?
Gold Standard: "{truth_grug_text}"
Provide only a yes or no answer."""
    with dspy.context(lm=gpt4T):
        assessor = dspy.Predict(AssessBasedOnQuestion)
        raw_similarity_result = assessor(
            assessed_text=proposed_grug_text,
            assessment_question=similarity_question
        )
    print(raw_similarity_result)  # for debugging
    raw_similarity = raw_similarity_result.assessment_answer.lower().strip()
    same_meaning = raw_similarity == 'yes'
    return same_meaning


# ARI metric
def ari_metric(truth, pred, trace=None):
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text
    gold_ari = automated_readability_index(truth_grug_text)
    pred_ari = automated_readability_index(proposed_grug_text)
    print(f"ARI {gold_ari} => {pred_ari}")
    ari_result = pred_ari <= 7.01
    return ari_result


# Overall combined metric
def overall_metric(provided_example, predicted, trace=None, eval_model="gpt-4-turbo"):
    similarity = similarity_metric(provided_example, predicted, trace, eval_model)
    ari = ari_metric(provided_example, predicted, trace)
    return similarity and ari