Generated via GitHub Copilot + Claude 4.5 Sonnet on 1 Nov 2025, against https://github.com/defog-ai/defog/, with the prompt: "Summarize the defog approach to text-to-SQL clearly with examples and links to source."
Defog provides a comprehensive toolkit for converting natural language questions into SQL queries and executing them against databases. Here's how their approach works:
Defog uses a unified interface across multiple LLM providers (OpenAI, Anthropic, Gemini, Together AI, xAI) to generate SQL queries. The system can work with any supported provider.
Source: defog/llm/sql_generator.py
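As a minimal sketch (using the `sql_answer_tool` call shown in full further below), switching providers only means changing the `provider` and `model` arguments; `db_creds` is a placeholder here:

```python
# Sketch only: the same question routed to a different provider by
# changing `provider` and `model`. sql_answer_tool is shown in full
# further below; db_creds is a placeholder.
from defog.llm.sql import sql_answer_tool
from defog.llm.llm_providers import LLMProvider

result = await sql_answer_tool(
    question="What are the top 10 customers by total sales?",
    db_type="postgres",
    db_creds=db_creds,
    model="gpt-4o",               # e.g. an OpenAI model...
    provider=LLMProvider.OPENAI,  # ...instead of LLMProvider.ANTHROPIC
)
```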
The system builds intelligent prompts by:
- Extracting database metadata (tables, columns, data types, descriptions)
- Formatting schema as DDL with inline comments for context
- Including business glossaries and domain-specific terminology
- Maintaining conversation history for follow-up questions
Key function: build_sql_generation_prompt
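As an illustration (not taken from the library), the schema portion of such a prompt might be formatted as DDL with inline comments along these lines:

```python
# Hypothetical example of schema-as-DDL with inline comments, the kind of
# context build_sql_generation_prompt assembles; the actual formatting
# lives in defog/llm/sql_generator.py.
schema_context = """
CREATE TABLE customers (
    customer_id BIGINT,   -- unique customer identifier
    name TEXT,            -- customer display name
    created_at TIMESTAMP  -- when the customer signed up
);
CREATE TABLE orders (
    order_id BIGINT,      -- unique order identifier
    customer_id BIGINT,   -- references customers.customer_id
    amount NUMERIC,       -- order total
    ordered_at TIMESTAMP  -- when the order was placed
);
"""
```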
For databases with many tables (>5) or columns (>1000), Defog automatically identifies the most relevant tables before generating SQL:
```python
# Automatic relevance filtering
relevance_result = await identify_relevant_tables_tool(
    question="Show me customer orders from last month",
    db_type="postgres",
    db_creds=db_creds,
    model="gpt-4o",
    provider=LLMProvider.OPENAI,
    max_tables=10,
)
```

Source: defog/llm/sql.py
The primary entry point is sql_answer_tool, which takes a natural-language question plus database credentials and returns both the generated SQL and its results:

```python
from defog.llm.sql import sql_answer_tool
from defog.llm.llm_providers import LLMProvider

# Ask questions in natural language
result = await sql_answer_tool(
    question="What are the top 10 customers by total sales?",
    db_type="postgres",
    db_creds={
        "host": "localhost",
        "database": "mydb",
        "user": "postgres",
        "password": "password",
        "port": 5432,
    },
    model="claude-sonnet-4-20250514",
    provider=LLMProvider.ANTHROPIC,
)

print(f"SQL: {result['query']}")
print(f"Results: {result['results']}")
```

Source: README.md
Maintains conversation history for follow-up questions:
```python
# With conversation history
result = await sql_answer_tool(
    question="Show me the trend for the same customers",
    db_type="postgres",
    db_creds=db_creds,
    model="claude-sonnet-4-20250514",
    provider=LLMProvider.ANTHROPIC,
    previous_context=[
        {
            "question": "What are the top 10 customers?",
            "sql_query": "SELECT customer_id, SUM(amount)...",
            "results": "[(1, 50000), (2, 45000), ...]",
        }
    ],
)
```

Include domain-specific terminology and business rules:
```python
result = await sql_answer_tool(
    question="Show me CLV for active customers",
    db_type="postgres",
    db_creds=db_creds,
    model="claude-sonnet-4-20250514",
    provider=LLMProvider.ANTHROPIC,
    glossary="""
    Total Sales: Sum of all order amounts for a customer
    Active Customer: Customer with purchase in last 90 days
    CLV: Customer Lifetime Value - total spend since first purchase
    """,
)
```

Documentation: Database Operations Guide
The end-to-end pipeline proceeds through the following steps (a rough sketch follows below):
- Metadata Extraction - Retrieve database schema information
- Table Filtering (if needed) - Identify relevant tables for large schemas
- Prompt Building - Construct context-aware prompts with schema, glossary, and conversation history
- SQL Generation - Use LLM to convert natural language to SQL
- Query Execution - Run generated SQL against the database
- Result Return - Provide results with column names and data
Implementation: generate_sql_query_local
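A rough sketch of how these steps fit together; every helper name below is a hypothetical stand-in for the logic in generate_sql_query_local, not the library's actual API:

```python
# Hypothetical sketch of the pipeline behind generate_sql_query_local;
# the helper functions here are illustrative stand-ins, not real defog APIs.
async def answer_question(question, db_type, db_creds, model, provider, glossary=None):
    metadata = await extract_metadata(db_type, db_creds)            # 1. metadata extraction
    tables = await filter_relevant_tables(metadata, question)       # 2. table filtering (large schemas only)
    prompt = build_prompt(tables, question, glossary=glossary)      # 3. prompt building
    sql = await generate_sql(prompt, model, provider)               # 4. SQL generation
    columns, rows = await execute_query(sql, db_type, db_creds)     # 5. query execution
    return {"query": sql, "columns": columns, "results": rows}      # 6. result return
```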
Defog works with 11+ database types, including PostgreSQL, MySQL, BigQuery, Snowflake, Databricks, SQL Server, Redshift, SQLite, and DuckDB.
Full list: Database Operations - Supported Databases
A complete example implementation is available at examples/sql_agent_example.py, showing SQLite setup with sample e-commerce data and natural-language querying.
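A minimal sketch in that spirit (the SQLite credential shape `{"database": ...}` is an assumption here, not confirmed from the example file):

```python
# Sketch assuming a SQLite db_creds of {"database": <path>}; the sample
# table and question are illustrative, not copied from the example file.
import asyncio
import sqlite3

from defog.llm.sql import sql_answer_tool
from defog.llm.llm_providers import LLMProvider

# Create a tiny e-commerce table to query
conn = sqlite3.connect("shop.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (customer_id INT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 75.5), (1, 20.0)])
conn.commit()
conn.close()

async def main():
    result = await sql_answer_tool(
        question="Which customer has spent the most?",
        db_type="sqlite",
        db_creds={"database": "shop.db"},  # assumed credential shape for SQLite
        model="claude-sonnet-4-20250514",
        provider=LLMProvider.ANTHROPIC,
    )
    print(f"SQL: {result['query']}")
    print(f"Results: {result['results']}")

asyncio.run(main())
```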