@cpfiffer
Created November 15, 2024 17:41
text to sql with outlines

Text to SQL

!!! note
    This example was adapted from one shared by Morgan Giraud on our Discord. You can find their Twitter here. Thank you, Morgan!

Outlines provides experimental support for context-free grammars (CFGs) for text generation. Future versions will provide more comprehensive support for structured outputs.

SQL syntax can be described, to a good approximation, by a context-free grammar: a set of production rules in which each rule expands the same way regardless of where it appears in the query.

Practically speaking, this means that Outlines can constrain language models to generate only syntactically valid SQL queries.

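First, we import Outlines and define our grammar. Outlines expects grammars in Lark's EBNF format. Note that the grammar below is the arithmetic calculator grammar from Lark's examples, bound to the name sql_grammar as a stand-in; to generate SQL, substitute a grammar that describes SQL.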

import outlines

sql_grammar = r"""
    ?start: sum
          | NAME "=" sum    -> assign_var

    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub

    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div

    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | NAME             -> var
         | "(" sum ")"

    %import common.CNAME -> NAME
    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
"""

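To get a feel for which strings this grammar admits, here is a minimal hand-written recognizer for the same rules. It is only a sketch and is not part of Outlines or Lark: the names `tokenize`, `Recognizer`, and `matches_grammar` are hypothetical, and the token patterns only approximate Lark's `common.NUMBER` and `common.CNAME`.

```python
import re

# Approximations of common.NUMBER, common.CNAME, the literal symbols,
# and WS_INLINE (assumption: close to, but not identical to, Lark's terminals).
TOKEN_RE = re.compile(
    r"(?P<NUMBER>(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?)"
    r"|(?P<NAME>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<OP>[-+*/=()])"
    r"|(?P<WS>[ \t]+)"
)


def tokenize(text):
    """Split text into (kind, value) tokens; raise ValueError on a bad character."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.lastgroup != "WS":  # mirrors `%ignore WS_INLINE`
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Recognizer:
    """Recursive-descent recognizer; left-recursive rules are written as loops."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def accept(self, symbol):
        if self.peek() == ("OP", symbol):
            self.pos += 1
            return True
        return False

    def start(self):
        # start: sum | NAME "=" sum
        is_assignment = (
            self.peek()[0] == "NAME"
            and self.pos + 1 < len(self.tokens)
            and self.tokens[self.pos + 1] == ("OP", "=")
        )
        if is_assignment:
            self.pos += 2
        self.sum()
        if self.pos != len(self.tokens):
            raise ValueError("trailing input")

    def sum(self):
        # sum: product | sum "+" product | sum "-" product
        self.product()
        while self.accept("+") or self.accept("-"):
            self.product()

    def product(self):
        # product: atom | product "*" atom | product "/" atom
        self.atom()
        while self.accept("*") or self.accept("/"):
            self.atom()

    def atom(self):
        # atom: NUMBER | "-" atom | NAME | "(" sum ")"
        if self.peek()[0] in ("NUMBER", "NAME"):
            self.pos += 1
        elif self.accept("-"):
            self.atom()
        elif self.accept("("):
            self.sum()
            if not self.accept(")"):
                raise ValueError("expected ')'")
        else:
            raise ValueError("expected a number, name, '-' or '('")


def matches_grammar(text):
    """Return True if text is accepted by the grammar, False otherwise."""
    try:
        Recognizer(tokenize(text)).start()
        return True
    except ValueError:
        return False


print(matches_grammar("2+2"))
print(matches_grammar("x = (1 + 2) * -3"))
print(matches_grammar("SELECT * FROM users"))
```

Arithmetic expressions such as `2+2` are accepted, while a statement like `SELECT * FROM users` is rejected, because no rule allows two names in a row.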
Next, we create a model and a generator function.

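# Load the model through the transformers backend.
# device="cpu" runs anywhere; switch to "cuda" if a GPU is available.
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cpu"
)

# Build a generator whose output is constrained to the grammar.
sql_generator = outlines.generate.cfg(model, sql_grammar)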

sql_generator is a callable that accepts a prompt string and returns a string that conforms to the grammar.
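In a text-to-SQL setting, the prompt string usually carries both the table schema and the user's question. The helper below is a hypothetical sketch of one way to assemble such a prompt; `build_prompt`, the prompt wording, and the example schema are all assumptions and not part of Outlines.

```python
def build_prompt(schema: str, question: str) -> str:
    """Combine a table schema and a natural-language question into one prompt."""
    return (
        "You are given the following database schema:\n"
        f"{schema.strip()}\n\n"
        f"Write a SQL query that answers this question: {question.strip()}\n"
        "SQL:"
    )


# Hypothetical schema and question, for illustration only.
example_schema = "CREATE TABLE users (id INTEGER, name TEXT, age INTEGER);"
prompt = build_prompt(example_schema, "How many users are older than 30?")
print(prompt)
```

The resulting string could then be passed to the generator in place of a hand-written prompt.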

Let's try it out:

sequence = sql_generator(
    "Write valid code to print 2+2."
)
# Inspect the generated string.
print(sequence)
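Once the generator is paired with a SQL grammar, it can be useful to check its output against a real SQL engine. The sketch below uses Python's built-in `sqlite3` module and assumes the SQLite dialect; the helper name `check_sql` and the example schema are hypothetical. Note that this check is stricter than grammar conformance, since it also fails on references to tables or columns that do not exist.

```python
import sqlite3


def check_sql(query, schema):
    """Return (True, None) if SQLite can compile `query` against `schema`,
    otherwise (False, error message)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Create the tables in a throwaway in-memory database.
        conn.executescript(schema)
        # EXPLAIN compiles the statement without executing it.
        conn.execute("EXPLAIN " + query)
        return True, None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


# Hypothetical schema and queries, for illustration only.
schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"
print(check_sql("SELECT name FROM users WHERE age > 30", schema))
print(check_sql("SELECT name FROM WHERE age > 30", schema))
print(check_sql("SELECT name FROM orders", schema))
```

The first query compiles, the second fails on a syntax error, and the third is syntactically fine but fails because the `orders` table is not in the schema.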