Skip to content

Instantly share code, notes, and snippets.

@inchoate
Last active March 30, 2025 04:02
Show Gist options
  • Save inchoate/9d5434aae1ff8811045c78711ce775a6 to your computer and use it in GitHub Desktop.
Save inchoate/9d5434aae1ff8811045c78711ce775a6 to your computer and use it in GitHub Desktop.
Recipe for converting Pydantic models to SQLModel while still supporting the ability to fully emit nested schema and model data.

Emitting Fully Nested Python Schemas and Models with SQLModel

I ran into a problem while developing some cool tools. I had a nested Pydantic structure that I needed to migrate to SQLModel. But, I needed to retain the ability to emit fully nested schema and models -- which was lost. So, I had to write a few helper functions and models.

In the realm of modern web development and data-driven applications, managing complex data structures and their relationships can be a daunting task. SQLModel, a powerful library that bridges the gap between SQLAlchemy and Pydantic, offers an elegant solution for defining and interacting with SQL databases using Python. In this small write up, we dive into a practical demonstration of leveraging SQLModel to emit fully nested Python schemas and models. Surprisingly, this doesn't work out of the box.

Our focus is on a versatile script that illustrates how to dynamically generate and manipulate nested data structures. By defining SQLModel-based types, we can seamlessly map these models to corresponding database tables while retaining the full richness of Python's type system. This script demonstrates how to:

  1. Create Instances: Dynamically instantiate models from dictionaries, including handling nested relationships and collections.
  2. Dump Instances: Serialize complex model instances into nested dictionaries, preserving all relationships and preventing infinite recursion.
  3. Generate JSON Schemas: Automatically produce JSON schemas for SQLModel classes, capturing both the data structure and constraints.

Whether you're building RESTful APIs, data validation layers, or simply seeking a robust way to manage relational data in Python, this will guide you through creating and leveraging fully nested schemas and models using SQLModel. By the end of this tutorial, you should have a better understanding of how to use SQLModel to handle complex data relationships.

Sadly, this is longer than I expected it to be, but as short as I could make it while maintaining correctness and examples.

Key Concepts and Improvements in SQLModel and Pydantic Integration

When working with SQLModel, it’s crucial to ensure that nested structures are preserved in both schema generation and model instances. This involves several key practices and improvements over the initial migration from Pydantic models:

  1. Consistent Use of Primary Keys: SQLModel requires explicit primary key definitions to manage data consistency and operations. Unlike Pydantic, where primary keys might not be needed, SQLModel ensures robust database interactions.

  2. Implementation of Read Models: For each primary model, a corresponding read model facilitates data serialization and deserialization. Read models use Pydantic's validation and ORM mode to ensure compatibility and flexible data handling.

Let's discuss something important. Compare the following to class definitions:

class TeamBase(SQLModel):
    ...

as compared to this

class Team(TeamBase, table=True):
    ...

The first is effectively a Pydantic model, something used for data modeling and schema validation. So, your data fields go here. You don't need any database IDs, primary keys, or relationships. Just the data.

The second, with table=True is going to be saved to disk. It's a class that now knows about SQLAlchemy. So, you add your DB IDs, primary keys, relationships, etc.

  1. Model Configuration: Setting model_config with from_attributes=True ensures proper attribute mapping and validation. This enables you to fully hydrate the model when it's read from disk. (This used to be called orm_mode.)

  2. Field Descriptions for GenAI Integration: If utilizing generative AI or similar technologies, providing field descriptions in read models enhances model clarity and usability.

Code Implementation

Main Script

Below is the code implementation that demonstrates the process of creating, dumping, and generating JSON schemas for SQLModel-based types, with proper handling of nested structures:

from typing import Any, Dict, List, Optional, Set, Type

from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.inspection import inspect
from sqlalchemy.sql.type_api import TypeEngine
from sqlmodel import JSON, Boolean, Field, Float, Integer, Relationship, SQLModel, String


def create_instance(model: Type[SQLModel], data: Dict[str, Any]) -> SQLModel:
    """
    Create an instance of a SQLModel using the provided data dictionary.

    This function dynamically maps nested dictionaries to corresponding ORM models.

    Args:
        model (Type[SQLModel]): The SQLModel class to instantiate.
        data (Dict[str, Any]): A dictionary containing the data to populate the model.

    Returns:
        SQLModel: An instance of the model populated with the provided data.

    Example:
        hero = create_instance(Hero, {"name": "Jason", "secret_name": "Tree"})
    """
    mapper = inspect(model)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}

    instance_data = {}
    for key, value in (data or {}).items():
        if key in columns:
            instance_data[key] = value
        elif key in relationships:
            related_model = relationships[key].mapper.class_
            if isinstance(value, list):
                instance_data[key] = [
                    create_instance(related_model, v) for v in value if v is not None
                ]
            elif value is not None:
                instance_data[key] = create_instance(related_model, value)
            else:
                instance_data[key] = [] if relationships[key].uselist else None

    return model(**instance_data)


def dump_instance(instance: SQLModel, processed_ids: Optional[Set[int]] = None) -> Dict[str, Any]:
    """
    Recursively serialize a SQLModel instance into a dictionary, including nested relationships.

    Args:
        instance (SQLModel): The SQLModel instance to serialize.
        processed_ids (Optional[Set[int]]): Set of processed instance IDs to avoid infinite recursion.

    Returns:
        Dict[str, Any]: A dictionary representation of the instance.

    Example:
        data_dict = dump_instance(hero_instance)
    """
    if processed_ids is None:
        processed_ids = set()

    instance_id = id(instance)
    if instance_id in processed_ids:
        return {}  # Prevent infinite recursion
    processed_ids.add(instance_id)

    mapper = inspect(instance.__class__)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}

    data = {}
    for key in columns:
        data[key] = getattr(instance, key)

    for key, relationship in relationships.items():
        related_data = getattr(instance, key)
        if relationship.uselist:
            data[key] = [
                dump_instance(rel_instance, processed_ids) for rel_instance in related_data
            ]
        else:
            data[key] = dump_instance(related_data, processed_ids) if related_data else None

    return data


def generate_json_schema(
    model: Type[SQLModel], processed_models: Optional[Set[str]] = None
) -> Dict[str, Any]:
    """
    Generate a JSON schema for a SQLModel class, including nested relationships.

    Args:
        model (Type[SQLModel]): The SQLModel class to generate the schema for.
        processed_models (Optional[Set[str]]): Set of already processed models to prevent recursion.

    Returns:
        Dict[str, Any]: The JSON schema as a dictionary.

    Example:
        schema = generate_json_schema(Hero)
    """
    if processed_models is None:
        processed_models = set()

    mapper = inspect(model)
    columns = {column.key: column for column in mapper.columns}
    relationships = {rel.key: rel for rel in mapper.relationships}

    schema = {"title": model.__name__, "type": "object", "properties": {}, "required": []}

    processed_models.add(model.__name__)

    for key, column in columns.items():
        if key == "id":
            continue

        field_info = model.__fields__.get(key)
        json_type = _get_json_type(column.type)
        property_schema = {"type": json_type}

        if field_info and field_info.description:
            property_schema["description"] = field_info.description

        schema["properties"][key] = property_schema
        if not column.nullable:
            schema["required"].append(key)

    for key, relationship in relationships.items():
        related_model = relationship.mapper.class_
        if related_model.__name__ not in processed_models:
            if relationship.uselist:
                schema["properties"][key] = {
                    "type": "array",
                    "items": generate_json_schema(related_model, processed_models),
                }
            else:
                schema["properties"][key] = generate_json_schema(related_model, processed_models)

    return schema


def _get_json_type(sqlalchemy_type: TypeEngine) -> str:
    """
    Map SQLAlchemy types to JSON schema types.

    Args:
        sqlalchemy_type (TypeEngine): The SQLAlchemy type.

    Returns:
        str: The corresponding JSON schema type.
    """
    if isinstance(sqlalchemy_type, Integer):
        return "integer"
    elif isinstance(sqlalchemy_type, Float):
        return "number"
    elif isinstance(sqlalchemy_type, String):
        return "string"
    elif isinstance(sqlalchemy_type, Boolean):
        return "boolean"
    elif isinstance(sqlalchemy_type, JSON):
        return "object"
    elif isinstance(sqlalchemy_type, ARRAY):
        return "array"
    else:
        return "string"


# Base and model classes


class TeamBase(SQLModel):
    """
    Base class for Team model, containing common fields.

    Attributes:
        name (str): The name of the team.
        headquarters (Optional[str]): The headquarters location of the team.
    """

    name: str = Field(index=True)
    headquarters: Optional[str] = Field(default=None)

    model_config = {"from_attributes": True}


class HeroBase(SQLModel):
    """
    Base class for Hero model, containing common fields.

    Attributes:
        name (str): The name of the hero.
        secret_name (Optional[str]): The secret name of the hero.
        age (Optional[int]): The age of the hero.
        team_id (Optional[int]): The ID of the team the hero belongs to.
    """

    name: str = Field(index=True)
    secret_name: Optional[str] = Field(default=None)
    age: Optional[int] = Field(default=None, index=True)
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")

    model_config = {"from_attributes": True}


class Team(TeamBase, table=True):
    """
    Team model representing a team in the database.

    Attributes:
        id (Optional[int]): The primary key of the team.
        heroes (List["Hero"]): List of heroes belonging to the team.
    """

    id: Optional[int] = Field(default=None, primary_key=True)
    heroes: List["Hero"] = Relationship(
        sa_relationship_kwargs={"lazy": "selectin"},
        back_populates="team",
    )


class Hero(HeroBase, table=True):
    """
    Hero model representing a hero in the database.

    Attributes:
        id (Optional[int]): The primary key of the hero.
        team (Optional["Team"]): The team the hero belongs to.
    """

    id: Optional[int] = Field(default=None, primary_key=True)
    team: Optional["Team"] = Relationship(back_populates="heroes")


if __name__ == "__main__":
    from pprint import pprint

    # Example data for creating a Hero instance
    hero_data = {
        "name": "Jason",
        "secret_name": "Tree",
        "team": {"name": "A-Team", "headquarters": "Austin, TX"},
    }

    # Example data for creating another Hero instance with multiple teams
    another_hero_data = {
        "name": "Sophia",
        "secret_name": "Shadow",
        "age": 28,
        "team": {"name": "Shadow Squad", "headquarters": "New York, NY"},
    }

    # Create Hero instances
    hero = create_instance(Hero, hero_data)
    another_hero = create_instance(Hero, another_hero_data)

    # Dump the fully nested instances
    print("Dumped Hero Instance:")
    pprint(dump_instance(hero), width=80)
    print("\nDumped Another Hero Instance:")
    pprint(dump_instance(another_hero), width=80)

    # Generate and print the JSON schema for the Hero model
    print("\nGenerated JSON Schema for Hero Model:")
    pprint(generate_json_schema(Hero), width=80)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment