I ran into a problem while developing some cool tools. I had a nested Pydantic structure that I needed to migrate to SQLModel, but I needed to retain the ability to emit fully nested schemas and models -- an ability the migration lost. So, I had to write a few helper functions and models.
In the realm of modern web development and data-driven applications, managing complex data structures and their relationships can be a daunting task. SQLModel, a library that bridges the gap between SQLAlchemy and Pydantic, offers an elegant way to define and interact with SQL databases from Python. In this small write-up, we dive into a practical demonstration of using SQLModel to emit fully nested Python schemas and models. Surprisingly, this doesn't work out of the box.
Our focus is on a versatile script that illustrates how to dynamically generate and manipulate nested data structures. By defining SQLModel-based types, we can seamlessly map these models to corresponding database tables while retaining the full richness of Python's type system. This script demonstrates how to:
- Create Instances: Dynamically instantiate models from dictionaries, including handling nested relationships and collections.
- Dump Instances: Serialize complex model instances into nested dictionaries, preserving all relationships and preventing infinite recursion.
- Generate JSON Schemas: Automatically produce JSON schemas for SQLModel classes, capturing both the data structure and constraints.
Whether you're building RESTful APIs, data validation layers, or simply seeking a robust way to manage relational data in Python, this will guide you through creating and leveraging fully nested schemas and models using SQLModel. By the end of this tutorial, you should have a better understanding of how to use SQLModel to handle complex data relationships.
Sadly, this is longer than I expected it to be, but as short as I could make it while maintaining correctness and examples.
When working with SQLModel, it’s crucial to ensure that nested structures are preserved in both schema generation and model instances. This involves several key practices and improvements over the initial migration from Pydantic models:
- Consistent Use of Primary Keys: SQLModel requires explicit primary key definitions to manage data consistency and operations. Unlike Pydantic, where primary keys might not be needed, SQLModel ensures robust database interactions.
- Implementation of Read Models: For each primary model, a corresponding read model facilitates data serialization and deserialization. Read models use Pydantic's validation and ORM mode to ensure compatibility and flexible data handling (see the sketch right after this list).
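As a rough sketch of that read-model idea: the `TeamRead` name below is my own and does not appear in the code later in this post; a read model typically just layers the primary key and ORM-mode config on top of the shared base class.

```python
from typing import Optional

from sqlmodel import Field, SQLModel


class TeamBase(SQLModel):
    # Shared data fields, mirroring the base class defined later in this post.
    name: str = Field(index=True)
    headquarters: Optional[str] = Field(default=None)


class TeamRead(TeamBase):
    # Hypothetical read model: what a caller sees after a row has been persisted.
    id: int

    # Lets the model be hydrated straight from ORM objects (formerly orm_mode).
    model_config = {"from_attributes": True}
```

The table model (shown further down) stays focused on storage concerns, while a read model like this is what you serialize out of an API or hand to downstream tooling.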
Let's discuss something important. Compare the following two class definitions:
class TeamBase(SQLModel):
...
as compared to this
class Team(TeamBase, table=True):
...
The first is effectively a Pydantic model, something used for data modeling and schema validation. So, your data fields go here. You don't need any database IDs, primary keys, or relationships. Just the data.
The second, with `table=True`, is going to be saved to disk. It's a class that now knows about SQLAlchemy. So, you add your DB IDs, primary keys, relationships, etc.
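To make that split concrete, here is a small standalone sketch (assuming a Pydantic v2 based SQLModel release, where `model_validate` is available). It mirrors simplified versions of the Team classes from the full listing below, so don't paste both into the same module or the table name will collide.

```python
from typing import Optional

from sqlmodel import Field, SQLModel


class TeamBase(SQLModel):
    # Data fields only -- effectively a plain Pydantic model.
    name: str = Field(index=True)
    headquarters: Optional[str] = Field(default=None)


class Team(TeamBase, table=True):
    # The table model adds the database concerns: primary key and, later, relationships.
    id: Optional[int] = Field(default=None, primary_key=True)


# The base validates plain data without touching a database...
team_in = TeamBase.model_validate({"name": "A-Team", "headquarters": "Austin, TX"})
print(team_in.headquarters)                 # Austin, TX

# ...while only the table model is registered with SQLAlchemy's metadata.
print(Team.__tablename__)                   # team
print("team" in SQLModel.metadata.tables)   # True
```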
- Model Configuration: Setting `model_config` with `from_attributes=True` ensures proper attribute mapping and validation. This enables you to fully hydrate the model when it's read from disk. (This used to be called `orm_mode`.) A short sketch of this appears just after this list.
- Field Descriptions for GenAI Integration: If utilizing generative AI or similar technologies, providing field descriptions in read models enhances model clarity and usability.
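Here is the promised sketch for `from_attributes=True`, again assuming a Pydantic v2 based SQLModel. The `HeroRead` model and the `SimpleNamespace` stand-in for a database row are illustrative only:

```python
from types import SimpleNamespace
from typing import Optional

from sqlmodel import Field, SQLModel


class HeroRead(SQLModel):
    # Hypothetical read model; the descriptions also flow into generated
    # schemas, which helps GenAI tooling that consumes them.
    name: str = Field(description="The hero's public name")
    secret_name: Optional[str] = Field(default=None, description="The hero's secret identity")

    model_config = {"from_attributes": True}  # formerly orm_mode


# Any attribute-bearing object (for example, an ORM row) can hydrate the read model.
row = SimpleNamespace(name="Jason", secret_name="Tree")
hero_read = HeroRead.model_validate(row)
print(hero_read.model_dump())  # {'name': 'Jason', 'secret_name': 'Tree'}
```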
Below is the code implementation that demonstrates the process of creating, dumping, and generating JSON schemas for SQLModel-based types, with proper handling of nested structures:
from typing import Any, Dict, List, Optional, Set, Type
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.inspection import inspect
from sqlalchemy.sql.type_api import TypeEngine
from sqlmodel import JSON, Boolean, Field, Float, Integer, Relationship, SQLModel, String
def create_instance(model: Type[SQLModel], data: Dict[str, Any]) -> SQLModel:
"""
Create an instance of a SQLModel using the provided data dictionary.
This function dynamically maps nested dictionaries to corresponding ORM models.
Args:
model (Type[SQLModel]): The SQLModel class to instantiate.
data (Dict[str, Any]): A dictionary containing the data to populate the model.
Returns:
SQLModel: An instance of the model populated with the provided data.
Example:
hero = create_instance(Hero, {"name": "Jason", "secret_name": "Tree"})
"""
mapper = inspect(model)
columns = {column.key: column for column in mapper.columns}
relationships = {rel.key: rel for rel in mapper.relationships}
instance_data = {}
for key, value in (data or {}).items():
if key in columns:
instance_data[key] = value
elif key in relationships:
related_model = relationships[key].mapper.class_
if isinstance(value, list):
instance_data[key] = [
create_instance(related_model, v) for v in value if v is not None
]
elif value is not None:
instance_data[key] = create_instance(related_model, value)
else:
instance_data[key] = [] if relationships[key].uselist else None
return model(**instance_data)
def dump_instance(instance: SQLModel, processed_ids: Optional[Set[int]] = None) -> Dict[str, Any]:
"""
Recursively serialize a SQLModel instance into a dictionary, including nested relationships.
Args:
instance (SQLModel): The SQLModel instance to serialize.
processed_ids (Optional[Set[int]]): Set of processed instance IDs to avoid infinite recursion.
Returns:
Dict[str, Any]: A dictionary representation of the instance.
Example:
data_dict = dump_instance(hero_instance)
"""
if processed_ids is None:
processed_ids = set()
instance_id = id(instance)
if instance_id in processed_ids:
return {} # Prevent infinite recursion
processed_ids.add(instance_id)
mapper = inspect(instance.__class__)
columns = {column.key: column for column in mapper.columns}
relationships = {rel.key: rel for rel in mapper.relationships}
data = {}
for key in columns:
data[key] = getattr(instance, key)
for key, relationship in relationships.items():
related_data = getattr(instance, key)
if relationship.uselist:
data[key] = [
dump_instance(rel_instance, processed_ids) for rel_instance in related_data
]
else:
data[key] = dump_instance(related_data, processed_ids) if related_data else None
return data
def generate_json_schema(
model: Type[SQLModel], processed_models: Optional[Set[str]] = None
) -> Dict[str, Any]:
"""
Generate a JSON schema for a SQLModel class, including nested relationships.
Args:
model (Type[SQLModel]): The SQLModel class to generate the schema for.
processed_models (Optional[Set[str]]): Set of already processed models to prevent recursion.
Returns:
Dict[str, Any]: The JSON schema as a dictionary.
Example:
schema = generate_json_schema(Hero)
"""
if processed_models is None:
processed_models = set()
mapper = inspect(model)
columns = {column.key: column for column in mapper.columns}
relationships = {rel.key: rel for rel in mapper.relationships}
schema = {"title": model.__name__, "type": "object", "properties": {}, "required": []}
processed_models.add(model.__name__)
for key, column in columns.items():
if key == "id":
continue
field_info = model.__fields__.get(key)
json_type = _get_json_type(column.type)
property_schema = {"type": json_type}
if field_info and field_info.description:
property_schema["description"] = field_info.description
schema["properties"][key] = property_schema
if not column.nullable:
schema["required"].append(key)
for key, relationship in relationships.items():
related_model = relationship.mapper.class_
if related_model.__name__ not in processed_models:
if relationship.uselist:
schema["properties"][key] = {
"type": "array",
"items": generate_json_schema(related_model, processed_models),
}
else:
schema["properties"][key] = generate_json_schema(related_model, processed_models)
return schema
def _get_json_type(sqlalchemy_type: TypeEngine) -> str:
"""
Map SQLAlchemy types to JSON schema types.
Args:
sqlalchemy_type (TypeEngine): The SQLAlchemy type.
Returns:
str: The corresponding JSON schema type.
"""
if isinstance(sqlalchemy_type, Integer):
return "integer"
elif isinstance(sqlalchemy_type, Float):
return "number"
elif isinstance(sqlalchemy_type, String):
return "string"
elif isinstance(sqlalchemy_type, Boolean):
return "boolean"
elif isinstance(sqlalchemy_type, JSON):
return "object"
elif isinstance(sqlalchemy_type, ARRAY):
return "array"
else:
return "string"
# Base and model classes
class TeamBase(SQLModel):
"""
Base class for Team model, containing common fields.
Attributes:
name (str): The name of the team.
headquarters (Optional[str]): The headquarters location of the team.
"""
name: str = Field(index=True)
headquarters: Optional[str] = Field(default=None)
model_config = {"from_attributes": True}
class HeroBase(SQLModel):
"""
Base class for Hero model, containing common fields.
Attributes:
name (str): The name of the hero.
secret_name (Optional[str]): The secret name of the hero.
age (Optional[int]): The age of the hero.
team_id (Optional[int]): The ID of the team the hero belongs to.
"""
name: str = Field(index=True)
secret_name: Optional[str] = Field(default=None)
age: Optional[int] = Field(default=None, index=True)
team_id: Optional[int] = Field(default=None, foreign_key="team.id")
model_config = {"from_attributes": True}
class Team(TeamBase, table=True):
"""
Team model representing a team in the database.
Attributes:
id (Optional[int]): The primary key of the team.
heroes (List["Hero"]): List of heroes belonging to the team.
"""
id: Optional[int] = Field(default=None, primary_key=True)
heroes: List["Hero"] = Relationship(
sa_relationship_kwargs={"lazy": "selectin"},
back_populates="team",
)
class Hero(HeroBase, table=True):
"""
Hero model representing a hero in the database.
Attributes:
id (Optional[int]): The primary key of the hero.
team (Optional["Team"]): The team the hero belongs to.
"""
id: Optional[int] = Field(default=None, primary_key=True)
team: Optional["Team"] = Relationship(back_populates="heroes")
if __name__ == "__main__":
from pprint import pprint
# Example data for creating a Hero instance
hero_data = {
"name": "Jason",
"secret_name": "Tree",
"team": {"name": "A-Team", "headquarters": "Austin, TX"},
}
# Example data for creating another Hero instance with a different team
another_hero_data = {
"name": "Sophia",
"secret_name": "Shadow",
"age": 28,
"team": {"name": "Shadow Squad", "headquarters": "New York, NY"},
}
# Create Hero instances
hero = create_instance(Hero, hero_data)
another_hero = create_instance(Hero, another_hero_data)
# Dump the fully nested instances
print("Dumped Hero Instance:")
pprint(dump_instance(hero), width=80)
print("\nDumped Another Hero Instance:")
pprint(dump_instance(another_hero), width=80)
# Generate and print the JSON schema for the Hero model
print("\nGenerated JSON Schema for Hero Model:")
pprint(generate_json_schema(Hero), width=80)
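As a final, optional step, here is a hedged sketch of persisting one of the nested instances created above and reading it back through a session. It continues the `__main__` block, assumes an in-memory SQLite database, and uses only the models and helpers defined earlier:

```python
from sqlmodel import Session, create_engine, select

engine = create_engine("sqlite://")      # throwaway in-memory database
SQLModel.metadata.create_all(engine)

with Session(engine) as session:
    session.add(hero)                    # the nested Hero + Team built above
    session.commit()
    session.refresh(hero)

    # Read it back; the relationship loading keeps the nesting intact,
    # so dump_instance still produces the fully nested dictionary.
    stored = session.exec(select(Hero).where(Hero.name == "Jason")).one()
    pprint(dump_instance(stored), width=80)
```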