
@danbri
Created December 10, 2025 22:00
SchemaOrg repo notes (Claude Code)

Schema.org Repository Study Notes

Generated by Claude Code - 2025-12-10


Overview

This is the Schema.org project repository, containing all the schemas, examples, and software used to publish the Schema.org vocabulary at https://schema.org/. Schema.org is a collaborative vocabulary for structured data on the web. It was launched in 2011 by Google, Microsoft, and Yahoo; Yandex joined later that year.

Current Version: 29.4 (released 2025-12-08)


Repository Structure

schemaorg/
├── data/                 # Schema definitions (RDF/Turtle) and examples
├── software/             # Build system and utilities (Python)
├── templates/            # Jinja2 templates for HTML generation
├── docs/                 # Static documentation and generated pages
├── .github/              # CI/CD workflows
├── versions.json         # Version tracking
├── README.md             # Project documentation
└── LICENSE               # Apache 2.0

Data Layer (data/)

Core Schema Files

  • schema.ttl (~11,000 lines) - Main vocabulary definition in W3C RDF Turtle format
  • examples.txt (~11,600 lines) - Multi-format examples (Microdata, RDFa, JSON-LD)
  • mappings.ttl - External vocabulary mappings

Schema Statistics

Category                     Count
Total terms                  ~1,683
Classes (rdfs:Class)         635
Properties (rdf:Property)    926
Extensions (pending)         150+
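
These numbers depend on how each term is typed in the data, so it can help to recount them. Below is a minimal sketch using only the standard library, assuming you point it at one of the N-Triples (.nt) files under data/releases/. N-Triples holds one triple per line, so a line scanner is enough for a tally. The helper name count_term_kinds is hypothetical and this is not how the project produces its statistics; rdflib, already a dependency, is the more robust option.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_CLASS = "<http://www.w3.org/2000/01/rdf-schema#Class>"
RDF_PROPERTY = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#Property>"


def count_term_kinds(lines):
    """Count distinct subjects typed rdfs:Class or rdf:Property in N-Triples lines.

    count_term_kinds is a hypothetical helper, not part of the schemaorg repo.
    """
    subjects = {RDFS_CLASS: set(), RDF_PROPERTY: set()}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain spaces in N-Triples; the object may.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        subject, predicate, rest = parts
        obj = rest.rstrip().removesuffix(".").rstrip()  # drop the closing " ."
        if predicate == RDF_TYPE and obj in subjects:
            subjects[obj].add(subject)  # a set, so repeated triples count once
    return {"classes": len(subjects[RDFS_CLASS]),
            "properties": len(subjects[RDF_PROPERTY])}
```

Call it with an open file, e.g. count_term_kinds(open(path, encoding="utf-8")), where path is a release .nt file; the exact release file names are not recorded in these notes.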

Extension System (data/ext/)

ext/
├── auto/           # Automotive vocabulary
├── bib/            # Bibliography/publishing (BSDO)
├── health-lifesci/ # Medical/health terms
├── meta/           # Metadata extensions
├── pending/        # Proposed terms (issue-XXXX.ttl)
└── attic/          # Deprecated/retired vocabulary

Each pending extension is tied to a GitHub issue (e.g., issue-4579.ttl).
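
Because pending files follow the issue-NNNN.ttl naming, each one can be linked back to its discussion mechanically. A small sketch, assuming the issues live at github.com/schemaorg/schemaorg and that every pending Turtle file uses exactly that naming; pending_issue_links is a hypothetical helper, not repository code.

```python
import re
from pathlib import Path

ISSUE_FILE = re.compile(r"^issue-(\d+)\.ttl$")
ISSUES_URL = "https://github.com/schemaorg/schemaorg/issues/"


def pending_issue_links(pending_dir):
    """Map each issue-NNNN.ttl file in pending_dir to its GitHub issue URL."""
    links = {}
    for path in sorted(Path(pending_dir).glob("*.ttl")):
        match = ISSUE_FILE.match(path.name)
        if match:  # skip .ttl files that don't follow the issue-NNNN.ttl pattern
            links[path.name] = ISSUES_URL + match.group(1)
    return links
```

For example, pending_issue_links("data/ext/pending") would map issue-4579.ttl to https://github.com/schemaorg/schemaorg/issues/4579.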

Release Archives (data/releases/)

Complete historical releases from v2.0 (2015) through v29.4 (2025):

  • .ttl, .nq, .nt, .rdf - RDF formats
  • .jsonld - JSON-LD context
  • .csv - Property/type tables
  • .owl - OWL ontology
  • .shacl, .shexj - Validation shapes
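
To see which of these formats a given release actually ships, its files can be grouped by suffix. A sketch, assuming one directory per version under data/releases/ (for example data/releases/29.4/); the suffix-to-format table restates the list above, with .rdf taken to be RDF/XML, and inventory is a hypothetical name.

```python
from collections import defaultdict
from pathlib import Path

# File suffix -> format, following the list above (.rdf assumed to be RDF/XML).
RELEASE_FORMATS = {
    ".ttl": "Turtle",
    ".nq": "N-Quads",
    ".nt": "N-Triples",
    ".rdf": "RDF/XML",
    ".jsonld": "JSON-LD",
    ".csv": "CSV tables",
    ".owl": "OWL ontology",
    ".shacl": "SHACL shapes",
    ".shexj": "ShEx (JSON) shapes",
}


def inventory(release_dir):
    """Group the files in one release directory by format name."""
    found = defaultdict(list)
    for path in sorted(Path(release_dir).iterdir()):
        if path.is_file():  # ignore subdirectories
            found[RELEASE_FORMATS.get(path.suffix, "other")].append(path.name)
    return dict(found)
```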

Software Layer (software/)

Key Components

Build System (software/util/)

Script               Purpose
buildsite.py         Main orchestrator - full site build
buildfiles.py        RDF export generation
buildtermpages.py    Term HTML page generation
buildocspages.py     Documentation page generation
schema_graph.py      RDFlib graph wrapper
runtests.py          Test orchestration

Schema Processing (software/SchemaTerms/)

  • sdotermsource.py - Schema loading/caching (singleton pattern)
  • sdoterm.py - Term class hierarchy (Type, Property, Datatype, Enumeration)
  • sdocollaborators.py - Attribution/collaboration data
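
The term hierarchy in sdoterm.py and the load-once behaviour described for sdotermsource.py can be pictured with a small stand-in. Everything below is hypothetical: the field names, the choice to derive Datatype and Enumeration from Type, and the use of functools.lru_cache as the singleton mechanism are assumptions, not the repository's code.

```python
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass
class Term:
    """Base for all terms; field names here are illustrative assumptions."""
    id: str
    comment: str = ""


@dataclass
class Type(Term):
    supertypes: list = field(default_factory=list)


@dataclass
class Datatype(Type):
    pass


@dataclass
class Enumeration(Type):
    members: list = field(default_factory=list)


@dataclass
class Property(Term):
    domain_includes: list = field(default_factory=list)
    range_includes: list = field(default_factory=list)


class TermSource:
    """Hypothetical in-memory term index, loosely modelled on sdotermsource.py."""

    def __init__(self, terms):
        self._by_id = {term.id: term for term in terms}

    def get(self, term_id):
        return self._by_id.get(term_id)


@lru_cache(maxsize=None)
def term_source():
    # The (normally expensive) load happens on the first call only; lru_cache
    # hands every later caller the same TermSource instance.
    return TermSource([
        Type("Thing"),
        Type("Person", supertypes=["Thing"]),
        Datatype("Text"),
        Enumeration("DayOfWeek", supertypes=["Enumeration"]),
        Property("name", domain_includes=["Thing"], range_includes=["Text"]),
    ])
```

Caching the loader rather than the class keeps the term classes plain data objects, which is one common way to get singleton behaviour in Python.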

Example Processing (software/SchemaExamples/)

  • schemaexamples.py - Multi-format example parser and validator

Development Server

  • devserv.py - Flask-based local server (localhost:8080)

Dependencies (requirements.txt)

  • Flask 2.3.2 - Web framework
  • rdflib 6.1.1 - RDF processing
  • Jinja2 3.1.6 - Templating
  • beautifulsoup4 - HTML parsing
  • requests, Pygments, markdown2, colorama

Build Process

Full Build Command

cd software
pip install -r requirements.txt
./util/buildsite.py -a    # Full auto-build

Build Pipeline

1. Load schema graphs (core + extensions)
2. Validate schema consistency
3. Generate term pages (HTML)
4. Generate documentation pages
5. Generate export files (Turtle, JSON-LD, OWL, etc.)
6. Copy static assets
7. Run tests

Build Options

-a, --autobuild      Full build (clears output, builds all)
-c, --clearfirst     Clear output directory first
-e, --examplesnum    Add missing example IDs
-f, --files          Build specific output files
-r, --runtests       Run tests before building
--release            Full release build with snapshot
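
The options above can be pictured as an argparse definition. This is a hypothetical sketch, not the parser in buildsite.py: the flag names and descriptions come from the list, while the argument handling (for example that -f takes one or more file names) is assumed.

```python
import argparse


def make_parser():
    """Hypothetical parser mirroring the documented buildsite.py options."""
    parser = argparse.ArgumentParser(description="Sketch of buildsite.py options")
    parser.add_argument("-a", "--autobuild", action="store_true",
                        help="full build (clears output, builds all)")
    parser.add_argument("-c", "--clearfirst", action="store_true",
                        help="clear output directory first")
    parser.add_argument("-e", "--examplesnum", action="store_true",
                        help="add missing example IDs")
    parser.add_argument("-f", "--files", nargs="+", metavar="FILE",
                        help="build specific output files (argument shape assumed)")
    parser.add_argument("-r", "--runtests", action="store_true",
                        help="run tests before building")
    parser.add_argument("--release", action="store_true",
                        help="full release build with snapshot")
    return parser


def should_clear_output(args):
    # Per the descriptions above, both -a and -c clear the output directory.
    return args.autobuild or args.clearfirst
```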

Templates (templates/)

Jinja2 templates for HTML generation:

  • PageHeader.j2 / PageFooter.j2 - Common page structure
  • macros.j2 - Reusable macros for term linking
  • terms/*.j2 - Term page templates (InfoBlock, PropertiesBlock, etc.)
  • docs/*.j2 - Documentation page templates (Schemas, Home, FullRelease)

Testing

Python Tests (software/tests/)

14 test modules covering:

  • Schema graph validation
  • Term processing
  • Example parsing
  • File generation
  • JSON-LD context

Ruby Tests (software/scripts/)

  • RSpec-based tests with RDF::Reasoner
  • Example format validation

Running Tests

# Python tests
./util/buildsite.py -r

# Ruby tests
cd scripts
bundle install
bundle exec rake
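
As an illustration of the "schema graph validation" kind of test, here is a hypothetical check that every domain or range target named by a property is a defined type. It is written in unittest style and uses made-up in-memory data shapes; it is not one of the 14 modules in software/tests/.

```python
import unittest


def dangling_references(classes, properties):
    """Return (property, target) pairs whose domain or range is not a known type.

    classes: set of type names; properties: dict mapping a property name to
    {"domain": [...], "range": [...]}. Both shapes are illustrative assumptions.
    """
    problems = []
    for prop in sorted(properties):
        spec = properties[prop]
        for target in spec.get("domain", []) + spec.get("range", []):
            if target not in classes:
                problems.append((prop, target))
    return problems


class DanglingReferenceTest(unittest.TestCase):
    def test_clean_schema_has_no_dangling_references(self):
        classes = {"Thing", "Person", "Text"}
        properties = {"name": {"domain": ["Thing"], "range": ["Text"]}}
        self.assertEqual(dangling_references(classes, properties), [])

    def test_misspelled_range_is_reported(self):
        classes = {"Thing", "Person", "Text"}
        properties = {"spouse": {"domain": ["Person"], "range": ["Persn"]}}
        self.assertEqual(dangling_references(classes, properties),
                         [("spouse", "Persn")])
```

Saved to a file, it runs with python -m unittest followed by that file's module name.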

CI/CD (.github/workflows/)

ci_tests.yml

Triggers on push/PR to any branch:

  1. Python 3.x setup
  2. Install dependencies
  3. Build site with validation
  4. Run Python tests
  5. Ruby setup (3.1)
  6. Run RSpec tests

Deployment (software/gcloud/)

Google Cloud App Engine deployment:

  • deploy2staging.schema.org.sh - Staging
  • deploy2schema.org.sh - Production

Documentation (docs/)

50+ HTML files, including:

  • Core: about.html, developers.html, howwework.html, faq.html
  • Domain guides: hotels.html, automotive.html, news.html, meddocs.html
  • Reference: releases.html (285 KB, comprehensive release history)
  • Generated: sitemap.xml, jsonldcontext.jsonld

Key Design Principles

From README.md:

  1. Pragmatic over Pure - Trade elegance for usability
  2. Incremental Evolution - Small, backwards-compatible changes preferred
  3. Consumer-Driven - New schemas need evidence of consuming applications
  4. Collaboration - Integrate with external standards (GoodRelations, IPTC, BBC, etc.)
  5. Local Coherence - Sensible descriptions over global theoretical purity
  6. Simplicity - Avoid over-modeling; Schema.org is necessarily a simplification

Example Format

Examples use a multi-section format:

TYPES: #schema-type
PRE-MARKUP:
[Original HTML]
MICRODATA:
[HTML5 Microdata version]
RDFA:
[RDFa version]
JSON:
[Plain JSON]
JSONLD:
[JSON-LD structured data]

Version History

Release cadence: Irregular, roughly monthly to quarterly (in 2025, gaps between releases ranged from about 3 to 16 weeks)

Recent releases:

  • 29.4: 2025-12-08 (current)
  • 29.3: 2025-09-04
  • 29.2: 2025-05-15
  • 29.1: 2025-04-24
  • 29.0: 2025-03-24

Full history in versions.json (54 releases from v2.0 to v29.4).
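
The gaps between the releases listed above can be checked with a few lines of date arithmetic. The dates are copied from the list; reading them from versions.json instead would require knowing that file's layout, which these notes do not describe.

```python
from datetime import date

# Release dates copied from the list above, oldest first.
RELEASES = [
    ("29.0", date(2025, 3, 24)),
    ("29.1", date(2025, 4, 24)),
    ("29.2", date(2025, 5, 15)),
    ("29.3", date(2025, 9, 4)),
    ("29.4", date(2025, 12, 8)),
]


def release_gaps(releases):
    """Return (from_version, to_version, days) for each pair of consecutive releases."""
    return [(v1, v2, (d2 - d1).days)
            for (v1, d1), (v2, d2) in zip(releases, releases[1:])]


if __name__ == "__main__":
    for start, end, days in release_gaps(RELEASES):
        print(f"{start} -> {end}: {days} days (~{days / 7:.1f} weeks)")
```

For these five releases the gaps are 31, 21, 112 and 95 days, i.e. roughly 3 to 16 weeks.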


Collaboration Partners

Schema.org vocabulary integrates designs from:

  • GoodRelations (e-commerce)
  • IPTC rNews (news)
  • BBC/EBU (broadcasting)
  • Music Ontology / MusicBrainz
  • MARC / BibFrame (bibliography)
  • GS1 (product data)
  • Trust Project (credibility)
  • FIBO (financial)
  • Many others...

Contributing

  1. Join the W3C Schema.org Community Group
  2. Find/file issues on GitHub
  3. Discuss before substantial PRs
  4. Reference specific issues in PRs
  5. Follow existing patterns and style

Important: It is much easier to change the wording of a definition than the spelling of a type or property name.


Quick Reference

Local Development

# Setup
python3 -m venv venv
source venv/bin/activate
pip install -r software/requirements.txt

# Build
./software/util/buildsite.py -a

# Serve
./software/devserv.py
# Visit http://localhost:8080

Key Files

  • Core schema: data/schema.ttl
  • Examples: data/examples.txt
  • Version info: versions.json
  • Build script: software/util/buildsite.py
  • Dev server: software/devserv.py

Important URLs


Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    DATA LAYER (data/)                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ schema.ttl  │  │examples.txt │  │   ext/pending/      │  │
│  │ (RDF/Turtle)│  │ (multi-fmt) │  │   ext/auto/         │  │
│  └──────┬──────┘  └──────┬──────┘  │   ext/bib/          │  │
│         │                │         │   ext/health-lifesci│  │
│         │                │         └─────────────────────┘  │
└─────────┼────────────────┼──────────────────────────────────┘
          │                │
          ▼                ▼
┌─────────────────────────────────────────────────────────────┐
│               SOFTWARE LAYER (software/)                    │
│  ┌─────────────────┐  ┌─────────────────┐                   │
│  │ SchemaTerms/    │  │ SchemaExamples/ │                   │
│  │ - sdotermsource │  │ - schemaexamples│                   │
│  │ - sdoterm       │  └────────┬────────┘                   │
│  └────────┬────────┘           │                            │
│           │                    │                            │
│           ▼                    ▼                            │
│  ┌─────────────────────────────────────┐                    │
│  │          util/buildsite.py          │                    │
│  │  ┌──────────────┬──────────────┐    │                    │
│  │  │buildtermpages│ buildfiles   │    │                    │
│  │  │buildocspages │ sdojsonld    │    │                    │
│  │  └──────────────┴──────────────┘    │                    │
│  └─────────────────────────────────────┘                    │
└─────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│              TEMPLATE LAYER (templates/)                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ terms/*.j2   │  │ docs/*.j2    │  │ macros.j2    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│              OUTPUT (software/site/ & docs/)                │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐     │
│  │ .html  │ │ .ttl   │ │ .jsonld│ │ .owl   │ │ .csv   │     │
│  │ pages  │ │ export │ │ context│ │ export │ │ tables │     │
│  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘     │
└─────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT                               │
│  ┌──────────────────┐    ┌──────────────────┐               │
│  │ devserv.py       │    │ gcloud/deploy    │               │
│  │ (localhost:8080) │    │ (AppEngine)      │               │
│  └──────────────────┘    └──────────────────┘               │
└─────────────────────────────────────────────────────────────┘

Notes generated by Claude Code studying the schemaorg repository
