title

author

date

styles

Spyglass Workshop 2025

Chris Broz

08/2025

style
dracula

Calibration Slide

Window must be this wide ─────────────────────────────────────────────────────►

And this tall
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
▼
For use with `lookatme`, a terminal-based presentation tool.

Overview

This session will cover ...

⭕ DataJoint Infrastructure
⭕ DataJoint Table Tiers and Declaration Syntax
⭕ Spyglass Table Types, including Merge Tables
⭕ Creating a Custom Pipeline
⭕ Common DataJoint Errors (time permitting)

Overview

This session will cover ...

👀 DataJoint Infrastructure
⭕ DataJoint Table Tiers and Declaration Syntax
⭕ Spyglass Table Types, including Merge Tables
⭕ Creating a Custom Pipeline
⭕ Common DataJoint Errors (time permitting)

Infrastructure

Pieces

What are we doing? What are we connecting to?

        ┌────────────────┐
        │   OS           │
        │┌──────────────┐│
        ││  Python      ││
        ││┌────────────┐││
        │││ Jupyter    │││
        │││┌──────────┐│││
        ││││Some Code ││││
        │││└──────────┘│││
        ││└────────────┘││
        │└──────────────┘│
        └────────────────┘

Infrastructure

Pieces

What are we doing? What are we connecting to?

        ┌────────────────┐
        │   OS           │
        │┌──────────────┐│
        ││  Python      ││
        ││┌────────────┐││
        │││ Jupyter    │││
        │││┌──────────┐│││  ┌────────┐
        ││││DataJoint─┼┼┼┼──┼─> MySQL│
        │││└──────────┘│││  └────────┘
        ││└────────────┘││
        │└──────────────┘│
        └────────────────┘

Infrastructure

DJ's Role

        ┌──────────┐  ┌────────┐
        │DataJoint─┼──┼─> MySQL│
        └──────────┘  └────────┘

When importing a table, DataJoint...

Connects to the SQL server using credentials
Checks to see if that table exists
If not, declares it

Infrastructure

SQL's Role

A given experiment might have separate spreadsheets for ...

Subjects
Sessions
Analysis parameters
Analysis

With the experimenter remembering the relationships between them.

Infrastructure

SQL's Role

Generate tables with relationships to one another

     ┌───────────────┐
     │SUBJECT        │
     │*subj_id*, name│
     └───────┬───────┘
             │
             ▼
     ┌─────────────────────────────┐  ┌───────────────────────────┐
     │SESSION                      │  │ PARAMETERS                │
     │*subj_id*, *session_id*, time│  │ *param_id*, param1, param2│
     └─────────────┬───────────────┘  └────────────┬──────────────┘
                   │                               │
                   ▼                               ▼
                 ┌───────────────────────────────────┐
                 │ANALYSIS                           │
                 │*subj_id*, *session_id*, *param_id*│
                 └───────────────────────────────────┘

Infrastructure

SQL's Role

Terminology:

Downstream: direction of data flow (vs. Upstream)
Primary keys: unique identifiers for each row (here, marked by *)
Secondary keys: fields for each row, but may repeat values.
Foreign keys: references to primary keys in other tables (here, subj_id)

     ┌───────────────┐
     │SUBJECT        │
     │*subj_id*, name│
     └───────┬───────┘
             ▼
     ┌─────────────────────────────┐
     │SESSION                      │
     │*subj_id*, *session_id*, time│
     └─────────────────────────────┘

Infrastructure

SQL's Role

Terminology:

Downstream: direction of data flow (vs. Upstream)
Primary keys: unique identifiers for each row (here, marked by *)
Secondary keys: fields for each row, but may repeat values.
Foreign keys: references to primary keys in other tables (here, subj_id)

Database design is the process of mapping our conceptual model of the data to a set of tables, relationships, fields, and contents.

Infrastructure

Notes

SQL is a programming language for relational databases.

At its core, DataJoint is just a mapping between Python and SQL.
- Python commands -> SQL commands
- Python functions -> prescribed processing steps

DataJoint ...
- is opinionated about how to structure the databases
- limits the full power of SQL in way that promotes good practices.

Python stores ...
- A copy of table definitions, that may be out of sync with the database.
- Methods for processing data, that may be out of sync with existing data.

Good data provenance requires good version control and documentation to keep these in sync.

Overview

This session will cover ...

✅ DataJoint Infrastructure
👀 DataJoint Table Tiers and Declaration Syntax
👀 Spyglass Table Types, including Merge Tables
⭕ Creating a Custom Pipeline
⭕ Common DataJoint Errors (time permitting)

Python Structure

A DataJoint py script will have...

Front matter
1. Imports
2. Schema declaration
Tables
1. Class inheritance
2. Table types
3. Definitions
4. Methods

Python Structure

A DataJoint py script will have...

Front matter
1. Imports - pulling code from other files
2. Schema declaration - telling DataJoint/SQL where the tables will live
Tables
1. Class inheritance
2. Table types
3. Definitions
4. Methods

Front matter

At the beginning of the schema file, you'll find ...

Imports
- Package imports - full package
- Individual imports - part of a package
- Aliased imports - full or part, renamed
- Relative imports - full or part, from adjacent file
Schema declaration

import os # package import

from typing import Union  # Individual class import
from itertools import chain as iter_chain  # Aliased function import

import datajoint as dj  # Aliased package import
from spyglass.common import Nwbfile  # noqa: F401 # Individual package import, quiet linter

from ..utils.schema_prefix import SCHEMA_PREFIX  # Relative import

schema = dj.schema(SCHEMA_PREFIX + "_example")  # Schema declaration

Front matter

Schema prefix

schema = dj.schema("username_example")  # Schema declaration

Database permissions are usually prefix-based
Spyglass adds to 'shared' prefixes (e.g., common), which it relies on for advanced features
Adding to shared prefixes can cause problems.
Instead, use your username or ask database admin about a project prefix

spyglass.common.VideoFile.full_table_name == "`common_behav`.`_video_file`"

Python Structure

A DataJoint py script will have...

Front matter
1. Imports
2. Schema declaration
Tables
1. Class inheritance - expanding classes with imported code
2. Table types - general categories of the table
3. Definitions - column structure of the table
4. Methods - additional tools for this specific table

Python Structure

Table syntax

Class inheritance

The parentheses in the class definition indicate what the class inherits from.

@schema
class ExampleTable(SpyglassMixin, dj.Manual):
    pass

This table is ...

A DataJoint Manual table, which is a table that is manually populated.
A SpyglassMixin table, which provides some Spyglass-specific features.

Python Structure

Table syntax

Table types

DataJoint types:
- Manual tables are manually populated.
- Lookup tables can be populated on declaration, and rarely change.
- Computed tables are populated by a method runs computations on upstream entries.
- Imported tables are populated by a method that imports data from another source.
- Part tables are used to store data that is conceptually part of another table.

Spyglass conceptual types:
- Data tables are the starting point for an analysis. Either ingested or analysis output.
- Parameter tables (often dj.Lookup) store parameters for analysis.
- Selection tables (dj.Manual) store pairings of parameters and data to be analyzed.
- Analysis tables (often dj.Computed) store the results of analysis.
- Merge tables combine data from multiple pipeline versions.

Table types

DataJoint types: Manual, Lookup, Computed/Imported, Part
Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge

Spyglass Type	DataJoint Type
Data	Manual, Imported, Computed
Parameter	Lookup, can be Manual
Selection	Manual
Analysis	Computed
Merge	'Merge', with Parts

Any table can have parts for 1-to-many relationships
Merge tables have a special class with 1 part per merged pipeline

Table types

DataJoint types: Manual, Lookup, Computed/Imported, Part
Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge

               ┌────┐
               │Data│
               └─┬──┘
    ┌────────────┼───┐
    │Schema      │   │
    │            │   │
    │┌─────────┐ │   │
    ││Parameter│ │   │
    │└────┬────┘ │   │
    │     ▼      ▼   │
    │    ┌─────────┐ │
    │    │Selection│ │
    │    └────┬────┘ │
    │         ▼      │
    │    ┌─────────┐ │
    │    │Analysis │ │
    │    └─────────┘ │
    └────────────────┘

Table types

DataJoint types: Manual, Lookup, Computed/Imported, Part
Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge

               ┌────┐
               │Data│
               └─┬──┘   ┌─────────────┐
    ┌────────────┼───┐  │  ┌──────────┼─────┐
    │Schema      │   │  │  │Merge     │     │
    │            │   │  │  │Schema    │     │
    │┌─────────┐ │   │  │  │          │     │
    ││Parameter│ │   │  │  │┌─────┐   │     │
    │└────┬────┘ │   │  │  ││Merge│   │     │
    │     ▼      ▼   │  │  │└───┬─┘   │     │
    │    ┌─────────┐ │  │  │    ▼     ▼     │
    │    │Selection│ │  │  │  ┌──────────┐  │
    │    └────┬────┘ │  │  │  │Merge Part│  │
    │         ▼      │  │  │  └──────────┘  │
    │    ┌─────────┐ │  │  │                │
    │    │Analysis │ │  │  │                │
    │    └────┬────┘ │  │  │                │
    └─────────┼──────┘  │  └────────────────┘
              └─────────┘

NOTE: for diagrams of existing tables, see dj.Diagram

Python Structure

Table syntax

Definitions

Each table can have a docstring that describes the table, and must have a definition attribute that contains the SQL-like table definition.

# comments are used to describe the table and its columns.
--- separates the primary key columns from the data columns.
field : datatype defines a column using a SQL datatype
-> indicates a foreign key reference to another table.

@schema
class ExampleTable(SpyglassMixin, dj.Manual):
    """Table Description""" # also called a doc string

    definition = """ # Table comment used as header
    primary_key1 : uuid # randomized string
    primary_key2 : int  # integer
    ---
    secondary_key1 : varchar(32) # string of max length 32
    secondary_key2 : blob # anything, usually a python dictionary
    -> Nwbfile # Foreign key reference, inherit primary key of this table
    -> Subject.proj(actual_id='subject_id') # Rename column
    """

Note: Fk-ref'd tables must be in script environment

Python Structure

Table syntax

Methods

Many Spyglass tables have methods that provide functionality for the pipeline.

Properties don't have arguments, can be called as Table.property
Static methods don't need any info from the table
Class methods
Basic Methods

@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
    ...

    @property  # Static information, Table.property
    def pk(self):
        return "subject_id"

    @staticmethod  # Basic func with no reference to self instance
    def _subj_dict(id: UUID):
        """Return the subject dict"""
        return {"subject_id": id}

Python Structure

Table syntax

Methods

Properties don't have arguments, can be called as Table.property
Static methods don't need any info from the table
Class methods don't need instance information
Basic Methods assume access to class instance

@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
    ...

    @classmethod  # Class, not instance. Table.func(), not Table().func()
    def example(cls, argument=None):  # Default value
        pass  # Not doing anything

    def blind_subjs(self, restriction: Union[str, dict]):
        """Import all subjects selected by the restriction"""
        insert_keys = [
            {
                **self._subj_dict(dj.hash.key_hash(key)),
                "actual_id": key["subject_id"],
            }
            for key in (Subject & restriction).fetch("KEY")
        ]
        self.insert(insert_keys, skip_duplicates=True)

Python Structure

Table syntax

Example Params Table

@schema
class MyParams(SpyglassMixin, dj.Lookup):
    definition = """
    param_name: varchar(32)
    ---
    params: blob
    """
    contents = [  # Contents inserted on declaration
        ["example1", {"A": 1, "B": 2}],
        ["example2", {"A": 3, "B": 4}],
    ]

    @classmethod
    def insert_default(cls):
        cls.insert(cls.contents, skip_duplicates=True)

Example Selection/Analysis Tables

@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
    definition = """
    -> SubjBlinded
    -> MyParams
    """

@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> MyAnalysisSelection
    """

    class MyAnalysisPart(SpyglassMixin, dj.Part):
        definition = """
        -> MyAnalysis
        ---
        result: int
        """

    def make(self, key):
        ...
        self.insert1(key)
        self.MyAnalysisPart.insert1({**key, "result": 1})

Run the make method to populate the table with MyAnalysis().populate().

Overview

This session will cover ...

✅ DataJoint Infrastructure
✅ DataJoint Table Tiers and Declaration Syntax
✅ Spyglass Table Types, including Merge Tables
👀 Creating a Custom Pipeline
⭕ Common DataJoint Errors (time permitting)

Custom pipelines

Designing your own involves ...

Identifying pieces of your analysis ...
1. Parameters
2. Data
3. Selection (optional)
4. Analysis
Deciding foreign key references
Using Merge Tables

Custom pipelines

Analysis Structure

Parameters

Key-value pairs that define the analysis.
Choice of explicit fields or blob for arbitrary data.
- Fields are more readable and searchable.
- blob is more flexible and easier to add new parameters.

@schema
class MyParams(SpyglassMixin, dj.Lookup):
    definition = """
    param_name: varchar(32)
    ---
    params: blob
    """
    contents = [  # Contents inserted on declaration
        ["example1", {"A": 1, "B": 2}],
        ["example2", {"A": 3, "B": 4}],
    ]

Custom pipelines

Analysis Structure

Data (1/2)

Input data might ...
- Include minor preprocessing within your pipeline (e.g., blinding)
- Be raw (e.g., common.Raw) or processed by another pipeline (e.g., lfp.LFPOutput), and referenced by a Selection table.
Data tables might be structured differently depending on the unit of analysis.
- One-to-one with existing tables: link directly to Selection table.
- One-to-many or many-to-one: use Part tables.

@schema
class SubjBlinded(SpyglassMixin, dj.Manual): # Preprocessed data
    """Blinded subject table."""

    definition = """
    subject_id: uuid # id
    ---
    -> Subject.proj(actual_id='subject_id')
    """

    def insert(self, data):
        """Insert data with blinded subject IDs."""
        ...

Custom pipelines

Analysis Structure

Data (2/2)

Input data might ...
- Include minor preprocessing within your pipeline (e.g., blinding)
- Be raw (e.g., common.Raw) or processed by another pipeline (e.g., lfp.LFPOutput), and referenced by a Selection table.
Data tables might be structured differently depending on the unit of analysis.
- One-to-one with existing tables: link directly to upstream table.
- Many-to-one: use Part tables.

@schema
class MyGrouping(SpyglassMixin, dj.Manual): # many-to-one
    definition = """
    group_id: int
    """

    class MyUnit(SpyglassMixinPart):
        definition = """
        -> MyGrouping
        -> UpstreamUnit
        """

Custom pipelines

Analysis Structure

Selection

make will pair all combinations of upstream primary keys.
A Selection table can be used to filter the upstream data, as a staging table for analysis.

@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
    definition = """
    -> SubjBlinded
    -> MyParams
    """

Custom pipelines

Analysis Structure

Analysis

make method populates the table, running core analysis.
MyAnalysis.populate() will run the analysis on all combinations of upstream primary keys.
By convention, results are stored as NWB files, with a reference to the AnalysisNwbfile table.
Postpone insert calls until the end of the make (see also 'atomicity').

@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> MyAnalysisSelection
    ---
    -> AnalysisNwbfile
    """

    def make(self, key):
        params = (MyParams & key).fetch1("params")
        ...
        inserted = dict(key, nwb_file_name=new_file)
        self.insert1(key)

Custom pipelines

Foreign Key References

How to reference upstream tables?

Primary vs Secondary and Foreign Key vs Field are orthogonal concepts.

A primary key (pk) is a unique identifier for that table.
A secondary key (sk) field is data associated with the primary key.
A foreign pk is likely another processing step for that data.
A foreign sk is likely other data associated with that primary key, like AnalysisNwbfile.

Custom pipelines

Foreign Key References

Foreign Primary Key

As sole primary key indicates a one-to-one relationship
Likely another processing step

definition = """
-> UpstreamTable
---
anything: varchar(32)
"""

As one of multiple primary keys indicates pairing of upstream data.

definition = """
-> UpstreamTable1
-> UpstreamTable2
"""

Custom pipelines

Foreign Key References

Foreign Secondary Key

As a secondary key indicates either ...

1. Linking to non-unique fields. Does not uniquely identify the row

definition = """
-> UpstreamTable1
---
-> AnalysisNwbfile
"""

2. An aliasing (or 'burying') of the primary key(s), useful for SQL limitations

Too many inherited pks crowds a table or prevents it from being declared
For convenience navigating these, see 'long distance restrictions'

definition = """
my_id: int
---
-> UpstreamTable1
-> UpstreamTable2
"""

Custom pipelines

Using Merge Tables

Merge tables signify that an analysis can be done in multiple ways.
When attaching downstream, you can either ...
- Assume your data comes from one path, or 'source'.
- Allow for multiple sources and let merge-specific functions handle it.

@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> SomeMergeTable
    -> MyParams
    ---
    -> AnalysisNwbfile
    """

    def make(self, key):
        params = (MyParams & key).fetch1("params")
        parent = SomeMergeTable().merge_get_parent(key)
        ...
        self.insert1(key)
        self.MyAnalysisPart.insert1({**key, "result": 1})

See docs for more information on available methods. Existing tables also provide a good reference.

Overview

This session will cover ...

✅ DataJoint Infrastructure
✅ DataJoint Table Tiers and Declaration Syntax
✅ Spyglass Table Types, including Merge Tables
✅ Creating a Custom Pipeline
👀 Common DataJoint Errors (time permitting)

Common Errors

General how-to
Integrity Error
Permission Error
TypeError
KeyError
DataJointError

Common Errors

How to: Debug mode

An error stack has multiple 'frames' with separately defined variables
To figure out what happened, you may need to 'hop into' a given frame

To enter into debug mode, you can either ...

Use IPython/Jupyter and run %debug upon hitting an error
Use the VSCode extension
Add a breakpoint to your code ...

__import__("pdb").set_trace()

In this mode, you can enter ...

u and d to move up and down the stack
l to list the code around the current line
q to quit the debugger
c to continue running the code
h for help, which will list all the commands

Integrity

IntegrityError: Cannot add or update a child row: a foreign key constraint fails
  (`schema`.`_table`, CONSTRAINT `_table_ibfk_1`
  FOREIGN KEY (`parent_field`)
  REFERENCES `other_schema`.`parent_name` (`parent_field`)
  ON DELETE RESTRICT ON UPDATE CASCADE)

During insert: some part of the key doesn't exist in the parent
The full_table_name after REFERENCES indicates the issue

For a SpyglassMixin table you can run find_insert_fail
- If any are empty, you know you need to insert there first.
- Caution: may not work if there are proj aliases

my_key = dict(value=key)  # whatever you're inserting
MyTable.insert1(my_key)  # error here
parents = MyTable().find_insert_fail(my_key)

Common Errors

Permission

('Insufficient privileges.',
  "INSERT command denied to user 'username'@'127.0.0.1' for table '_table_name'",
  'INSERT INTO `schema_name`.`table_name`(`field1`,`field2`) VALUES (%s,%s)')

This is a MySQL error that means that either ...

You don't have access to the command you're trying to run (e.g., INSERT)
You don't have access to this command on the schema you're trying to run it on

To see what permissions you have, you can run the following ...

dj.conn().query("SHOW GRANTS FOR CURRENT_USER();").fetchall()

Permissions can only be modified by a database administrator

Common Errors

Type

TypeError: example_function() got an unexpected keyword argument 'this_arg'

Results from calling a function (e.g., example_function(this_arg=5)) with mismatched data
Check the function's accepted arguments by running help(example_function).

TypeError: 'NoneType' object is not iterable

The called function can't figure out how to run for over this object (e.g., for item in variable: ...)
Check the type of the variable with type(variable)

Common Errors

KeyError

KeyError: 'field_name'

Trying to access a key in a dictionary that doesn't exist (e.g., mydict['bad_key'])
Check the keys of the dictionary by running mydict.keys()
For your own code, you can set a default with mydict.get('field', default).

Common Errors

DataJoint

DataJointError("Attempt to delete part table {part} before deleting from its master {master} first.")

The delete process found a part table entry referencing the attempted delete
Delete the master table entry first

Overview

This session will cover ...

✅ DataJoint Infrastructure
✅ DataJoint Table Tiers and Declaration Syntax
✅ Spyglass Table Types, including Merge Tables
✅ Creating a Custom Pipeline
✅ Common DataJoint Errors (time permitting)

This is a presentation designed for use with lookatme:

pip install lookatme
lookatme FileName.md --live

CBroz1/SpyglassWorkshopSlides_2025.md

Calibration Slide

Overview

Overview

Infrastructure

Pieces

Infrastructure

Pieces

Infrastructure

DJ's Role

Infrastructure

SQL's Role

Infrastructure

SQL's Role

Infrastructure

SQL's Role

Infrastructure

SQL's Role

Infrastructure

Notes

Overview

Python Structure

Python Structure

Front matter

Front matter

Schema prefix

Python Structure

Python Structure

Table syntax

Class inheritance

Python Structure

Table syntax

Table types

Table types

Table types

Table types

Python Structure

Table syntax

Definitions

Python Structure

Table syntax

Methods

Python Structure

Table syntax

Methods

Python Structure

Table syntax

Example Params Table

Example Selection/Analysis Tables

Overview

Custom pipelines

Custom pipelines

Analysis Structure

Parameters

Custom pipelines

Analysis Structure

Data (1/2)

Custom pipelines

Analysis Structure

Data (2/2)

Custom pipelines

Analysis Structure

Selection

Custom pipelines

Analysis Structure

Analysis

Custom pipelines

Foreign Key References

Custom pipelines

Foreign Key References

Foreign Primary Key

Custom pipelines

Foreign Key References

Foreign Secondary Key

Custom pipelines

Using Merge Tables

Overview

Common Errors

Common Errors

How to: Debug mode