Skip to content

Instantly share code, notes, and snippets.

@CBroz1
Last active September 17, 2025 16:04
Show Gist options
  • Select an option

  • Save CBroz1/bd0ba1b12dc530a94d3fac6d0b9e49a9 to your computer and use it in GitHub Desktop.

Select an option

Save CBroz1/bd0ba1b12dc530a94d3fac6d0b9e49a9 to your computer and use it in GitHub Desktop.
Spyglass Workshop Slides 2025
title author date styles
Spyglass Workshop 2025
Chris Broz
08/2025
style
dracula

Calibration Slide

Window must be this wide ─────────────────────────────────────────────────────►

And this tall
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
▼
For use with `lookatme`, a terminal-based presentation tool.

Overview

This session will cover ...

  • ⭕ DataJoint Infrastructure
  • ⭕ DataJoint Table Tiers and Declaration Syntax
  • ⭕ Spyglass Table Types, including Merge Tables
  • ⭕ Creating a Custom Pipeline
  • ⭕ Common DataJoint Errors (time permitting)

Overview

This session will cover ...

  • 👀 DataJoint Infrastructure
  • ⭕ DataJoint Table Tiers and Declaration Syntax
  • ⭕ Spyglass Table Types, including Merge Tables
  • ⭕ Creating a Custom Pipeline
  • ⭕ Common DataJoint Errors (time permitting)

Infrastructure

Pieces

What are we doing? What are we connecting to?

        ┌────────────────┐
        │   OS           │
        │┌──────────────┐│
        ││  Python      ││
        ││┌────────────┐││
        │││ Jupyter    │││
        │││┌──────────┐│││
        ││││Some Code ││││
        │││└──────────┘│││
        ││└────────────┘││
        │└──────────────┘│
        └────────────────┘

Infrastructure

Pieces

What are we doing? What are we connecting to?

        ┌────────────────┐
        │   OS           │
        │┌──────────────┐│
        ││  Python      ││
        ││┌────────────┐││
        │││ Jupyter    │││
        │││┌──────────┐│││  ┌────────┐
        ││││DataJoint─┼┼┼┼──┼─> MySQL│
        │││└──────────┘│││  └────────┘
        ││└────────────┘││
        │└──────────────┘│
        └────────────────┘

Infrastructure

DJ's Role

        ┌──────────┐  ┌────────┐
        │DataJoint─┼──┼─> MySQL│
        └──────────┘  └────────┘

When importing a table, DataJoint...

  1. Connects to the SQL server using credentials
  2. Checks to see if that table exists
  3. If not, declares it

Infrastructure

SQL's Role

A given experiment might have separate spreadsheets for ...

  1. Subjects
  2. Sessions
  3. Analysis parameters
  4. Analysis

With the experimenter remembering the relationships between them.


Infrastructure

SQL's Role

Generate tables with relationships to one another

     ┌───────────────┐
     │SUBJECT        │
     │*subj_id*, name│
     └───────┬───────┘
             │
             ▼
     ┌─────────────────────────────┐  ┌───────────────────────────┐
     │SESSION                      │  │ PARAMETERS                │
     │*subj_id*, *session_id*, time│  │ *param_id*, param1, param2│
     └─────────────┬───────────────┘  └────────────┬──────────────┘
                   │                               │
                   ▼                               ▼
                 ┌───────────────────────────────────┐
                 │ANALYSIS                           │
                 │*subj_id*, *session_id*, *param_id*│
                 └───────────────────────────────────┘

Infrastructure

SQL's Role

Terminology:

  • Downstream: direction of data flow (vs. Upstream)
  • Primary keys: unique identifiers for each row (here, marked by *)
  • Secondary keys: fields for each row, but may repeat values.
  • Foreign keys: references to primary keys in other tables (here, subj_id)
     ┌───────────────┐
     │SUBJECT        │
     │*subj_id*, name│
     └───────┬───────┘
             ▼
     ┌─────────────────────────────┐
     │SESSION                      │
     │*subj_id*, *session_id*, time│
     └─────────────────────────────┘

Infrastructure

SQL's Role

Terminology:

  • Downstream: direction of data flow (vs. Upstream)
  • Primary keys: unique identifiers for each row (here, marked by *)
  • Secondary keys: fields for each row, but may repeat values.
  • Foreign keys: references to primary keys in other tables (here, subj_id)

Database design is the process of mapping our conceptual model of the data to a set of tables, relationships, fields, and contents.


Infrastructure

Notes

  • SQL is a programming language for relational databases.
  • At its core, DataJoint is just a mapping between Python and SQL.
    • Python commands -> SQL commands
    • Python functions -> prescribed processing steps
  • DataJoint ...
    • is opinionated about how to structure the databases
    • limits the full power of SQL in way that promotes good practices.
  • Python stores ...
    • A copy of table definitions, that may be out of sync with the database.
    • Methods for processing data, that may be out of sync with existing data.

Good data provenance requires good version control and documentation to keep these in sync.


Overview

This session will cover ...

  • ✅ DataJoint Infrastructure
  • 👀 DataJoint Table Tiers and Declaration Syntax
  • 👀 Spyglass Table Types, including Merge Tables
  • ⭕ Creating a Custom Pipeline
  • ⭕ Common DataJoint Errors (time permitting)

Python Structure

A DataJoint py script will have...

  1. Front matter
    1. Imports
    2. Schema declaration
  2. Tables
    1. Class inheritance
    2. Table types
    3. Definitions
    4. Methods

Python Structure

A DataJoint py script will have...

  1. Front matter
    1. Imports - pulling code from other files
    2. Schema declaration - telling DataJoint/SQL where the tables will live
  2. Tables
    1. Class inheritance
    2. Table types
    3. Definitions
    4. Methods

Front matter

At the beginning of the schema file, you'll find ...

  • Imports
    • Package imports - full package
    • Individual imports - part of a package
    • Aliased imports - full or part, renamed
    • Relative imports - full or part, from adjacent file
  • Schema declaration
import os # package import

from typing import Union  # Individual class import
from itertools import chain as iter_chain  # Aliased function import

import datajoint as dj  # Aliased package import
from spyglass.common import Nwbfile  # noqa: F401 # Individual package import, quiet linter

from ..utils.schema_prefix import SCHEMA_PREFIX  # Relative import

schema = dj.schema(SCHEMA_PREFIX + "_example")  # Schema declaration

Front matter

Schema prefix

schema = dj.schema("username_example")  # Schema declaration
  • Database permissions are usually prefix-based
  • Spyglass adds to 'shared' prefixes (e.g., common), which it relies on for advanced features
  • Adding to shared prefixes can cause problems.
  • Instead, use your username or ask database admin about a project prefix
spyglass.common.VideoFile.full_table_name == "`common_behav`.`_video_file`"

Python Structure

A DataJoint py script will have...

  1. Front matter
    1. Imports
    2. Schema declaration
  2. Tables
    1. Class inheritance - expanding classes with imported code
    2. Table types - general categories of the table
    3. Definitions - column structure of the table
    4. Methods - additional tools for this specific table

Python Structure

Table syntax

Class inheritance

The parentheses in the class definition indicate what the class inherits from.

@schema
class ExampleTable(SpyglassMixin, dj.Manual):
    pass

This table is ...

  • A DataJoint Manual table, which is a table that is manually populated.
  • A SpyglassMixin table, which provides some Spyglass-specific features.

Python Structure

Table syntax

Table types

  • DataJoint types:
    • Manual tables are manually populated.
    • Lookup tables can be populated on declaration, and rarely change.
    • Computed tables are populated by a method runs computations on upstream entries.
    • Imported tables are populated by a method that imports data from another source.
    • Part tables are used to store data that is conceptually part of another table.
  • Spyglass conceptual types:
    • Data tables are the starting point for an analysis. Either ingested or analysis output.
    • Parameter tables (often dj.Lookup) store parameters for analysis.
    • Selection tables (dj.Manual) store pairings of parameters and data to be analyzed.
    • Analysis tables (often dj.Computed) store the results of analysis.
    • Merge tables combine data from multiple pipeline versions.

Table types

Spyglass Type DataJoint Type
Data Manual, Imported, Computed
Parameter Lookup, can be Manual
Selection Manual
Analysis Computed
Merge 'Merge', with Parts
  • Any table can have parts for 1-to-many relationships
  • Merge tables have a special class with 1 part per merged pipeline

Table types

               ┌────┐
               │Data│
               └─┬──┘
    ┌────────────┼───┐
    │Schema      │   │
    │            │   │
    │┌─────────┐ │   │
    ││Parameter│ │   │
    │└────┬────┘ │   │
    │     ▼      ▼   │
    │    ┌─────────┐ │
    │    │Selection│ │
    │    └────┬────┘ │
    │         ▼      │
    │    ┌─────────┐ │
    │    │Analysis │ │
    │    └─────────┘ │
    └────────────────┘

Table types

               ┌────┐
               │Data│
               └─┬──┘   ┌─────────────┐
    ┌────────────┼───┐  │  ┌──────────┼─────┐
    │Schema      │   │  │  │Merge     │     │
    │            │   │  │  │Schema    │     │
    │┌─────────┐ │   │  │  │          │     │
    ││Parameter│ │   │  │  │┌─────┐   │     │
    │└────┬────┘ │   │  │  ││Merge│   │     │
    │     ▼      ▼   │  │  │└───┬─┘   │     │
    │    ┌─────────┐ │  │  │    ▼     ▼     │
    │    │Selection│ │  │  │  ┌──────────┐  │
    │    └────┬────┘ │  │  │  │Merge Part│  │
    │         ▼      │  │  │  └──────────┘  │
    │    ┌─────────┐ │  │  │                │
    │    │Analysis │ │  │  │                │
    │    └────┬────┘ │  │  │                │
    └─────────┼──────┘  │  └────────────────┘
              └─────────┘

NOTE: for diagrams of existing tables, see dj.Diagram


Python Structure

Table syntax

Definitions

Each table can have a docstring that describes the table, and must have a definition attribute that contains the SQL-like table definition.

  • # comments are used to describe the table and its columns.
  • --- separates the primary key columns from the data columns.
  • field : datatype defines a column using a SQL datatype
  • -> indicates a foreign key reference to another table.
@schema
class ExampleTable(SpyglassMixin, dj.Manual):
    """Table Description""" # also called a doc string

    definition = """ # Table comment used as header
    primary_key1 : uuid # randomized string
    primary_key2 : int  # integer
    ---
    secondary_key1 : varchar(32) # string of max length 32
    secondary_key2 : blob # anything, usually a python dictionary
    -> Nwbfile # Foreign key reference, inherit primary key of this table
    -> Subject.proj(actual_id='subject_id') # Rename column
    """
  • Note: Fk-ref'd tables must be in script environment

Python Structure

Table syntax

Methods

Many Spyglass tables have methods that provide functionality for the pipeline.

  • Properties don't have arguments, can be called as Table.property
  • Static methods don't need any info from the table
  • Class methods
  • Basic Methods
@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
    ...

    @property  # Static information, Table.property
    def pk(self):
        return "subject_id"

    @staticmethod  # Basic func with no reference to self instance
    def _subj_dict(id: UUID):
        """Return the subject dict"""
        return {"subject_id": id}

Python Structure

Table syntax

Methods

  • Properties don't have arguments, can be called as Table.property
  • Static methods don't need any info from the table
  • Class methods don't need instance information
  • Basic Methods assume access to class instance
@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
    ...

    @classmethod  # Class, not instance. Table.func(), not Table().func()
    def example(cls, argument=None):  # Default value
        pass  # Not doing anything

    def blind_subjs(self, restriction: Union[str, dict]):
        """Import all subjects selected by the restriction"""
        insert_keys = [
            {
                **self._subj_dict(dj.hash.key_hash(key)),
                "actual_id": key["subject_id"],
            }
            for key in (Subject & restriction).fetch("KEY")
        ]
        self.insert(insert_keys, skip_duplicates=True)

Python Structure

Table syntax

Example Params Table

@schema
class MyParams(SpyglassMixin, dj.Lookup):
    definition = """
    param_name: varchar(32)
    ---
    params: blob
    """
    contents = [  # Contents inserted on declaration
        ["example1", {"A": 1, "B": 2}],
        ["example2", {"A": 3, "B": 4}],
    ]

    @classmethod
    def insert_default(cls):
        cls.insert(cls.contents, skip_duplicates=True)

Example Selection/Analysis Tables

@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
    definition = """
    -> SubjBlinded
    -> MyParams
    """

@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> MyAnalysisSelection
    """

    class MyAnalysisPart(SpyglassMixin, dj.Part):
        definition = """
        -> MyAnalysis
        ---
        result: int
        """

    def make(self, key):
        ...
        self.insert1(key)
        self.MyAnalysisPart.insert1({**key, "result": 1})

Run the make method to populate the table with MyAnalysis().populate().


Overview

This session will cover ...

  • ✅ DataJoint Infrastructure
  • ✅ DataJoint Table Tiers and Declaration Syntax
  • ✅ Spyglass Table Types, including Merge Tables
  • 👀 Creating a Custom Pipeline
  • ⭕ Common DataJoint Errors (time permitting)

Custom pipelines

Designing your own involves ...

  • Identifying pieces of your analysis ...
    1. Parameters
    2. Data
    3. Selection (optional)
    4. Analysis
  • Deciding foreign key references
  • Using Merge Tables

Custom pipelines

Analysis Structure

Parameters

  • Key-value pairs that define the analysis.
  • Choice of explicit fields or blob for arbitrary data.
    • Fields are more readable and searchable.
    • blob is more flexible and easier to add new parameters.
@schema
class MyParams(SpyglassMixin, dj.Lookup):
    definition = """
    param_name: varchar(32)
    ---
    params: blob
    """
    contents = [  # Contents inserted on declaration
        ["example1", {"A": 1, "B": 2}],
        ["example2", {"A": 3, "B": 4}],
    ]

Custom pipelines

Analysis Structure

Data (1/2)

  • Input data might ...
    • Include minor preprocessing within your pipeline (e.g., blinding)
    • Be raw (e.g., common.Raw) or processed by another pipeline (e.g., lfp.LFPOutput), and referenced by a Selection table.
  • Data tables might be structured differently depending on the unit of analysis.
    • One-to-one with existing tables: link directly to Selection table.
    • One-to-many or many-to-one: use Part tables.
@schema
class SubjBlinded(SpyglassMixin, dj.Manual): # Preprocessed data
    """Blinded subject table."""

    definition = """
    subject_id: uuid # id
    ---
    -> Subject.proj(actual_id='subject_id')
    """

    def insert(self, data):
        """Insert data with blinded subject IDs."""
        ...

Custom pipelines

Analysis Structure

Data (2/2)

  • Input data might ...
    • Include minor preprocessing within your pipeline (e.g., blinding)
    • Be raw (e.g., common.Raw) or processed by another pipeline (e.g., lfp.LFPOutput), and referenced by a Selection table.
  • Data tables might be structured differently depending on the unit of analysis.
    • One-to-one with existing tables: link directly to upstream table.
    • Many-to-one: use Part tables.
@schema
class MyGrouping(SpyglassMixin, dj.Manual): # many-to-one
    definition = """
    group_id: int
    """

    class MyUnit(SpyglassMixinPart):
        definition = """
        -> MyGrouping
        -> UpstreamUnit
        """

Custom pipelines

Analysis Structure

Selection

  • make will pair all combinations of upstream primary keys.
  • A Selection table can be used to filter the upstream data, as a staging table for analysis.
@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
    definition = """
    -> SubjBlinded
    -> MyParams
    """

Custom pipelines

Analysis Structure

Analysis

  • make method populates the table, running core analysis.
  • MyAnalysis.populate() will run the analysis on all combinations of upstream primary keys.
  • By convention, results are stored as NWB files, with a reference to the AnalysisNwbfile table.
  • Postpone insert calls until the end of the make (see also 'atomicity').
@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> MyAnalysisSelection
    ---
    -> AnalysisNwbfile
    """

    def make(self, key):
        params = (MyParams & key).fetch1("params")
        ...
        inserted = dict(key, nwb_file_name=new_file)
        self.insert1(key)

Custom pipelines

Foreign Key References

How to reference upstream tables?

Primary vs Secondary and Foreign Key vs Field are orthogonal concepts.

  • A primary key (pk) is a unique identifier for that table.
  • A secondary key (sk) field is data associated with the primary key.
  • A foreign pk is likely another processing step for that data.
  • A foreign sk is likely other data associated with that primary key, like AnalysisNwbfile.

Custom pipelines

Foreign Key References

Foreign Primary Key

  • As sole primary key indicates a one-to-one relationship
  • Likely another processing step
definition = """
-> UpstreamTable
---
anything: varchar(32)
"""
  • As one of multiple primary keys indicates pairing of upstream data.
definition = """
-> UpstreamTable1
-> UpstreamTable2
"""

Custom pipelines

Foreign Key References

Foreign Secondary Key

As a secondary key indicates either ...

1. Linking to non-unique fields. Does not uniquely identify the row

definition = """
-> UpstreamTable1
---
-> AnalysisNwbfile
"""

2. An aliasing (or 'burying') of the primary key(s), useful for SQL limitations

  • Too many inherited pks crowds a table or prevents it from being declared
  • For convenience navigating these, see 'long distance restrictions'
definition = """
my_id: int
---
-> UpstreamTable1
-> UpstreamTable2
"""

Custom pipelines

Using Merge Tables

  • Merge tables signify that an analysis can be done in multiple ways.
  • When attaching downstream, you can either ...
    • Assume your data comes from one path, or 'source'.
    • Allow for multiple sources and let merge-specific functions handle it.
@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
    definition = """
    -> SomeMergeTable
    -> MyParams
    ---
    -> AnalysisNwbfile
    """

    def make(self, key):
        params = (MyParams & key).fetch1("params")
        parent = SomeMergeTable().merge_get_parent(key)
        ...
        self.insert1(key)
        self.MyAnalysisPart.insert1({**key, "result": 1})

See docs for more information on available methods. Existing tables also provide a good reference.


Overview

This session will cover ...

  • ✅ DataJoint Infrastructure
  • ✅ DataJoint Table Tiers and Declaration Syntax
  • ✅ Spyglass Table Types, including Merge Tables
  • ✅ Creating a Custom Pipeline
  • 👀 Common DataJoint Errors (time permitting)

Common Errors

  1. General how-to
  2. Integrity Error
  3. Permission Error
  4. TypeError
  5. KeyError
  6. DataJointError

Common Errors

How to: Debug mode

  • An error stack has multiple 'frames' with separately defined variables
  • To figure out what happened, you may need to 'hop into' a given frame

To enter into debug mode, you can either ...

  • Use IPython/Jupyter and run %debug upon hitting an error
  • Use the VSCode extension
  • Add a breakpoint to your code ...
__import__("pdb").set_trace()

In this mode, you can enter ...

  • u and d to move up and down the stack
  • l to list the code around the current line
  • q to quit the debugger
  • c to continue running the code
  • h for help, which will list all the commands

Integrity

IntegrityError: Cannot add or update a child row: a foreign key constraint fails
  (`schema`.`_table`, CONSTRAINT `_table_ibfk_1`
  FOREIGN KEY (`parent_field`)
  REFERENCES `other_schema`.`parent_name` (`parent_field`)
  ON DELETE RESTRICT ON UPDATE CASCADE)
  • During insert: some part of the key doesn't exist in the parent
  • The full_table_name after REFERENCES indicates the issue
  • For a SpyglassMixin table you can run find_insert_fail
    • If any are empty, you know you need to insert there first.
    • Caution: may not work if there are proj aliases
my_key = dict(value=key)  # whatever you're inserting
MyTable.insert1(my_key)  # error here
parents = MyTable().find_insert_fail(my_key)

Common Errors

Permission

('Insufficient privileges.',
  "INSERT command denied to user 'username'@'127.0.0.1' for table '_table_name'",
  'INSERT INTO `schema_name`.`table_name`(`field1`,`field2`) VALUES (%s,%s)')

This is a MySQL error that means that either ...

  • You don't have access to the command you're trying to run (e.g., INSERT)
  • You don't have access to this command on the schema you're trying to run it on

To see what permissions you have, you can run the following ...

dj.conn().query("SHOW GRANTS FOR CURRENT_USER();").fetchall()

Permissions can only be modified by a database administrator


Common Errors

Type

TypeError: example_function() got an unexpected keyword argument 'this_arg'
  • Results from calling a function (e.g., example_function(this_arg=5)) with mismatched data
  • Check the function's accepted arguments by running help(example_function).
TypeError: 'NoneType' object is not iterable
  • The called function can't figure out how to run for over this object (e.g., for item in variable: ...)
  • Check the type of the variable with type(variable)

Common Errors

KeyError

KeyError: 'field_name'
  • Trying to access a key in a dictionary that doesn't exist (e.g., mydict['bad_key'])
  • Check the keys of the dictionary by running mydict.keys()
  • For your own code, you can set a default with mydict.get('field', default).

Common Errors

DataJoint

DataJointError("Attempt to delete part table {part} before deleting from its master {master} first.")
  • The delete process found a part table entry referencing the attempted delete
  • Delete the master table entry first

Overview

This session will cover ...

  • ✅ DataJoint Infrastructure
  • ✅ DataJoint Table Tiers and Declaration Syntax
  • ✅ Spyglass Table Types, including Merge Tables
  • ✅ Creating a Custom Pipeline
  • ✅ Common DataJoint Errors (time permitting)

This is a presentation designed for use with lookatme:

pip install lookatme
lookatme FileName.md --live
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment