| title | author | date | styles | ||
|---|---|---|---|---|---|
Spyglass Workshop 2025 |
Chris Broz |
08/2025 |
|
Window must be this wide ─────────────────────────────────────────────────────►
And this tall
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
│
▼
For use with `lookatme`, a terminal-based presentation tool.
This session will cover ...
- ⭕ DataJoint Infrastructure
- ⭕ DataJoint Table Tiers and Declaration Syntax
- ⭕ Spyglass Table Types, including Merge Tables
- ⭕ Creating a Custom Pipeline
- ⭕ Common DataJoint Errors (time permitting)
This session will cover ...
- 👀 DataJoint Infrastructure
- ⭕ DataJoint Table Tiers and Declaration Syntax
- ⭕ Spyglass Table Types, including Merge Tables
- ⭕ Creating a Custom Pipeline
- ⭕ Common DataJoint Errors (time permitting)
What are we doing? What are we connecting to?
┌────────────────┐
│ OS │
│┌──────────────┐│
││ Python ││
││┌────────────┐││
│││ Jupyter │││
│││┌──────────┐│││
││││Some Code ││││
│││└──────────┘│││
││└────────────┘││
│└──────────────┘│
└────────────────┘
What are we doing? What are we connecting to?
┌────────────────┐
│ OS │
│┌──────────────┐│
││ Python ││
││┌────────────┐││
│││ Jupyter │││
│││┌──────────┐│││ ┌────────┐
││││DataJoint─┼┼┼┼──┼─> MySQL│
│││└──────────┘│││ └────────┘
││└────────────┘││
│└──────────────┘│
└────────────────┘
┌──────────┐ ┌────────┐
│DataJoint─┼──┼─> MySQL│
└──────────┘ └────────┘
When importing a table, DataJoint...
- Connects to the SQL server using credentials
- Checks to see if that table exists
- If not, declares it
A given experiment might have separate spreadsheets for ...
- Subjects
- Sessions
- Analysis parameters
- Analysis
With the experimenter remembering the relationships between them.
Generate tables with relationships to one another
┌───────────────┐
│SUBJECT │
│*subj_id*, name│
└───────┬───────┘
│
▼
┌─────────────────────────────┐ ┌───────────────────────────┐
│SESSION │ │ PARAMETERS │
│*subj_id*, *session_id*, time│ │ *param_id*, param1, param2│
└─────────────┬───────────────┘ └────────────┬──────────────┘
│ │
▼ ▼
┌───────────────────────────────────┐
│ANALYSIS │
│*subj_id*, *session_id*, *param_id*│
└───────────────────────────────────┘
Terminology:
- Downstream: direction of data flow (vs. Upstream)
- Primary keys: unique identifiers for each row (here, marked by
*) - Secondary keys: fields for each row, but may repeat values.
- Foreign keys: references to primary keys in other tables (here,
subj_id)
┌───────────────┐
│SUBJECT │
│*subj_id*, name│
└───────┬───────┘
▼
┌─────────────────────────────┐
│SESSION │
│*subj_id*, *session_id*, time│
└─────────────────────────────┘
Terminology:
- Downstream: direction of data flow (vs. Upstream)
- Primary keys: unique identifiers for each row (here, marked by
*) - Secondary keys: fields for each row, but may repeat values.
- Foreign keys: references to primary keys in other tables (here,
subj_id)
Database design is the process of mapping our conceptual model of the data to a set of tables, relationships, fields, and contents.
- SQL is a programming language for relational databases.
- At its core, DataJoint is just a mapping between Python and SQL.
- Python commands -> SQL commands
- Python functions -> prescribed processing steps
- DataJoint ...
- is opinionated about how to structure the databases
- limits the full power of SQL in way that promotes good practices.
- Python stores ...
- A copy of table definitions, that may be out of sync with the database.
- Methods for processing data, that may be out of sync with existing data.
Good data provenance requires good version control and documentation to keep these in sync.
This session will cover ...
- ✅ DataJoint Infrastructure
- 👀 DataJoint Table Tiers and Declaration Syntax
- 👀 Spyglass Table Types, including Merge Tables
- ⭕ Creating a Custom Pipeline
- ⭕ Common DataJoint Errors (time permitting)
A DataJoint py script will have...
- Front matter
- Imports
- Schema declaration
- Tables
- Class inheritance
- Table types
- Definitions
- Methods
A DataJoint py script will have...
- Front matter
- Imports - pulling code from other files
- Schema declaration - telling DataJoint/SQL where the tables will live
- Tables
- Class inheritance
- Table types
- Definitions
- Methods
At the beginning of the schema file, you'll find ...
- Imports
- Package imports - full package
- Individual imports - part of a package
- Aliased imports - full or part, renamed
- Relative imports - full or part, from adjacent file
- Schema declaration
import os # package import
from typing import Union # Individual class import
from itertools import chain as iter_chain # Aliased function import
import datajoint as dj # Aliased package import
from spyglass.common import Nwbfile # noqa: F401 # Individual package import, quiet linter
from ..utils.schema_prefix import SCHEMA_PREFIX # Relative import
schema = dj.schema(SCHEMA_PREFIX + "_example") # Schema declarationschema = dj.schema("username_example") # Schema declaration- Database permissions are usually prefix-based
- Spyglass adds to 'shared' prefixes (e.g.,
common), which it relies on for advanced features - Adding to shared prefixes can cause problems.
- Instead, use your username or ask database admin about a project prefix
spyglass.common.VideoFile.full_table_name == "`common_behav`.`_video_file`"A DataJoint py script will have...
- Front matter
- Imports
- Schema declaration
- Tables
- Class inheritance - expanding classes with imported code
- Table types - general categories of the table
- Definitions - column structure of the table
- Methods - additional tools for this specific table
The parentheses in the class definition indicate what the class inherits from.
@schema
class ExampleTable(SpyglassMixin, dj.Manual):
passThis table is ...
- A DataJoint
Manualtable, which is a table that is manually populated. - A
SpyglassMixintable, which provides some Spyglass-specific features.
- DataJoint types:
Manualtables are manually populated.Lookuptables can be populated on declaration, and rarely change.Computedtables are populated by a method runs computations on upstream entries.Importedtables are populated by a method that imports data from another source.Parttables are used to store data that is conceptually part of another table.
- Spyglass conceptual types:
- Data tables are the starting point for an analysis. Either ingested or analysis output.
- Parameter tables (often
dj.Lookup) store parameters for analysis. - Selection tables (
dj.Manual) store pairings of parameters and data to be analyzed. - Analysis tables (often
dj.Computed) store the results of analysis. - Merge tables combine data from multiple pipeline versions.
- DataJoint types:
Manual,Lookup,Computed/Imported,Part - Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge
| Spyglass Type | DataJoint Type |
|---|---|
| Data | Manual, Imported, Computed |
| Parameter | Lookup, can be Manual |
| Selection | Manual |
| Analysis | Computed |
| Merge | 'Merge', with Parts |
- Any table can have parts for 1-to-many relationships
- Merge tables have a special class with 1 part per merged pipeline
- DataJoint types:
Manual,Lookup,Computed/Imported,Part - Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge
┌────┐
│Data│
└─┬──┘
┌────────────┼───┐
│Schema │ │
│ │ │
│┌─────────┐ │ │
││Parameter│ │ │
│└────┬────┘ │ │
│ ▼ ▼ │
│ ┌─────────┐ │
│ │Selection│ │
│ └────┬────┘ │
│ ▼ │
│ ┌─────────┐ │
│ │Analysis │ │
│ └─────────┘ │
└────────────────┘
- DataJoint types:
Manual,Lookup,Computed/Imported,Part - Spyglass conceptual types: Data, Parameter, Selection, Data/Analysis, Merge
┌────┐
│Data│
└─┬──┘ ┌─────────────┐
┌────────────┼───┐ │ ┌──────────┼─────┐
│Schema │ │ │ │Merge │ │
│ │ │ │ │Schema │ │
│┌─────────┐ │ │ │ │ │ │
││Parameter│ │ │ │ │┌─────┐ │ │
│└────┬────┘ │ │ │ ││Merge│ │ │
│ ▼ ▼ │ │ │└───┬─┘ │ │
│ ┌─────────┐ │ │ │ ▼ ▼ │
│ │Selection│ │ │ │ ┌──────────┐ │
│ └────┬────┘ │ │ │ │Merge Part│ │
│ ▼ │ │ │ └──────────┘ │
│ ┌─────────┐ │ │ │ │
│ │Analysis │ │ │ │ │
│ └────┬────┘ │ │ │ │
└─────────┼──────┘ │ └────────────────┘
└─────────┘
NOTE: for diagrams of existing tables, see dj.Diagram
Each table can have a docstring that describes the table, and must have a
definition attribute that contains the SQL-like table definition.
#comments are used to describe the table and its columns.---separates the primary key columns from the data columns.field : datatypedefines a column using a SQL datatype->indicates a foreign key reference to another table.
@schema
class ExampleTable(SpyglassMixin, dj.Manual):
"""Table Description""" # also called a doc string
definition = """ # Table comment used as header
primary_key1 : uuid # randomized string
primary_key2 : int # integer
---
secondary_key1 : varchar(32) # string of max length 32
secondary_key2 : blob # anything, usually a python dictionary
-> Nwbfile # Foreign key reference, inherit primary key of this table
-> Subject.proj(actual_id='subject_id') # Rename column
"""- Note: Fk-ref'd tables must be in script environment
Many Spyglass tables have methods that provide functionality for the pipeline.
- Properties don't have arguments, can be called as
Table.property - Static methods don't need any info from the table
- Class methods
- Basic Methods
@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
...
@property # Static information, Table.property
def pk(self):
return "subject_id"
@staticmethod # Basic func with no reference to self instance
def _subj_dict(id: UUID):
"""Return the subject dict"""
return {"subject_id": id}- Properties don't have arguments, can be called as
Table.property - Static methods don't need any info from the table
- Class methods don't need instance information
- Basic Methods assume access to class instance
@schema
class SubjBlinded(SpyglassMixin, dj.Manual):
...
@classmethod # Class, not instance. Table.func(), not Table().func()
def example(cls, argument=None): # Default value
pass # Not doing anything
def blind_subjs(self, restriction: Union[str, dict]):
"""Import all subjects selected by the restriction"""
insert_keys = [
{
**self._subj_dict(dj.hash.key_hash(key)),
"actual_id": key["subject_id"],
}
for key in (Subject & restriction).fetch("KEY")
]
self.insert(insert_keys, skip_duplicates=True)@schema
class MyParams(SpyglassMixin, dj.Lookup):
definition = """
param_name: varchar(32)
---
params: blob
"""
contents = [ # Contents inserted on declaration
["example1", {"A": 1, "B": 2}],
["example2", {"A": 3, "B": 4}],
]
@classmethod
def insert_default(cls):
cls.insert(cls.contents, skip_duplicates=True)@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
definition = """
-> SubjBlinded
-> MyParams
"""
@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
definition = """
-> MyAnalysisSelection
"""
class MyAnalysisPart(SpyglassMixin, dj.Part):
definition = """
-> MyAnalysis
---
result: int
"""
def make(self, key):
...
self.insert1(key)
self.MyAnalysisPart.insert1({**key, "result": 1})Run the make method to populate the table with MyAnalysis().populate().
This session will cover ...
- ✅ DataJoint Infrastructure
- ✅ DataJoint Table Tiers and Declaration Syntax
- ✅ Spyglass Table Types, including Merge Tables
- 👀 Creating a Custom Pipeline
- ⭕ Common DataJoint Errors (time permitting)
Designing your own involves ...
- Identifying pieces of your analysis ...
- Parameters
- Data
- Selection (optional)
- Analysis
- Deciding foreign key references
- Using Merge Tables
- Key-value pairs that define the analysis.
- Choice of explicit fields or
blobfor arbitrary data.- Fields are more readable and searchable.
blobis more flexible and easier to add new parameters.
@schema
class MyParams(SpyglassMixin, dj.Lookup):
definition = """
param_name: varchar(32)
---
params: blob
"""
contents = [ # Contents inserted on declaration
["example1", {"A": 1, "B": 2}],
["example2", {"A": 3, "B": 4}],
]- Input data might ...
- Include minor preprocessing within your pipeline (e.g., blinding)
- Be raw (e.g.,
common.Raw) or processed by another pipeline (e.g.,lfp.LFPOutput), and referenced by aSelectiontable.
- Data tables might be structured differently depending on the unit of analysis.
- One-to-one with existing tables: link directly to
Selectiontable. - One-to-many or many-to-one: use
Parttables.
- One-to-one with existing tables: link directly to
@schema
class SubjBlinded(SpyglassMixin, dj.Manual): # Preprocessed data
"""Blinded subject table."""
definition = """
subject_id: uuid # id
---
-> Subject.proj(actual_id='subject_id')
"""
def insert(self, data):
"""Insert data with blinded subject IDs."""
...- Input data might ...
- Include minor preprocessing within your pipeline (e.g., blinding)
- Be raw (e.g.,
common.Raw) or processed by another pipeline (e.g.,lfp.LFPOutput), and referenced by aSelectiontable.
- Data tables might be structured differently depending on the unit of analysis.
- One-to-one with existing tables: link directly to upstream table.
- Many-to-one: use
Parttables.
@schema
class MyGrouping(SpyglassMixin, dj.Manual): # many-to-one
definition = """
group_id: int
"""
class MyUnit(SpyglassMixinPart):
definition = """
-> MyGrouping
-> UpstreamUnit
"""makewill pair all combinations of upstream primary keys.- A
Selectiontable can be used to filter the upstream data, as a staging table for analysis.
@schema
class MyAnalysisSelection(SpyglassMixin, dj.Manual):
definition = """
-> SubjBlinded
-> MyParams
"""makemethod populates the table, running core analysis.MyAnalysis.populate()will run the analysis on all combinations of upstream primary keys.- By convention, results are stored as NWB files, with a reference to the
AnalysisNwbfiletable. - Postpone
insertcalls until the end of themake(see also 'atomicity').
@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
definition = """
-> MyAnalysisSelection
---
-> AnalysisNwbfile
"""
def make(self, key):
params = (MyParams & key).fetch1("params")
...
inserted = dict(key, nwb_file_name=new_file)
self.insert1(key)How to reference upstream tables?
Primary vs Secondary and Foreign Key vs Field are orthogonal concepts.
- A primary key (pk) is a unique identifier for that table.
- A secondary key (sk) field is data associated with the primary key.
- A foreign pk is likely another processing step for that data.
- A foreign sk is likely other data associated with that primary key,
like
AnalysisNwbfile.
- As sole primary key indicates a one-to-one relationship
- Likely another processing step
definition = """
-> UpstreamTable
---
anything: varchar(32)
"""- As one of multiple primary keys indicates pairing of upstream data.
definition = """
-> UpstreamTable1
-> UpstreamTable2
"""As a secondary key indicates either ...
1. Linking to non-unique fields. Does not uniquely identify the row
definition = """
-> UpstreamTable1
---
-> AnalysisNwbfile
"""2. An aliasing (or 'burying') of the primary key(s), useful for SQL limitations
- Too many inherited pks crowds a table or prevents it from being declared
- For convenience navigating these, see 'long distance restrictions'
definition = """
my_id: int
---
-> UpstreamTable1
-> UpstreamTable2
"""- Merge tables signify that an analysis can be done in multiple ways.
- When attaching downstream, you can either ...
- Assume your data comes from one path, or 'source'.
- Allow for multiple sources and let merge-specific functions handle it.
@schema
class MyAnalysis(SpyglassMixin, dj.Computed):
definition = """
-> SomeMergeTable
-> MyParams
---
-> AnalysisNwbfile
"""
def make(self, key):
params = (MyParams & key).fetch1("params")
parent = SomeMergeTable().merge_get_parent(key)
...
self.insert1(key)
self.MyAnalysisPart.insert1({**key, "result": 1})See docs for more information on available methods. Existing tables also provide a good reference.
This session will cover ...
- ✅ DataJoint Infrastructure
- ✅ DataJoint Table Tiers and Declaration Syntax
- ✅ Spyglass Table Types, including Merge Tables
- ✅ Creating a Custom Pipeline
- 👀 Common DataJoint Errors (time permitting)
- General how-to
- Integrity Error
- Permission Error
- TypeError
- KeyError
- DataJointError
- An error stack has multiple 'frames' with separately defined variables
- To figure out what happened, you may need to 'hop into' a given frame
To enter into debug mode, you can either ...
- Use IPython/Jupyter and run
%debugupon hitting an error - Use the VSCode extension
- Add a breakpoint to your code ...
__import__("pdb").set_trace()In this mode, you can enter ...
uanddto move up and down the stacklto list the code around the current lineqto quit the debuggercto continue running the codehfor help, which will list all the commands
IntegrityError: Cannot add or update a child row: a foreign key constraint fails
(`schema`.`_table`, CONSTRAINT `_table_ibfk_1`
FOREIGN KEY (`parent_field`)
REFERENCES `other_schema`.`parent_name` (`parent_field`)
ON DELETE RESTRICT ON UPDATE CASCADE)- During insert: some part of the key doesn't exist in the parent
- The
full_table_nameafterREFERENCESindicates the issue
- For a
SpyglassMixintable you can runfind_insert_fail- If any are empty, you know you need to insert there first.
- Caution: may not work if there are
projaliases
my_key = dict(value=key) # whatever you're inserting
MyTable.insert1(my_key) # error here
parents = MyTable().find_insert_fail(my_key)('Insufficient privileges.',
"INSERT command denied to user 'username'@'127.0.0.1' for table '_table_name'",
'INSERT INTO `schema_name`.`table_name`(`field1`,`field2`) VALUES (%s,%s)')This is a MySQL error that means that either ...
- You don't have access to the command you're trying to run (e.g.,
INSERT) - You don't have access to this command on the schema you're trying to run it on
To see what permissions you have, you can run the following ...
dj.conn().query("SHOW GRANTS FOR CURRENT_USER();").fetchall()Permissions can only be modified by a database administrator
TypeError: example_function() got an unexpected keyword argument 'this_arg'- Results from calling a function (e.g.,
example_function(this_arg=5)) with mismatched data - Check the function's accepted arguments by running
help(example_function).
TypeError: 'NoneType' object is not iterable- The called function can't figure out how to run
forover this object (e.g.,for item in variable: ...) - Check the type of the variable with
type(variable)
KeyError: 'field_name'- Trying to access a key in a dictionary that doesn't exist (e.g.,
mydict['bad_key']) - Check the keys of the dictionary by running
mydict.keys() - For your own code, you can set a default with
mydict.get('field', default).
DataJointError("Attempt to delete part table {part} before deleting from its master {master} first.")- The delete process found a part table entry referencing the attempted delete
- Delete the master table entry first
This session will cover ...
- ✅ DataJoint Infrastructure
- ✅ DataJoint Table Tiers and Declaration Syntax
- ✅ Spyglass Table Types, including Merge Tables
- ✅ Creating a Custom Pipeline
- ✅ Common DataJoint Errors (time permitting)
This is a presentation designed for use with lookatme:
pip install lookatme
lookatme FileName.md --live