The classification of biological databases follows a hierarchy similar to the primary, secondary, and tertiary source distinction used to organize information in other fields:
- Primary Databases: Contain raw experimental data directly submitted by researchers
- Secondary Databases: Contain processed, analyzed, and annotated data derived from primary databases
- Tertiary Databases: Integrate and synthesize information from multiple primary and secondary sources
- Mixed Databases: Incorporate aspects of multiple classification levels

Primary Databases:

Database | Main Focus | Description |
---|---|---|
DDBJ | Nucleotide Sequences | DNA Data Bank of Japan, one of the three international nucleotide sequence databases |
EMBL | Nucleotide Sequences | European Molecular Biology Laboratory nucleotide sequence database (now maintained as the European Nucleotide Archive, ENA) |
GenBank | Nucleotide Sequences | NIH genetic sequence database, annotated collection of publicly available DNA sequences |
GEO | Gene Expression | Gene Expression Omnibus, repository for high-throughput gene expression and related data |
PDB | Protein Structures | Protein Data Bank, repository of 3D structural data of biological macromolecules |
PubMed Central | Literature | Repository of biomedical and life sciences journal literature |
RefSeq | Reference Sequences | NCBI Reference Sequence Database, collection of reference sequence standards |
TrEMBL (UniProt) | Protein Sequences | Automatically annotated protein sequence database (part of UniProt) |

Secondary Databases:

Database | Main Focus | Description |
---|---|---|
BioGRID | Interaction Data | Biological General Repository for Interaction Datasets, contains curated protein, genetic, and chemical interactions |
COMPARTMENTS | Protein Localization | Database of protein subcellular localization evidence |
COG | Orthologous Groups | Clusters of Orthologous Groups, classification of proteins encoded in complete genomes |
DIP | Protein Interactions | Database of Interacting Proteins, experimentally determined protein-protein interactions |
DISEASES | Disease Associations | Integrates evidence on disease-gene associations from various sources |
eggNOG | Orthologous Groups | Evolutionary genealogy of genes: Non-supervised Orthologous Groups, hierarchical classification of proteins |
Gene Ontology | Functional Annotations | Controlled vocabulary of gene and gene product attributes across species |
HPRD | Protein Reference | Human Protein Reference Database, curated proteomic information for human proteins |
HUGO (HGNC) | Gene Nomenclature | HUGO Gene Nomenclature Committee, standardized nomenclature for human genes |
IntAct | Molecular Interactions | Database of molecular interaction data |
InterPro | Protein Families | Integrates protein signatures from multiple databases to classify proteins |
MINT | Molecular Interactions | Molecular INTeraction database, experimentally verified protein-protein interactions |
OMIM | Human Disease Genes | Online Mendelian Inheritance in Man, catalog of human genes and genetic disorders |
Pfam | Protein Families | Database of protein families represented by multiple sequence alignments and HMMs |
proGenomes | Genome Classification | Database of prokaryotic genomes with taxonomic classification |
ProteomeHD | Co-regulation Data | Contains co-regulation data for human proteins |
SIMAP | Protein Similarity | Similarity Matrix of Proteins, precalculated similarity relationships between proteins |
SMART | Protein Domains | Simple Modular Architecture Research Tool, identification and annotation of protein domains |
SWISS-MODEL | Protein Structures | Database of annotated 3D protein structure homology models |
TISSUES | Protein Expression | Database of protein expression patterns in tissues |

Tertiary Databases:

Database | Main Focus | Description |
---|---|---|
BioCyc | Pathways/Genomes | Collection of organism-specific Pathway/Genome Databases integrating genomic and metabolic pathway data |
KEGG | Pathways and Systems | Kyoto Encyclopedia of Genes and Genomes, integrates genomic, chemical, and system functional information |
Reactome | Biological Pathways | Curated and peer-reviewed database of reactions, pathways and biological processes |
WikiPathways | Biological Pathways | Community-curated open pathway database |

Mixed Databases:

Database | Main Focus | Description |
---|---|---|
Ensembl | Genome Browser/Annotation | Genome browser that produces and maintains automatic annotation on selected eukaryotic genomes |
FlyBase | Model Organism Database | Database of genetic, genomic, and functional data for Drosophila |
SGD | Model Organism Database | Saccharomyces Genome Database, comprehensive resource for yeast biology |
Swiss-Prot (UniProt) | Protein Sequences | Manually annotated, high-quality protein sequence database (part of UniProt) |
UniProt | Protein Sequences/Function | Universal Protein Resource, comprehensive resource of protein sequence and function |
WormBase | Model Organism Database | Database for Caenorhabditis elegans and related nematodes |

This classification reflects the nature of the data stored in each database, how it's processed, and its relationship to other data sources. Many databases have evolved over time to incorporate aspects of multiple categories as biological data becomes increasingly interconnected.
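
The four-level scheme can be expressed as a small lookup table. The following is a minimal Python sketch, assuming the four tables above correspond, in order, to the primary, secondary, tertiary, and mixed levels. The names `Level`, `DATABASE_LEVELS`, `classify`, and `databases_at` are hypothetical, and only a handful of entries from the tables are included.

```python
from enum import Enum


class Level(Enum):
    """The four classification levels named in the list above."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"
    MIXED = "mixed"


# A few entries copied from the tables; this is an illustrative subset,
# not a complete or authoritative registry.
DATABASE_LEVELS = {
    "GenBank": Level.PRIMARY,
    "PDB": Level.PRIMARY,
    "Pfam": Level.SECONDARY,
    "InterPro": Level.SECONDARY,
    "KEGG": Level.TERTIARY,
    "Reactome": Level.TERTIARY,
    "UniProt": Level.MIXED,
    "Ensembl": Level.MIXED,
}

# Lowercased index so lookups do not depend on capitalization.
_LOOKUP = {name.lower(): level for name, level in DATABASE_LEVELS.items()}


def classify(name):
    """Return the Level for a database name, or None if it is not listed."""
    return _LOOKUP.get(name.strip().lower())


def databases_at(level):
    """Return the listed database names at a given level, sorted by name."""
    return sorted(n for n, lvl in DATABASE_LEVELS.items() if lvl is level)
```

For example, `classify("kegg")` returns `Level.TERTIARY`, and `databases_at(Level.PRIMARY)` returns the primary entries in the subset. A flat mapping assigns each database exactly one level; because many resources span several categories, a real registry might store a set of levels per database instead.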