This directory contains:
- ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) dataset reports, and ClinVar development documents
- documents related to the NCBI collaboration with ClinGen (http://www.clinicalgenome.org/)
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/ClinGen/ExpertPanelRequestForm.docx - how to apply for expert panel status
- data common to ClinVar and GTR
  - ftp://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms - terminology used by both GTR and ClinVar
Go to: ClinVar Home - [Submit Data to ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/docs/submit/) - Genetic Testing Registry Home
--
You may submit data to ClinVar using Excel spreadsheets or XML files.
- Excel Submission Templates
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/submission_templates/SubmissionTemplate.xlsx - standard submission template
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/submission_templates/SubmissionTemplateLite.xlsx - for submissions with less supporting evidence
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/submission_templates/SubmissionTemplate_version3.xlsx - beta version of the updated standard template (please use the standard template if you are time-constrained)
- XML Submission Schema Files
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/clinvar_submission.xsd - current XML submission document schema
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/xsd_submission/ - folder of previous schema versions
  - Please direct XML data submission questions to [email protected].
URL: ftp://ftp.ncbi.nih.gov/pub/clinvar/disease_names Format: tab-separated values Updated: daily
Reports names and identifiers used in GTR and ClinVar. When a name is used by more than one source, there may be more than one line per condition. Unlike the gene_condition_source_id file, it is comprehensive and does not require knowledge of any gene-to-disease relationship.
Columns:
Col | Name | Description |
---|---|---|
1 | DiseaseName | The name preferred by GTR and ClinVar |
2 | SourceName | Sources that also use this preferred name |
3 | ConceptID | The identifier assigned to the disorder (1) |
4 | SourceID | Identifier used by the source reported in column 2 |
5 | DiseaseMIM | MIM number for the condition |
6 | LastUpdated | Last time this record was modified by NCBI staff |
Notes:
(1) If the value starts with a C and is followed by digits, the ConceptID is a value from UMLS; if a value begins with CN, it was created by NCBI-based processing.
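
The prefix rule in note (1) can be applied mechanically when reading the file. The following is a minimal Python sketch, not an NCBI tool. It assumes a local copy of disease_names with the columns in the order listed above, and it assumes that any header line starts with `#`. The function names and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Per note (1): a UMLS identifier is "C" followed only by digits.
_UMLS_RE = re.compile(r"^C\d+$")


def concept_id_origin(concept_id):
    """Classify a ConceptID as 'UMLS', 'NCBI', or 'unknown' using note (1)."""
    if concept_id.startswith("CN"):
        return "NCBI"
    if _UMLS_RE.match(concept_id):
        return "UMLS"
    return "unknown"


def read_disease_names(handle):
    """Yield (DiseaseName, ConceptID, origin) from tab-separated lines.

    Column 1 is taken as DiseaseName and column 3 as ConceptID, following
    the table above. Skipping lines that start with '#' is an assumption.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3 or row[0].startswith("#"):
            continue
        yield row[0], row[2], concept_id_origin(row[2])


# Made-up sample rows for illustration only; not real ClinVar data.
sample = io.StringIO(
    "#DiseaseName\tSourceName\tConceptID\tSourceID\tDiseaseMIM\tLastUpdated\n"
    "Example condition A\tExampleSource\tC0000001\tX1\t\t01 Jan 2015\n"
    "Example condition B\tExampleSource\tCN000002\tX2\t\t01 Jan 2015\n"
)
rows = list(read_disease_names(sample))
```

To read a downloaded copy instead of the sample, pass an open text-mode file handle to `read_disease_names`.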
URL: ftp://ftp.ncbi.nih.gov/pub/clinvar/gene_condition_source_id Format: tab-separated values Updated: daily
Reports gene-disease relationships used in ClinVar, Gene, GTR and MedGen. The sources of information for the gene-disease relationships include OMIM, GeneReviews, and a limited amount of curation by NCBI staff. The scope of disorders reported in this file is a subset of the disease_names file because a gene-to-disease relationship is required.
Columns:
Col | Name | Description |
---|---|---|
1 | GeneID | the NCBI GeneID |
2 | GeneSymbol | the preferred symbol corresponding to the GeneID |
3 | ConceptID | The identifier assigned to a disorder associated with this gene (1) |
4 | SourceName | Sources that use this name |
5 | SourceID | The identifier used by this source |
6 | DiseaseMIM | MIM number for the condition |
7 | LastUpdated | Last time this record was modified by NCBI staff |
Notes:
(1) If the value starts with a C and is followed by digits, the ConceptID is a value from UMLS; if a value begins with CN, it was created by NCBI-based processing.
DiseaseName: the full name for the condition
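
Because each line pairs one gene with one disorder, a common first step is to group the ConceptID values by gene. The following Python sketch shows one way to do that. It assumes the first line of the file is a tab-separated header that names the columns as in the table above (optionally prefixed with `#`), so columns are looked up by name and not by position. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def conditions_by_gene(handle):
    """Map each GeneSymbol to the set of ConceptIDs reported for it.

    Assumes a header row naming the columns; this is an assumption about
    the file layout, not something the table above guarantees.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header_row = next(reader, None)
    if header_row is None:
        return {}
    header = [name.lstrip("#").strip() for name in header_row]
    gene_col = header.index("GeneSymbol")
    concept_col = header.index("ConceptID")

    mapping = defaultdict(set)
    for row in reader:
        # Skip short or blank lines rather than failing on them.
        if len(row) <= max(gene_col, concept_col):
            continue
        mapping[row[gene_col]].add(row[concept_col])
    return dict(mapping)


# Made-up sample rows for illustration only; not real ClinVar data.
sample_genes = io.StringIO(
    "#GeneID\tGeneSymbol\tConceptID\tSourceName\tSourceID\tDiseaseMIM\tLastUpdated\n"
    "1\tGENE1\tC0000001\tExampleSource\tX1\t\t01 Jan 2015\n"
    "1\tGENE1\tCN000002\tExampleSource\tX2\t\t01 Jan 2015\n"
    "2\tGENE2\tC0000001\tExampleSource\tX1\t\t01 Jan 2015\n"
)
gene_map = conditions_by_gene(sample_genes)
```

Looking columns up by header name keeps the sketch working if columns are added or reordered, at the cost of failing with a `ValueError` if an expected name is missing.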
URL: ftp://ftp.ncbi.nih.gov/pub/clinvar/ConceptID_history.txt Format: tab-separated values Updated: daily
Tracks changes in identifiers assigned to phenotypes over time. The ConceptID values in the first column are no longer active, and are either discontinued (the value in column 2 is 'No longer reported'), or replaced by a record with a different identifier. A replacement may be either the result of a merge (one record becoming secondary to another) or because of a change in numbering, usually because an identifier assigned by NCBI (starting with CN) is now thought to be represented by a ConceptID from UMLS (starting with C followed by numerals).
Columns:
Col | Name | Description |
---|---|---|
1 | Previous ConceptID | the outdated identifier |
2 | Current ConceptID | the current identifier |
3 | Date of Action | the date this change occurred |
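
A stored ConceptID can be brought up to date by looking it up in this file. The following Python sketch follows replacements until it reaches an identifier that no longer appears in column 1, and returns `None` for discontinued records. Following a chain of more than one replacement is a defensive assumption; the file may already point directly at the current identifier. The function names and the sample rows are made up for illustration.

```python
import csv
import io

# Value in column 2 that marks a discontinued record, as described above.
DISCONTINUED = "No longer reported"


def load_history(handle):
    """Return {previous ConceptID: current value} from tab-separated lines."""
    history = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skipping '#' lines assumes that is how a header would be marked.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        history[row[0]] = row[1]
    return history


def resolve_concept_id(concept_id, history):
    """Return the active ConceptID, or None if the record was discontinued."""
    seen = set()
    current = concept_id
    while current in history:
        if current in seen:
            raise ValueError("cycle in ConceptID history at " + current)
        seen.add(current)
        current = history[current]
        if current == DISCONTINUED:
            return None
    return current


# Made-up sample rows for illustration only; not real ClinVar data.
sample_history = io.StringIO(
    "CN000001\tC0000010\t01 Jan 2015\n"
    "C0000010\tC0000020\t01 Feb 2015\n"
    "CN000003\tNo longer reported\t01 Mar 2015\n"
)
history = load_history(sample_history)
```

An identifier that never appears in column 1 is returned unchanged, since the file lists only identifiers that are no longer active.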
Subdirectory | Description (notes) |
---|---|
presentations | slides or other documents about ClinVar |
submission_templates | templates for submission by spreadsheet |
tab_delimited | summary data of several types |
vcf_GRCh37 | VCF files generated by dbSNP based on GRCh37/hg19 (2) |
vcf_GRCh38 | VCF files generated by dbSNP based on GRCh38/hg38 (1,2) |
xml | An extraction of data in ClinVar as XML (3) |
xsd_public | current and previous versions of XSD schema files for XML data |
Notes:
(1) For more about the conventions used to process and report the vcf data, see also: http://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/
(2) Please note that until the new data from 1000 Genomes are processed, there will be no files in GRCh38 coordinates that report common variants (common_all.vcf.gz) or common variants not known to contribute to phenotype (common_no_known_medical_impact-latest.vcf). These are available only in the vcf_GRCh37 subdirectory. Note: This notice should be in the README for GRCh38!
(3) The schema for the files in the xml directory is ftp://ftp.ncbi.nih.gov/pub/clinvar/xsd_public/clinvar_public.xsd