@bwalsh
Last active August 7, 2019 17:50
AnVIL CMG Participants & Samples

Participants

The participants were derived from 28 Terra projects assigned to AnVIL.

image

The attribute sets for participants were very consistent across projects; only one project had two extra fields.

image

image
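The consistency check described above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the figures: the project names, field names, and the `summarize_attribute_sets` helper are all hypothetical, and it assumes the participant records for each project have already been exported as lists of dictionaries.

```python
from collections import Counter


def summarize_attribute_sets(projects):
    """Compare participant attribute names across projects.

    `projects` maps a project name to a list of participant records (dicts).
    Returns the attributes shared by every project and, per project, any
    attributes outside that shared core.
    """
    # Collect the set of attribute names seen in each project.
    fields_by_project = {
        name: set().union(*(record.keys() for record in records))
        for name, records in projects.items()
    }
    # Count in how many projects each attribute appears.
    counts = Counter(f for fields in fields_by_project.values() for f in fields)
    # Attributes present in every project form the shared core.
    core = {f for f, n in counts.items() if n == len(fields_by_project)}
    # Report only the projects that carry attributes outside the core.
    extras = {
        name: sorted(fields - core)
        for name, fields in fields_by_project.items()
        if fields - core
    }
    return core, extras


# Hypothetical data: two projects, one of which has two extra fields.
example_projects = {
    "project_a": [{"participant_id": "p1", "sex": "F"}],
    "project_b": [
        {"participant_id": "p2", "sex": "M", "extra_1": "x", "extra_2": "y"}
    ],
}
core, extras = summarize_attribute_sets(example_projects)
print(sorted(core), extras)
```

Comparing against the intersection of all projects is one possible choice; with a larger number of projects, a majority threshold might describe the "common" schema better.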

Mapping

Our initial mapping to a GDC-like graph is as follows:

  • Added Diagnosis and Observation to differentiate phenotype links

image
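One way to read this mapping is sketched below, under stated assumptions. The record layout (`affected_status`, `phenotypes`), the node id scheme, the edge labels, and the `participant_to_graph` helper are hypothetical and not taken from Terra or gen3. The sketch assumes Affected/Unaffected links are routed through a Diagnosis node and present/absent links through an Observation node, with the status kept as a node property.

```python
def participant_to_graph(record):
    """Turn one flat participant record into (nodes, edges).

    nodes: dict of node id -> properties
    edges: list of (source id, label, destination id)
    """
    pid = record["participant_id"]
    nodes = {pid: {"type": "Participant"}}
    edges = []

    def link(kind, term, status):
        # One intermediate node per (participant, phenotype) pair keeps the
        # status as a property instead of encoding it in the edge label.
        node_id = f"{pid}:{kind.lower()}:{term}"
        nodes[node_id] = {"type": kind, "status": status}
        nodes.setdefault(term, {"type": "Phenotype"})
        edges.append((pid, kind.lower(), node_id))
        edges.append((node_id, "phenotype", term))

    # Assumption: Affected/Unaffected links become Diagnosis nodes.
    for term, status in record.get("affected_status", {}).items():
        link("Diagnosis", term, status)
    # Assumption: present/absent links become Observation nodes.
    for term, status in record.get("phenotypes", {}).items():
        link("Observation", term, status)
    return nodes, edges


# Hypothetical record with illustrative HPO-style term identifiers.
example = {
    "participant_id": "P1",
    "affected_status": {"HP:0001250": "Affected"},
    "phenotypes": {"HP:0000252": "present", "HP:0001263": "absent"},
}
nodes, edges = participant_to_graph(example)
for edge in edges:
    print(edge)
```

Keeping the status on the intermediate node rather than on the edge is only one option; it sidesteps the edge-labelling question but adds a node per link.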

Discussion points

  • Should the Family entity be added to the gen3 graph?
  • Are the Affected/Unaffected edges to Phenotype synonymous with gen3's Diagnosis?
  • What is the best way to model the present/absent edges to Phenotype? Perhaps an Observation type?
  • What is the best way to label the edge between participant and Gene (expressed is a placeholder)?
  • Interestingly, the participant record in Terra has a variant-like set of fields [pos, ref, alt, hgvs, ...], but none of them have content. What is the intent of these fields?
  • Temporal data: do we have dob, age_at_diagnosis, or age_at_enrollment on which to base time-series data (PMI)?
  • Ontologies: Phenotype seems to be fairly uniform (HPO).
  • Ontology term on edge type: is there a standard way to represent [Affected, Unaffected, present, absent]?
  • Same-as: are subjects and samples shared between projects?
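The observation about the empty variant-like fields could be verified with a check along these lines. This is a hedged sketch: the `empty_fields` helper and the sample records are hypothetical, and "no content" is assumed to mean missing, `None`, or whitespace-only.

```python
def empty_fields(records, candidates):
    """Return the candidate fields that have no content in any record."""

    def has_content(value):
        # Treat missing keys, None, and blank strings as "no content".
        return value is not None and str(value).strip() != ""

    return [
        field
        for field in candidates
        if not any(has_content(record.get(field)) for record in records)
    ]


# Hypothetical participant records; only participant_id is populated.
example_records = [
    {"participant_id": "P1", "pos": "", "ref": None, "alt": " "},
    {"participant_id": "P2", "pos": "", "hgvs": ""},
]
unused = empty_fields(
    example_records, ["pos", "ref", "alt", "hgvs", "participant_id"]
)
print(unused)
```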

Samples

The properties associated with samples diverged widely and consisted mainly of CRAM summary statistics.

image

image

Mapping

  • Added Sample node
  • Added CramFile and CraiFile nodes

image
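As a rough illustration of this split, the sketch below breaks one flat sample record into Sample, CramFile, and CraiFile nodes. All field names (`sample_id`, `participant_id`, `cram_path`, `crai_path`, `mean_coverage`), the `SAMPLE_FIELDS` set, and the `split_sample` helper are assumptions for illustration. It treats every attribute outside a small sample-level core as a CRAM summary statistic and attaches it to the CramFile node, which is only one possible placement.

```python
# Hypothetical core attributes that stay on the Sample node.
SAMPLE_FIELDS = {"sample_id", "participant_id"}
# Hypothetical attributes holding the file locations.
FILE_FIELDS = {"cram_path", "crai_path"}


def split_sample(record):
    """Split one flat sample record into Sample, CramFile, and CraiFile nodes."""
    sample_id = record["sample_id"]
    sample = {k: v for k, v in record.items() if k in SAMPLE_FIELDS}
    # Everything that is neither a core sample attribute nor a file path is
    # treated as a CRAM summary statistic (an assumption of this sketch).
    stats = {
        k: v
        for k, v in record.items()
        if k not in SAMPLE_FIELDS and k not in FILE_FIELDS
    }
    cram = {"sample_id": sample_id, "path": record.get("cram_path"), "stats": stats}
    crai = {"sample_id": sample_id, "path": record.get("crai_path")}
    return {"Sample": sample, "CramFile": cram, "CraiFile": crai}


# Hypothetical sample record.
example_sample = {
    "sample_id": "S1",
    "participant_id": "P1",
    "cram_path": "gs://example-bucket/S1.cram",
    "crai_path": "gs://example-bucket/S1.cram.crai",
    "mean_coverage": 31.2,
}
graph_nodes = split_sample(example_sample)
print(graph_nodes)
```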

Discussion points

  • Should we move the bulk of these attributes to the CramFile node?
  • Should we reprocess the CRAM files to create an agreed-upon set of attributes?