The attribute sets for `participant` were very consistent across projects. Only one project had two extra fields.
- Added `Diagnosis` and `Observation` to differentiate phenotype links
- Should the `Family` entity be added to the gen3 graph?
- Are the Affected/Unaffected edges to Phenotype synonymous with gen3's `Diagnosis`?
- What is the best way to model the present/absent edges to Phenotype? Perhaps an `Observation` type?
- What is the best way to label the edge between participant and `Gene` (`expressed` is a placeholder)?
- Interestingly, the participant record in terra has a variant-like set of fields [pos,ref,alt,hgvs,...], but none of them have content. What is the intent of these fields?
- Temporal data:
- Do we have dob, age_at_diagnosis, or age_at_enrollment on which to base time-series data (PMI)?
- Ontologies: Phenotype seems to be fairly uniform (HPO)
- Ontology term on edge type: Is there a standard way to represent [Affected, Unaffected, present, absent]?
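
One possible way to think about the phenotype-link questions above is to carry the qualifier [Affected, Unaffected, present, absent] as a property on the link and derive the target node type from it. The sketch below is only an illustration under stated assumptions: the names `PhenotypeStatus`, `PhenotypeLink`, and `node_type`, the participant IDs, and the mapping of Affected/Unaffected to `Diagnosis` and present/absent to `Observation` are all hypothetical and are not part of the gen3 dictionary or the terra schema.

```python
from dataclasses import dataclass
from enum import Enum


class PhenotypeStatus(Enum):
    """Qualifiers seen on participant-to-Phenotype edges in the source data."""
    AFFECTED = "Affected"
    UNAFFECTED = "Unaffected"
    PRESENT = "present"
    ABSENT = "absent"


@dataclass(frozen=True)
class PhenotypeLink:
    """Hypothetical link between a participant and an ontology term."""
    participant_id: str
    phenotype_term: str  # e.g. an HPO term ID
    status: PhenotypeStatus

    @property
    def node_type(self) -> str:
        # Assumption: Affected/Unaffected become Diagnosis nodes,
        # present/absent become Observation nodes. This is one of the
        # options raised in the questions, not a settled decision.
        if self.status in (PhenotypeStatus.AFFECTED, PhenotypeStatus.UNAFFECTED):
            return "Diagnosis"
        return "Observation"


# Illustrative records only; the participant IDs are made up.
links = [
    PhenotypeLink("participant-1", "HP:0001250", PhenotypeStatus.AFFECTED),
    PhenotypeLink("participant-2", "HP:0001250", PhenotypeStatus.ABSENT),
]

for link in links:
    print(link.participant_id, link.node_type, link.status.value)
```

Keeping the qualifier as an enumerated property would let the same set of values be swapped for ontology terms later if a standard representation is found.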
`same-as`: are subjects and samples shared between projects?
- Added `Sample` node
- Added `CramFile` and `CraiFile` nodes
- Should we move the bulk of these attributes to the CramFile node?
- Should we reprocess the CRAM files to create an agreed-upon set of attributes?