@inutano
Last active March 25, 2016 07:04

RDFizing FastQC/Quanto data

Thanks to skwsm-san!

Original FastQC data used for modeling is embedded here.

Model for modules with data matrix

<Quanto Record> a :SequenceStatisticsReport .
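The snippets in this model omit prefix declarations and use `<Quanto Record>` as a placeholder subject. A minimal sketch of a preamble that would let them parse as Turtle follows. The IRIs bound to `:` and `uo:` and the record IRI are made-up placeholders, not the project's actual namespaces.

```turtle
# Placeholder namespaces: only the rdf: IRI is the real W3C one.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix uo:  <http://example.org/uo#> .
@prefix :    <http://example.org/quanto/ontology#> .

# "<Quanto Record>" stands for the IRI of one record. An IRI cannot
# contain a space, so a real document needs an identifier such as this
# hypothetical one.
<http://example.org/quanto/record/ERR055260> a :SequenceStatisticsReport .
```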

Basic Statistics

<Quanto Record>
  :filename "ERR055260.fastq";
  :fileType "Conventional base calls";
  :encoding "Sanger / Illumina 1.9";
  :totalSequences [
    a :SequenceReadContent;
    :hasUnit uo:CountUnit;
    rdf:value 33692804;
  ];
  :filteredSequences [
    a :SequenceReadContent;
    :hasUnit uo:CountUnit;
    rdf:value 0;
  ];
  :sequenceLength [
    a :SequenceReadLength;
    rdf:value 36;
  ];
  :percentGC [
    a :NucleotideBaseContent;
    :hasUnit uo:Percent;
    rdf:value 40;
  ] .

Per base sequence quality

<Quanto Record> :hasMatrix [
  a :PerBaseSequenceQuality;
  :hasRow [
    a :Row;
    a :ExactBaseStatistics;
    :rowIndex 0;
    :basePosition "1";
    :meanBaseCallQuality [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
    :medianBaseCallQuality [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
    :baseCallQualityLowerQuartile [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
    :baseCallQualityUpperQuartile [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
    :baseCallQuality10thPercentile [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
    :baseCallQuality90thPercentile [
      a :PhredQualityScore;
      rdf:value 36.0;
    ];
  ] ;
] .

Per sequence quality scores

<Quanto Record> :hasMatrix [
  a :PerSequenceQualityScores;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :baseCallQuality [
      a :PhredQualityScore;
      rdf:value 2;
    ];
    :sequenceReadCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 50286.0;
    ];
  ];
] .

Per base sequence content

<Quanto Record> :hasMatrix [
  a :PerBaseSequenceContent;
  :hasRow [
    a :Row;
    a :ExactBaseStatistics;
    :rowIndex 0;
    :basePosition "1";
    :percentGuanine [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 21.568317674005407;
    ];
    :percentAdenine [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 27.723905080740685;
    ];
    :percentThymine [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 28.783710017130065;
    ];
    :percentCytosine [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 21.924067228123846;
    ];
  ];
] .

Per base GC content

<Quanto Record> :hasMatrix [
  a :PerBaseGCContent;
  :hasRow [
    a :Row;
    a :ExactBaseStatistics;
    :rowIndex 0;
    :basePosition "1";
    :percentGC [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 43.49238490212925;
    ];
  ];
] .

Per sequence GC content

<Quanto Record> :hasMatrix [
  a :PerSequenceGCContent;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :percentGC [
      a :NucleotideBaseContent;
      :hasUnit uo:Percent;
      rdf:value 0;
    ];
    :sequenceReadCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 2030.0;
    ];
  ];
] .

Per base N content

<Quanto Record> :hasMatrix [
  a :PerBaseNContent;
  :hasRow [
    a :Row;
    a :ExactBaseStatistics;
    :rowIndex 0;
    :basePosition "1";
    :nCount [
      a :NContent;
      :hasUnit uo:CountUnit;
      rdf:value 0.0;
    ];
  ];
] .

Sequence Length Distribution

<Quanto Record> :hasMatrix [
  a :SequenceLengthDistribution;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :sequenceReadLength [
      a :SequenceReadLength;
      :hasUnit uo:CountUnit;
      rdf:value 36;
    ];
    :sequenceReadCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 3.3692804E7;
    ];
  ];
] .
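Read counts appear in three lexical forms in this model: `33692804` in Basic Statistics, `50286.0` in Per sequence quality scores, and `3.3692804E7` here. Turtle's numeric shorthand maps these to `xsd:integer`, `xsd:decimal`, and `xsd:double` respectively, so the same kind of quantity ends up with different datatypes. The sketch below shows one way to pin the datatype, assuming integer counts are what is wanted (the original does not say). The namespaces are placeholders.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix uo:  <http://example.org/uo#> .               # placeholder namespace
@prefix :    <http://example.org/quanto/ontology#> .  # placeholder namespace

# Shorthand as copied from the FastQC output: the exponent makes this
# literal an xsd:double.
[] a :SequenceReadContent ;
   :hasUnit uo:CountUnit ;
   rdf:value 3.3692804E7 .

# The same count with an explicit datatype, so it carries the same
# datatype as the integer 33692804 used for :totalSequences.
[] a :SequenceReadContent ;
   :hasUnit uo:CountUnit ;
   rdf:value "33692804"^^xsd:integer .
```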

Sequence Duplication Levels

<Quanto Record> :hasMatrix [
  a :SequenceDuplicationLevels;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :sequenceDuplicationLevel [
      a :SequenceDuplicationLevel;
      :hasUnit uo:CountUnit;
      rdf:value 1;
    ];
    :sequenceReadRelativeCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 100;
    ];
  ];
] .

Overrepresented sequences

<Quanto Record> :hasMatrix [
  a :OverrepresentedSequences;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :overrepresentedSequence "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCG";
    :sequenceReadCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 66145;
    ];
    :sequenceReadPercentage [
      a :SequenceReadContent;
      :hasUnit uo:Percent;
      rdf:value 0.19631788437673514;
    ];
    :possibleSourceOfSequence "Illumina Paired End PCR Primer 2 (97% over 36bp)";
  ];
] .

Kmer Content

<Quanto Record> :hasMatrix [
  a :KmerContent;
  :hasRow [
    a :Row;
    :rowIndex 0;
    :kmerSequence "CTATG";
    :sequenceReadCount [
      a :SequenceReadContent;
      :hasUnit uo:CountUnit;
      rdf:value 3682525;
    ];
    :observedPerExpectedOverall [
      a :SequenceReadContent;
      :hasUnit uo:ratio;
      rdf:value 3.1166635;
    ];
    :observedPerExpectedMax [
      a :SequenceReadContent;
      :hasUnit uo:ratio;
      rdf:value 3.6598775;
    ];
    :observedPerExpectedMaxPosition "6";
  ];
] .

Modules added by Quanto

<Quanto Record>
  :minSequenceLength 36;
  :maxSequenceLength 36;
  :meanSequenceLength 36;
  :medianSequenceLength 36;
  :overallMeanBaseCallQuality [
    a :PhredQualityScore;
    rdf:value 40;
  ];
  :overallMedianBaseCallQuality [
    a :PhredQualityScore;
    rdf:value 40;
  ];
  :overallNContent [
    a :NContent;
    :hasUnit uo:Percent;
    rdf:value 0.1;
  ] .

Class

General Class for Read Statistics

  • SequenceStatisticsReport
  • SequenceStatisticsMatrix
  • FastQCStatisticsMatrix
  • Row
  • ExactBaseStatistics
  • BaseRangeStatistics
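
The model examples only show rows typed `:ExactBaseStatistics`. FastQC reports some positions as ranges (for example `10-14`) when reads are long, which is presumably what `:BaseRangeStatistics` is for. Below is a sketch of such a row, assuming `:basePosition` keeps the range as the string FastQC prints. The row index and range are illustrative and not taken from the ERR055260 data, and the namespace is a placeholder.

```turtle
@prefix : <http://example.org/quanto/ontology#> .  # placeholder namespace

[] a :PerBaseSequenceQuality ;
   :hasRow [
     a :Row ;
     a :BaseRangeStatistics ;   # instead of :ExactBaseStatistics
     :rowIndex 9 ;              # illustrative index
     :basePosition "10-14" ;    # range kept as a string (assumption)
   ] .
```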

FastQC Module Class

has_parent :FastQCStatisticsMatrix

  • PerBaseSequenceQuality
  • PerTileSequenceQuality
  • PerSequenceQualityScores
  • PerBaseSequenceContent
  • PerBaseGCContent
  • PerSequenceGCContent
  • PerBaseNContent
  • SequenceLengthDistribution
  • SequenceDuplicationLevels
  • OverrepresentedSequences
  • KmerContent
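
The `has_parent` relation above could be written down with `rdfs:subClassOf`. This is a sketch under that assumption. The original does not say which vocabulary is intended, and the `:` namespace is a placeholder.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/quanto/ontology#> .  # placeholder namespace

# Each FastQC module class is a kind of :FastQCStatisticsMatrix.
:PerBaseSequenceQuality     rdfs:subClassOf :FastQCStatisticsMatrix .
:PerTileSequenceQuality     rdfs:subClassOf :FastQCStatisticsMatrix .
:PerSequenceQualityScores   rdfs:subClassOf :FastQCStatisticsMatrix .
:PerBaseSequenceContent     rdfs:subClassOf :FastQCStatisticsMatrix .
:PerSequenceGCContent       rdfs:subClassOf :FastQCStatisticsMatrix .
:PerBaseNContent            rdfs:subClassOf :FastQCStatisticsMatrix .
:SequenceLengthDistribution rdfs:subClassOf :FastQCStatisticsMatrix .
:SequenceDuplicationLevels  rdfs:subClassOf :FastQCStatisticsMatrix .
:OverrepresentedSequences   rdfs:subClassOf :FastQCStatisticsMatrix .
:KmerContent                rdfs:subClassOf :FastQCStatisticsMatrix .
```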

Object for values in a matrix

  • PhredQualityScore
  • NucleotideBaseContent
  • NContent
  • SequenceReadContent
  • SequenceReadLength
  • SequenceDuplicationLevel

Predicates

  • object properties
    • hasMatrix
    • totalSequences
    • filteredSequences
    • sequenceLength
    • percentGC
  • data properties
    • filename
    • fileType
    • encoding

Predicates for SequenceStatisticsMatrix

  • hasRow

Predicates for a row in a SequenceStatisticsMatrix

  • object properties
    • baseCallQuality
    • baseCallQuality10thPercentile
    • baseCallQuality90thPercentile
    • baseCallQualityLowerQuartile
    • baseCallQualityUpperQuartile
    • meanBaseCallQuality
    • medianBaseCallQuality
    • nCount
    • observedPerExpectedMax
    • observedPerExpectedOverall
    • percentAdenine
    • percentCytosine
    • percentGC
    • percentGuanine
    • percentThymine
    • sequenceDuplicationLevel
    • sequenceReadCount
    • sequenceReadLength
    • sequenceReadPercentage
    • sequenceReadRelativeCount
  • data properties
    • rowIndex
    • basePosition
    • kmerSequence
    • observedPerExpectedMaxPosition
    • overrepresentedSequence
    • possibleSourceOfSequence

Predicates for values in a row of SequenceStatisticsMatrix

  • object properties
    • hasUnit

Predicates for Quanto modules

  • object properties
    • overallMeanBaseCallQuality
    • overallMedianBaseCallQuality
    • overallNContent
  • data properties
    • minSequenceLength
    • maxSequenceLength
    • meanSequenceLength
    • medianSequenceLength
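
The object/data split above could be declared with OWL property types. This is a sketch assuming OWL is the intended vocabulary, which the original does not state. The `:` namespace is a placeholder.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/quanto/ontology#> .  # placeholder namespace

# Object properties: the object is a node carrying rdf:value
# (and, for :overallNContent, a :hasUnit).
:overallMeanBaseCallQuality   a owl:ObjectProperty .
:overallMedianBaseCallQuality a owl:ObjectProperty .
:overallNContent              a owl:ObjectProperty .

# Data properties: the object is a plain literal such as 36.
:minSequenceLength    a owl:DatatypeProperty .
:maxSequenceLength    a owl:DatatypeProperty .
:meanSequenceLength   a owl:DatatypeProperty .
:medianSequenceLength a owl:DatatypeProperty .
```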