
@mpkocher
Last active October 2, 2019 14:59
High Level Overview of the SMRT Link Secondary Analysis System
```dot
digraph {
subgraph cluster_0 {
c0_Job [shape=hexagon, color=blue, label="Import DataSet Job"]
c0_store [shape=cylinder, label="DataStore"]
c0_ep1 [shape=diamond, label="Path /path/to/alpha_subreadset.xml"]
c0_ep1 -> c0_Job
c0_Job -> c0_store
c0_dsf_02 [shape=tab, label="DataStoreFile (SubreadSet)"]
c0_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c0_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c0_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c0_store -> c0_dsf_02
c0_store -> c0_dsf_04
c0_store -> c0_dsf_05
c0_store -> c0_dsf_03
c0_dsf_02 -> c0_dsf_04 [style=dotted]
c0_dsf_02 -> c0_dsf_05 [style=dotted]
}
subgraph cluster_2 {
c2_Job [shape=hexagon, color=blue, label="Import DataSet Job"]
c2_store [shape=cylinder, label="DataStore"]
c2_ep1 [shape=diamond, label="Path /path/to/beta_subreadset.xml"]
c2_ep1 -> c2_Job
c2_Job -> c2_store
c2_dsf_02 [shape=tab, label="DataStoreFile (SubreadSet)"]
c2_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c2_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c2_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c2_store -> c2_dsf_02
c2_store -> c2_dsf_04
c2_store -> c2_dsf_05
c2_store -> c2_dsf_03
c2_dsf_02 -> c2_dsf_04 [style=dotted]
c2_dsf_02 -> c2_dsf_05 [style=dotted]
}
subgraph cluster_3 {
c3_Job [shape=hexagon, color=blue, label="Merge DataSets Job"]
c3_store [shape=cylinder, label="DataStore"]
c3_ep1 [shape=diamond, label="Path /path/to/gamma_subreadset.xml"]
c3_ep1 -> c3_Job
c3_Job -> c3_store
c3_dsf_02 [shape=tab, label="DataStoreFile (Merged SubreadSet)"]
c3_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c3_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c3_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c3_store -> c3_dsf_02
c3_store -> c3_dsf_04
c3_store -> c3_dsf_05
c3_store -> c3_dsf_03
c3_dsf_02 -> c3_dsf_04 [style=dotted]
c3_dsf_02 -> c3_dsf_05 [style=dotted]
}
subgraph cluster_4 {
c4_Job [shape=hexagon, color=blue, label="Import DataSet Job"]
c4_store [shape=cylinder, label="DataStore"]
c4_ep1 [shape=diamond, label="Path /path/to/referenceset.xml"]
c4_ep1 -> c4_Job
c4_Job -> c4_store
c4_dsf_02 [shape=tab, label="DataStoreFile (ReferenceSet)"]
c4_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c4_store -> c4_dsf_02
c4_store -> c4_dsf_03
}
subgraph cluster_5 {
c5_Job [shape=hexagon, color=blue, label="Copy (and filter) DataSet"]
c5_store [shape=cylinder, label="DataStore"]
c5_ep1 [shape=diamond, label="DataSet UUID=X,filter=rq >= 0.7"]
c5_ep1 -> c5_Job
c5_Job -> c5_store
c5_dsf_02 [shape=tab, label="DataStoreFile (SubreadSet)"]
c5_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c5_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c5_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c5_store -> c5_dsf_02
c5_store -> c5_dsf_04
c5_store -> c5_dsf_05
c5_store -> c5_dsf_03
c5_dsf_02 -> c5_dsf_04 [style=dotted]
c5_dsf_02 -> c5_dsf_05 [style=dotted]
}
subgraph cluster_6 {
c6_Job [shape=hexagon, color=blue, label="Export DataSet(s) Zip Job"]
c6_store [shape=cylinder, label="DataStore"]
c6_ep1 [shape=diamond, label="DataSet UUIDs=X,Y,Z"]
c6_ep1 -> c6_Job
c6_Job -> c6_store
c6_dsf_02 [shape=tab, label="DataStoreFile DataSet XML(s) Zip"]
c6_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c6_store -> c6_dsf_02
c6_store -> c6_dsf_03
}
subgraph cluster_01 {
c1_Job [shape=hexagon, color=blue, label="Analysis Job"]
c1_store [shape=cylinder, label="DataStore"]
c1_ep1 [shape=diamond, label="EntryPoint (SubreadSet)"]
c1_ep2 [shape=diamond, label="EntryPoint (ReferenceSet)"]
c1_ep1 -> c1_Job
c1_ep2 -> c1_Job
c1_Job -> c1_store
c1_dsf_01 [shape=tab, label="DataStoreFile (Fasta)"]
c1_dsf_02 [shape=tab, label="DataStoreFile (AlignmentSet)"]
c1_dsf_03 [shape=tab, label="DataStoreFile (VCF)"]
c1_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c1_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c1_dsf_06 [shape=tab, label="DataStoreFile (LOG)"]
c1_store -> c1_dsf_01
c1_store -> c1_dsf_02
c1_store -> c1_dsf_03
c1_store -> c1_dsf_04
c1_store -> c1_dsf_05
c1_store -> c1_dsf_06
c1_ep1 -> c1_dsf_04 [style=dotted]
c1_dsf_02 -> c1_dsf_05 [style=dotted]
c1_ep2 -> c1_dsf_05 [style=dotted]
}
c3_dsf_02 -> c1_ep1
c4_dsf_02 -> c1_ep2
c0_dsf_02 -> c5_ep1
c2_dsf_02 -> c3_ep1
c5_dsf_02 -> c3_ep1
c5_dsf_02 -> c6_ep1
}
```

Arch Overview

Core Nouns of the PacBio System

  1. Run: often created/edited from SMRT Link Run Design and stored as XML.
  2. CollectionMetadata: a Run has a list of Collections; Primary Analysis converts each CollectionMetadata into a SubreadSet.
  3. PacBio DataSets: SubreadSet, ReferenceSet, etc. These are thin-ish XML files that carry general metadata as well as pointers to 'external resources' (e.g., BAM and Fasta files) and their companion index files.
  4. SMRT Link Job: a general (async) unit of work that performs operations on PacBio DataSets.
  5. DataStoreFile: a container for an output file from a SMRT Link Job, with metadata such as file type, size, and path. A list of DataStoreFiles is called a DataStore, which is the core output of a SMRT Link Job.
  6. Report: a general model to capture Report metrics (also referred to as 'Attributes'), Report Tables, and Report Plot Groups. A Report is a specific type of DataStoreFile and is used to communicate details of a SMRT Link Job to the SMRT Link UI (and web services).

Second-tier models, such as Report View Rules or Pipeline View Rules, are not discussed here.
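As a sketch, the first three nouns might be modeled with minimal Python dataclasses. All names and fields below are illustrative assumptions, not the actual SMRT Link schema:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of Run, CollectionMetadata, and DataSet;
# field names are assumptions, not the real SMRT Link data model.

@dataclass
class CollectionMetadata:
    uuid: str
    well: str      # e.g., "A01"
    context: str   # movie/collection context name

@dataclass
class Run:
    uuid: str
    name: str
    collections: List[CollectionMetadata] = field(default_factory=list)

@dataclass
class ExternalResource:
    path: str      # pointer to, e.g., a BAM or Fasta file
    index_paths: List[str] = field(default_factory=list)  # companion index files

@dataclass
class DataSet:
    uuid: str
    dataset_type: str  # e.g., "SubreadSet", "ReferenceSet"
    resources: List[ExternalResource] = field(default_factory=list)

run = Run("r-uuid-1", "Example Run",
          [CollectionMetadata("c-uuid-1", "A01", "m54006_example")])
ds = DataSet("d-uuid-1", "SubreadSet",
             [ExternalResource("/path/to/reads.bam", ["/path/to/reads.bam.pbi"])])
print(len(run.collections))  # 1
print(ds.dataset_type)       # SubreadSet
```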

General Workflow starting from PA

ICS/PA takes a Run XML with a list of Collections and converts each CollectionMetadata into a SubreadSet. The SubreadSet is copied from the ICS/PA file system to customer storage on NFS (accessible by the companion SMRT Link instance), and the SubreadSet XML is imported into SMRT Link using the import-dataset Job type. The Reports for the SubreadSet XML emitted from the import-dataset Job show up in Run QC as well as in Data Management in SMRT Link.

Shown below is a sketch of the dataflow.

ICS and Primary Analysis DataFlow to Generate SubreadSets for a Given Run

General SMRT Link Job Model

Simplified, the general interface of a SMRT Link Job, for type T, is:

A Job takes T as input and produces a DataStore (T -> Job -> DataStore)

List of EntryPoint PB DataSets -> Job -> DataStore

A DataStore is a list of DataStoreFiles.

Each DataStoreFile can be a different file type, such as a PB DataSet, VCF, Report JSON, or Fasta, and also contains the id and UUID of the specific Job that generated the DataStoreFile.

During and after SMRT Link Job execution, the DataStoreFiles will be imported into the SMRT Link database. For a specific subset of file types (the PB DataSet types), additional metadata is stored. Each DataSet has metadata about its specific dataset type, as well as metadata about a possible 'parent' DataSet. The DataSet 'parentage' can result from copying, merging, or analysis (the semantics are not consistent).
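A minimal sketch of the Job -> DataStore interface above, using illustrative field names (not the actual SMRT Link schema):

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the Job -> DataStore model; field names are assumptions.

@dataclass
class DataStoreFile:
    uuid: str
    file_type: str        # e.g., a PB DataSet type, "vcf", "report-json"
    path: str
    size_bytes: int
    source_job_id: int    # id of the Job that generated this file
    source_job_uuid: str  # UUID of the Job that generated this file

@dataclass
class DataStore:
    files: List[DataStoreFile] = field(default_factory=list)

def run_job(entry_points: List[str]) -> DataStore:
    """A Job takes EntryPoint DataSet(s) as input and produces a DataStore."""
    # Job execution elided; return the emitted files, each tagged with
    # the Job id/UUID that produced it.
    return DataStore([
        DataStoreFile("f1", "PacBio.DataSet.SubreadSet",
                      "/out/out.subreadset.xml", 1024, 1, "job-uuid-1"),
        DataStoreFile("f2", "report-json",
                      "/out/report1.json", 256, 1, "job-uuid-1"),
    ])

ds = run_job(["/path/to/subreadset.xml"])
print([f.file_type for f in ds.files])
```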

Report Details

Each Report JSON file contains a list of PB DataSet UUIDs in its data model. This is used to communicate which DataSets are the specific input(s) of a given Report JSON. Put another way, the EntryPoint PB DataSet(s) of the Job might not be the DataSet(s) directly used to compute a given Report JSON datastore file.
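For example, a Report JSON payload carrying the DataSet UUID list might look like the following; the exact key names are assumptions for illustration:

```python
import json

# Sketch of a Report JSON payload; key names are illustrative assumptions.
report = {
    "id": "mapping_stats",
    "uuid": "report-uuid-1",
    # The DataSet(s) this Report describes -- not necessarily a Job EntryPoint.
    "dataset_uuids": ["alignmentset-uuid-1"],
    "attributes": [
        {"id": "mapped_reads", "name": "Mapped Reads", "value": 100000}
    ],
    "tables": [],
    "plotGroups": [],
}

# A consumer can use dataset_uuids to tie a Report back to its source DataSet.
payload = json.loads(json.dumps(report))
print(payload["dataset_uuids"])  # ['alignmentset-uuid-1']
```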

Example Jobs

NOTE: the dotted arrows represent the relation between the Report and the source input for the task at the Report JSON level. This is NOT captured at the SMRT Link Server level.

Import DataSet Job


Accessing the Reports and the source DataSet is clearly defined here, depending only on the Job Id.

I believe the Merge DataSet Job type is similar.

Example Resequencing Job

Analysis Job

Example: Larger Picture of DataFlow in SMRT Link using SMRT Link Jobs

Simple Example

To perform a standard Resequencing Job, the user runs two import-dataset SMRT Link Jobs, followed by a pbsmrtpipe (i.e., 'Analysis') SMRT Link Job.

Steps:

  1. Import SubreadSet
  2. Import ReferenceSet
  3. Run Analysis Job to run the Resequencing Analysis
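
The three steps above can be sketched with a hypothetical services client. SmrtLinkClient, its methods, the pipeline id, and the entry-point names are all illustrative assumptions, not the actual SMRT Link API:

```python
# Hypothetical client sketch; not the real SMRT Link services API.
class SmrtLinkClient:
    def __init__(self):
        self._jobs = []

    def run_import_dataset(self, path: str) -> dict:
        """Submit an import-dataset Job for a DataSet XML path."""
        job = {"id": len(self._jobs) + 1, "type": "import-dataset", "path": path}
        self._jobs.append(job)
        return job

    def run_analysis(self, pipeline_id: str, entry_points: dict) -> dict:
        """Submit a pbsmrtpipe 'Analysis' Job with named EntryPoints."""
        job = {"id": len(self._jobs) + 1, "type": "pbsmrtpipe",
               "pipeline": pipeline_id, "entry_points": entry_points}
        self._jobs.append(job)
        return job

client = SmrtLinkClient()
j1 = client.run_import_dataset("/path/to/subreadset.xml")    # 1. Import SubreadSet
j2 = client.run_import_dataset("/path/to/referenceset.xml")  # 2. Import ReferenceSet
j3 = client.run_analysis("resequencing-pipeline-id",         # 3. Analysis Job
                         {"eid_subread": "subreadset-uuid",
                          "eid_ref_dataset": "referenceset-uuid"})
print(j3["id"])  # 3
```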

Import DataSets and Perform Resequencing Analysis

(Each Job type is shown in its own box)

Advanced Example

To demonstrate a larger dataflow example, consider the following case. A user would like to import SubreadSets alpha and beta, perform filtering on alpha, merge the filtered alpha with beta, perform a Resequencing analysis on the merged SubreadSet, and export the filtered SubreadSet as a zip.

Steps:

  1. Import the ReferenceSet and SubreadSets alpha and beta
  2. Create a filtered SubreadSet from SubreadSet alpha
  3. Create a merged SubreadSet from SubreadSet beta and the output of #2
  4. Create an Analysis Job using #3 and the ReferenceSet from #1
  5. Create a DataSet XML(s) ZIP from the output of #2

Advanced Dataflow Example

This demonstrates the graph nature of the design and the composability of different SMRT Link Job types. Note that data provenance comes for free in the model.
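A sketch of why provenance falls out of the model: every DataSet records the Job that produced it, and every Job records its EntryPoint DataSets, so the contributing Jobs can be recovered by walking the graph. All ids and names below are illustrative:

```python
from typing import Dict, List

# Illustrative lookups mirroring the advanced example above.
# Job id -> input DataSet UUIDs (EntryPoints); imports take a path, not a DataSet.
JOB_INPUTS: Dict[int, List[str]] = {
    1: [],                              # import alpha
    2: [],                              # import beta
    3: ["alpha-uuid"],                  # filter alpha
    4: ["beta-uuid", "filtered-uuid"],  # merge
}
# DataSet UUID -> Job id that produced it.
PRODUCED_BY: Dict[str, int] = {
    "alpha-uuid": 1, "beta-uuid": 2, "filtered-uuid": 3, "merged-uuid": 4,
}

def provenance(dataset_uuid: str) -> List[int]:
    """Return every Job id that contributed to a DataSet, by walking parents."""
    jobs: List[int] = []
    stack = [dataset_uuid]
    while stack:
        uuid = stack.pop()
        job = PRODUCED_BY.get(uuid)
        if job is not None and job not in jobs:
            jobs.append(job)
            stack.extend(JOB_INPUTS[job])
    return jobs

print(sorted(provenance("merged-uuid")))  # [1, 2, 3, 4]
```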

```dot
digraph {
subgraph cluster_0 {
c0_Job [shape=hexagon, color=blue, label="Import DataSet"]
c0_store [shape=cylinder, label="DataStore"]
c0_ep1 [shape=diamond, label="Path /path/to/subreadset.xml"]
c0_ep1 -> c0_Job
c0_Job -> c0_store
c0_dsf_02 [shape=tab, label="DataStoreFile (SubreadSet)"]
c0_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c0_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c0_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c0_store -> c0_dsf_02
c0_store -> c0_dsf_04
c0_store -> c0_dsf_05
c0_store -> c0_dsf_03
c0_dsf_02 -> c0_dsf_04 [style=dotted]
c0_dsf_02 -> c0_dsf_05 [style=dotted]
}
subgraph cluster_2 {
c2_Job [shape=hexagon, color=blue, label="Import DataSet"]
c2_store [shape=cylinder, label="DataStore"]
c2_ep1 [shape=diamond, label="Path /path/to/referenceset.xml"]
c2_ep1 -> c2_Job
c2_Job -> c2_store
c2_dsf_02 [shape=tab, label="DataStoreFile (ReferenceSet)"]
c2_dsf_03 [shape=tab, label="DataStoreFile (Log)"]
c2_store -> c2_dsf_02
c2_store -> c2_dsf_03
}
subgraph cluster_02 {
c1_Job [shape=hexagon, color=blue, label="Analysis Job"]
c1_store [shape=cylinder, label="DataStore"]
c1_ep1 [shape=diamond, label="EntryPoint (SubreadSet)"]
c1_ep2 [shape=diamond, label="EntryPoint (ReferenceSet)"]
c1_ep1 -> c1_Job
c1_ep2 -> c1_Job
c1_Job -> c1_store
c1_dsf_01 [shape=tab, label="DataStoreFile (Fasta)"]
c1_dsf_02 [shape=tab, label="DataStoreFile (AlignmentSet)"]
c1_dsf_03 [shape=tab, label="DataStoreFile (VCF)"]
c1_dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
c1_dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
c1_dsf_06 [shape=tab, label="DataStoreFile (LOG)"]
c1_store -> c1_dsf_01
c1_store -> c1_dsf_02
c1_store -> c1_dsf_03
c1_store -> c1_dsf_04
c1_store -> c1_dsf_05
c1_store -> c1_dsf_06
c1_ep1 -> c1_dsf_04 [style=dotted]
c1_dsf_02 -> c1_dsf_05 [style=dotted]
c1_ep2 -> c1_dsf_05 [style=dotted]
}
c2_dsf_02 -> c1_ep2
c0_dsf_02 -> c1_ep1
}
```

Notes on SMRT Link DataSet, Job, DataStoreFile, Report models

Current Model for SMRT Link 'Job' model

Simplify, the general interface of a SMRT Link Job, for type T,

A Job takes T as input and produces a PB (T -> Job -> DataStore)

List of EntryPoint PB DataSet -> Job -> DataStore

A DataStore is a list of DataStoreFiles.

Each DataStoreFile can be one of several file types, such as PB DataSet, VCF, Report JSON, Fasta, etc., and also contains the id and UUID of the specific Job that generated the DataStoreFile.

During and after SMRT Link Job execution, the DataStoreFiles will be imported into the database. For a specific subset of file types (the PB DataSet types), additional metadata will be stored in the SMRT Link database. Each DataSet has metadata about the specific dataset type as well as metadata about a possible 'parent' DataSet. The DataSet 'parentage' can be the result of copying, merging, or analysis (the semantics are not consistent).
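As a rough illustration, the model described above can be sketched as follows. The class and field names here are assumptions for illustration only, not the authoritative SMRT Link schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid

# Hypothetical sketch of the models described above; the real SMRT Link
# schema differs in names and detail.

@dataclass
class DataStoreFile:
    file_uuid: str     # globally unique id of the file
    file_type_id: str  # e.g. a DataSet, Report JSON, VCF, Fasta, or Log type id
    path: str
    job_id: int        # id of the Job that generated this file
    job_uuid: str

@dataclass
class DataStore:
    files: List[DataStoreFile] = field(default_factory=list)

@dataclass
class DataSetMetaData:
    dataset_uuid: str
    dataset_type: str
    parent_uuid: Optional[str] = None  # 'parentage' from copy/merge/analysis

# Example: a DataStore produced by an import-dataset Job
job_id, job_uuid = 1, str(uuid.uuid4())
subreads = DataStoreFile(str(uuid.uuid4()), "PacBio.DataSet.SubreadSet",
                         "/path/to/subreadset.xml", job_id, job_uuid)
report = DataStoreFile(str(uuid.uuid4()), "PacBio.FileTypes.JsonReport",
                       "/path/to/report_01.json", job_id, job_uuid)
ds = DataStore(files=[subreads, report])

# Only DataSet-typed files get the extra metadata row
meta = DataSetMetaData(dataset_uuid=subreads.file_uuid, dataset_type="SubreadSet")
```

Note that only the Job id/UUID link is stored on each file, which is exactly the limitation discussed below.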

Report Details

Each Report JSON file contains a list of PB DataSet UUIDs in its data model. This is used to communicate which DataSets are specific to the input(s) of a specific Report JSON. Put differently, the EntryPoint PB DataSet(s) might not be directly used to compute the Report JSON datastore file.
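For example, reading those UUIDs out of a Report JSON file might look like the following sketch. Only the dataset_uuids field is taken from these notes; the rest of the report payload is an illustrative assumption:

```python
import json

def dataset_uuids_from_report(report_json_text: str) -> list:
    """Extract the PB DataSet UUIDs referenced by a Report JSON blob.

    Assumes a top-level 'dataset_uuids' field, which may be absent in
    reports produced by older SMRT Link versions.
    """
    d = json.loads(report_json_text)
    return d.get("dataset_uuids", [])

# Hypothetical report payload
example = '{"id": "report_01", "dataset_uuids": ["11111111-2222-3333-4444-555555555555"]}'
uuids = dataset_uuids_from_report(example)
```

The defensive `.get` matters for legacy data, where the field may be missing entirely.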

NOTE This is the core issue: currently the system only communicates the Job Id of the DataSet.

Example Jobs

NOTE The dotted arrows represent the relation between a Report and the source input of the task, at the Report JSON level. This relation is NOT captured at the SMRT Link Server level.

Import DataSet Job


Accessing the Reports and the source DataSet is clearly defined here by only depending on the Job Id.

I believe the Merge DataSet Job type is similar.

Example Resequencing Job

Analysis Job

The use case here is often to view all of the output Reports; links back to the source DataSet are not necessary.

However, viewing the AlignmentSet in DM (Data Management) will start to yield unexpected results. This is why the SMRT Link UI has a workaround that filters all the Reports from the Job and only shows those for the AlignmentSet of interest. This works for a small number of Reports, but DOES NOT work for a Job that outputs "many" reports (because of the explicit filtering necessary on the client side).

Example Demux Analysis Job


The model is more involved when N DataSets and N (or more) companion reports per DataSet are emitted.

The core issue is that requesting the Reports for one specific DataSet of the N will return the Reports for all N DataSets in the Job.

Job Output Access Points

From the file system access point, the DataStore is accessible via the datastore.json file in the SMRT Link Job directory (the path of this is not consistent, but it's often in the root directory of the Job).

From the web services, the datastore files are accessible from DM, where DS-TYPE is the DataSet type 'short name' (e.g., subreads) and DS-IDABLE is the DataSet (local) integer id or (global) UUID.

smrt-link/datasets/<DS-TYPE>/<DS-IDABLE>/reports

NOTE The reports interface is the core issue because it assumes the Job Id link. I believe the SMRT Link UI is filtering to get around this; however, this is not scalable because the SMRT Link UI has to fetch the details of all the reports and then filter them based on the DataSet UUID in each report.
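The client-side workaround amounts to something like the following sketch (the payloads are hypothetical; the point is that every report body must be fetched before any can be discarded):

```python
DS_A = "11111111-0000-0000-0000-000000000001"
DS_B = "11111111-0000-0000-0000-000000000002"

# Hypothetical report details, as if every report had already been fetched
# from the job's /reports endpoint -- the expensive, non-scalable part.
job_reports = [
    {"id": "report_01", "dataset_uuids": [DS_A]},
    {"id": "report_02", "dataset_uuids": [DS_A]},
    {"id": "report_03", "dataset_uuids": [DS_B]},
]

def reports_for_dataset(reports, dataset_uuid):
    """Client-side filter: cost scales with the total number of Reports
    in the Job, regardless of how many belong to the requested DataSet."""
    return [r for r in reports if dataset_uuid in r.get("dataset_uuids", [])]

mine = reports_for_dataset(job_reports, DS_A)
```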

And from the Jobs context.

smrt-link/job-manager/jobs/<JOB-TYPE>/<JOB-IDABLE>/reports

The Job is not a problem and the interface does NOT need to be changed. Semantically, the interface captures exactly what is expected.

Possible Solution

Capture new Report -> DataSet(s) relation

  • Add new table to capture DataStoreFile -> Set(DataStoreFile) relation
  • On import parse Report and assign
  • Update /smrt-link/datasets/<DS-TYPE>/<DS-IDABLE>/reports to also filter by 'parent' DataStoreFile UUID(s)

This is straightforward, but it now requires a join to get the Reports for a specific DataSet.
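A minimal sketch of that relation table and the resulting join, using SQLite with invented table and column names (the real SMRT Link schema will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datastore_files (
    uuid TEXT PRIMARY KEY,
    file_type_id TEXT,
    job_id INTEGER
);
-- New table: Report DataStoreFile -> source DataSet DataStoreFile(s)
CREATE TABLE datastore_file_parents (
    file_uuid TEXT,    -- the Report's DataStoreFile UUID
    parent_uuid TEXT   -- the source DataSet's DataStoreFile UUID
);
""")
conn.executemany(
    "INSERT INTO datastore_files VALUES (?, ?, ?)",
    [("ds-1", "PacBio.DataSet.SubreadSet", 42),
     ("rpt-1", "PacBio.FileTypes.JsonReport", 42),
     ("rpt-2", "PacBio.FileTypes.JsonReport", 42)],
)
conn.executemany(
    "INSERT INTO datastore_file_parents VALUES (?, ?)",
    [("rpt-1", "ds-1"), ("rpt-2", "ds-1")],
)

# Reports for a specific DataSet now require this extra join:
rows = conn.execute("""
SELECT f.uuid FROM datastore_files f
JOIN datastore_file_parents p ON p.file_uuid = f.uuid
WHERE p.parent_uuid = ? AND f.file_type_id = 'PacBio.FileTypes.JsonReport'
ORDER BY f.uuid
""", ("ds-1",)).fetchall()
```

The join is cheap with an index on parent_uuid, and it keeps the Job-level /reports endpoint untouched.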

Legacy Data

Need to handle legacy data, specifically for the smrt-link/datasets/<DS-TYPE>/<DS-IDABLE>/reports webservice endpoint.

Possible Solutions

  1. During db migration on "start/upgrade", parse the Report JSON files on disk, extract the DataSet UUID(s) from each one, and update the database.

    • BAD. This is expensive (could mean parsing thousands of Report JSON files)
    • BAD. The dataset_uuids field might not be populated consistently; it is not clear in which SMRT Link version it was added
    • GOOD. Parsing the raw data potentially removes the guesswork at the job level (see #3; the issue noted there is resolved)
  2. Hide the details in the API and dispatch on lookup based on the Job version (i.e., if job > 6.0.0, do X, else do Y to get the reports)

    • BAD. Fundamentally yields different semantic results.
    • This is probably very difficult to debug when it's not working as expected; the dispatch on SMRT Link version would require an extra join to the engine_jobs table
    • GOOD. Potentially the least amount of db migration machinery
  3. During db migration, attempt a thinner approach: migrate old data and assign the Report -> DataSet relation based on the Job type

    • import-dataset (look at the output DataStore file, get the (single) DataSet DataStoreFile, get the list of Reports, then update the DB)
    • merge-datasets (similar to import-dataset)
    • analysis job (use the Entry Point(s) to get the UUID(s) and assign them to all output Reports; note, this is not correct, so the SL UI would still have to keep the legacy filtering model in place)
    • Other job types (don't support?)

    • GOOD. Thin-ish migration
    • BAD. Edge cases in capturing the Report -> DataSet relation
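The import-dataset branch of option 3 can be sketched as below. The dict shape and file-type id strings are illustrative assumptions, not the actual SMRT Link representation:

```python
def migrate_import_dataset_job(datastore_files):
    """Option 3, import-dataset branch: the Job's DataStore contains a
    single DataSet; assign every Report in that DataStore to it.

    Returns (report_uuid, dataset_uuid) pairs to insert into the new
    relation table. File-type id strings are illustrative.
    """
    datasets = [f for f in datastore_files
                if f["file_type_id"].startswith("PacBio.DataSet.")]
    if len(datasets) != 1:
        raise ValueError("import-dataset job should emit exactly one DataSet")
    parent = datasets[0]["uuid"]
    return [(f["uuid"], parent) for f in datastore_files
            if f["file_type_id"] == "PacBio.FileTypes.JsonReport"]

# Hypothetical datastore contents of one import-dataset Job
files = [
    {"uuid": "ds-1", "file_type_id": "PacBio.DataSet.SubreadSet"},
    {"uuid": "rpt-1", "file_type_id": "PacBio.FileTypes.JsonReport"},
    {"uuid": "rpt-2", "file_type_id": "PacBio.FileTypes.JsonReport"},
    {"uuid": "log-1", "file_type_id": "PacBio.FileTypes.Log"},
]
pairs = migrate_import_dataset_job(files)
```

merge-datasets would follow the same pattern; the analysis-job branch would instead fan out the Entry Point UUID(s) to every output Report, with the correctness caveat noted above.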

strict digraph {
Job [shape=hexagon, color=blue, label="Demux Analysis Job"]
store [shape=cylinder, label="DataStore"]
ep1 [shape=diamond, label="EntryPoint (SubreadSet)"]
ep2 [shape=diamond, label="EntryPoint (BarcodeSet)"]
ep1 -> Job
ep2 -> Job
Job -> store
s1 [shape=tab, label="DataStoreFile (SubreadSet_01)"]
s2 [shape=tab, label="DataStoreFile (SubreadSet_02)"]
s3 [shape=tab, label="DataStoreFile (SubreadSet_03)"]
r1 [shape=tab, label="DataStoreFile (Report_01)"]
r2 [shape=tab, label="DataStoreFile (Report_02)"]
r3 [shape=tab, label="DataStoreFile (Report_03)"]
dsf_01 [shape=tab, label="DataStoreFile (Log)"]
store -> dsf_01
store -> s1
store -> s2
store -> s3
store -> r1
store -> r2
store -> r3
s1 -> r1 [style=dotted]
s2 -> r2 [style=dotted]
s3 -> r3 [style=dotted]
}
strict digraph {
Run [shape=component, label="PacBio Run (XML)"]
c1 [shape=box, label="CollectionMetadata 1"]
c2 [shape=box, label="CollectionMetadata 2"]
c3 [shape=box, label="CollectionMetadata 3"]
c4 [shape=box, label="CollectionMetadata 4"]
Run -> c1
Run -> c2
Run -> c3
Run -> c4
p [shape=parallelogram, color=blue, label="Primary Analysis: Convert CollectionMeta to SubreadSet)"]
f1 [shape=tab, label="PA File (SubreadSet XML) on PA file system"]
copy_job [shape=parallelogram, color=blue, label="Primary Analysis: Copy to Customer FileSystem"]
customer_subreadset [shape=tab, label="SubreadSet XML on Customer FileSystem"]
import_job [shape=parallelogram, color=blue, label="Primary Analysis: Import SubreadSet XML into SMRT Link using import-dataset Job"]
c1 -> p
p -> f1
f1 -> copy_job
copy_job -> customer_subreadset
customer_subreadset -> import_job
}
strict digraph {
Job [shape=hexagon, color=blue, label="Import DataSet"]
store [shape=cylinder, label="DataStore"]
ep1 [shape=diamond, label="Path /path/to/subreadset.xml"]
ep1 -> Job
Job -> store
dsf_02 [shape=tab, label="DataStoreFile (SubreadSet)"]
dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
dsf_03 [shape=tab, label="DataStoreFile (Log)"]
store -> dsf_02
store -> dsf_04
store -> dsf_05
store -> dsf_03
dsf_02 -> dsf_04 [style=dotted]
dsf_02 -> dsf_05 [style=dotted]
}
default: convert
convert:
	dot -Tpng demux-pbsmrtpipe-job.dot -o demux-pbsmrtpipe-job.png
	dot -Tsvg demux-pbsmrtpipe-job.dot -o demux-pbsmrtpipe-job.svg
	dot -Tpng import-dataset-job.dot -o import-dataset-job.png
	dot -Tsvg import-dataset-job.dot -o import-dataset-job.svg
	dot -Tpng pbsmrtpipe-job.dot -o pbsmrtpipe-job.png
	dot -Tsvg pbsmrtpipe-job.dot -o pbsmrtpipe-job.svg
	dot -Tpng connected-jobs.dot -o connected-jobs.png
	dot -Tsvg connected-jobs.dot -o connected-jobs.svg
	dot -Tpng advanced-jobs.dot -o advanced-jobs.png
	dot -Tsvg advanced-jobs.dot -o advanced-jobs.svg
	dot -Tpng ics.dot -o ics.png
	dot -Tsvg system-job-running.dot -o system-job-running.svg
	dot -Tpng system-job-running.dot -o system-job-running.png
clean:
	rm *.png
strict digraph {
Job [shape=hexagon, color=blue, label="Analysis Job"]
store [shape=cylinder, label="DataStore"]
ep1 [shape=diamond, label="EntryPoint (SubreadSet)"]
ep2 [shape=diamond, label="EntryPoint (ReferenceSet)"]
ep1 -> Job
ep2 -> Job
Job -> store
dsf_01 [shape=tab, label="DataStoreFile (Fasta)"]
dsf_02 [shape=tab, label="DataStoreFile (AlignmentSet)"]
dsf_03 [shape=tab, label="DataStoreFile (VCF)"]
dsf_04 [shape=tab, label="DataStoreFile (Report_01)"]
dsf_05 [shape=tab, label="DataStoreFile (Report_02)"]
dsf_06 [shape=tab, label="DataStoreFile (LOG)"]
store -> dsf_01
store -> dsf_02
store -> dsf_03
store -> dsf_04
store -> dsf_05
store -> dsf_06
ep1 -> dsf_04 [style=dotted]
dsf_02 -> dsf_05 [style=dotted]
ep2 -> dsf_05 [style=dotted]
}
digraph {
postgres_db [shape=cylinder];
WorkerA [shape=diamond, color=blue];
WorkerB [shape=diamond, color=blue];
WorkerC [shape=diamond, color=blue];
WorkerD [shape=pentagon, color=green];
WorkerE [shape=pentagon, color=orange];
pbsmrtpipeJobA [shape=rectangle];
pbsmrtpipeJobB [shape=rectangle]
pbsmrtpipeJobC [shape=rectangle]
postgres_db -> SL_Services
SL_Services -> WorkerA
SL_Services -> WorkerB
SL_Services -> WorkerC
SL_Services -> WorkerD
SL_Services -> WorkerE
WorkerA -> pbsmrtpipeJobA
WorkerB -> pbsmrtpipeJobB
WorkerC -> pbsmrtpipeJobC
WorkerD -> backup_db
WorkerE -> import_dataset
pbsmrtpipeJobA -> taskA_01
pbsmrtpipeJobA -> taskA_02
pbsmrtpipeJobA -> taskA_03
pbsmrtpipeJobA -> taskA_04
pbsmrtpipeJobA -> taskA_05
taskA_01 -> sge_sync_job_A_01
taskA_02 -> sge_sync_job_A_02
taskA_03 -> sge_sync_job_A_03
taskA_04 -> sge_sync_job_A_04
taskA_05 -> sge_sync_job_A_05
pbsmrtpipeJobB -> taskB_01
pbsmrtpipeJobB -> taskB_02
taskB_01 -> sge_sync_job_B_01
taskB_02 -> sge_sync_job_B_02
pbsmrtpipeJobC -> taskC_01
pbsmrtpipeJobC -> taskC_02
taskC_01 -> sge_sync_job_C_01
taskC_02 -> sge_sync_job_C_02
}