1. Centralized "health" monitoring across multiple systems for PA and SA
   - Primary Analysis for all pac-alphaX systems
     - monitoring of acquisition success/failure
     - monitoring of transfer job success/failure and runtime
   - Secondary Analysis for all SL systems (e.g., smrtlink-beta, smrtlink-release)
     - monitor smrtlink analysis job success/failure and runtime
     - monitor smrtlink analysis task failures
     - central place for all report metrics from all analysis jobs
     - monitor smrtlink import-dataset and merge-dataset jobs for success/failure
   - cluster (SGE) monitoring of job state
2. Centralized datastore across multiple systems
   - a. PA/SA LIMS to SubreadSet lookup
   - b. LIMS to Analysis jobs (i.e., the list of all analysis jobs that use that SubreadSet)
   - import-dataset metrics, such as loading and filtering, as used in RunQC
   - pbsmrtpipe/analysis core metric results (e.g., number of mapped reads, mean accuracy)
3. Public user interface for common functionality, leveraging searchkit. Very minimal UI
   - a. Runcode -> SubreadSet lookup by runcode, experiment-id, project (leveraging 2a)
   - b. Runcode -> Analysis Jobs and Metrics (leveraging 2a + 2b)
   - c. For other data slicing/querying, use kibana directly
- The fundamental data sources are defined as the PA/SA webservices.
- "core" data stored in ES is pulled from PA/SA services and stored as documents (i.e., indexes, tables in SQL) in ES. These documents are specifically de-normalized structure that allows efficient queries. Join keys, such as the
job_id
, oracq_id
can be used to query data in other tables. - "derived" document types are built from "core" documents and other sources. This docs are even more app-specific structure to enable specific applications.
"Core" tools that scrape the webservices, de-normalized the data to add necessary context (if necessary) and import into ES. All tools have a mode that enables them to be executed periodically via --periodic-time
commandline option. They are currently running using this mode in login14-bifx01
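A minimal sketch of what the periodic mode might look like internally. This loop is an assumption for illustration (including the hypothetical max_iterations cap, added here so the loop is testable); the actual --periodic-time implementation in the tools is not shown.

```python
import time


def run_periodically(task, interval_sec, max_iterations=None):
    """Call `task` repeatedly, sleeping `interval_sec` seconds between calls.

    `max_iterations` is a hypothetical cap used for testing; a real importer
    would typically loop until the process is killed.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        task()  # e.g., scrape the webservice and import into ES
        iterations += 1
        if max_iterations is None or iterations < max_iterations:
            time.sleep(interval_sec)
    return iterations
```

In the real tools, `task` would be the scrape-and-import step and `interval_sec` would come from the --periodic-time option.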
- import-alpha-acqs.py: Scrapes PAWS on a single pac-alpha{X} (or all systems) for all acquisitions.
- import-alpha-transfers.py: Scrapes the DataTransfer Services on a single pac-alpha{X} (or all systems) for all data transfer jobs.
- import-smrtlink-analysis-jobs.py: Scrapes SMRT Link/Analysis Services for pbsmrtpipe jobs. Also imports all task details and report metrics from job output. This creates several indexes/tables and requires filesystem access (for task details).
- import-pbi-collections.py: Scrapes /pbi/collections for all lims.yml files that have Runcode -> SubreadSet UUID metadata. De-normalizes some of the SubreadSet to enable common queries.
- import-sge-jobs.py: Snapshots of SGE jobs for monitoring purposes.
  - TODO (this is an attempt to potentially replace Allen's siv1/Sequel page)
- build-siv.py (needs a better name): Creates an index/table of SubreadSet UUID (or LIMS Runcode) -> List of Analysis jobs and metrics.
  - TODO: Add analysis of Run QC metrics from import-dataset jobs
- TODO: Replace MJ's nibbler? (or does #1 do this?)
Each "core" importing tool creates one or more indices (i.e., table) and document types within ES.
The versioning model is:
- doc_type -> {id}{version} where name is the id of the index and version is v{X} where X is an int. Example smrtlink_analysis_job_v2
- the companion table/index in ES is {id}s{version}. Example smrtlink_analysis_jobs_v2
When the schema changes that doc version and file index should be incremented. Concurrent the new version of the doc, the old (n - 1) version of the doc should also be imported to not break derived collections and UI apps using these collections. This enables some delta of time where the components can be upgraded to use a newer doc model.
- TODO: The naming convention isn't consistent on the current dev server
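The naming convention can be captured in a small helper. This sketch simply string-formats the names and assumes the plain "append s" pluralization seen in the smrtlink_analysis_job example; irregular plurals would need special-casing.

```python
def doc_type_name(doc_id, version):
    """doc_type name: {id}_v{X}, e.g. ("smrtlink_analysis_job", 2) ->
    "smrtlink_analysis_job_v2"."""
    return "{}_v{}".format(doc_id, version)


def index_name(doc_id, version):
    """Companion ES index: the pluralized doc id plus the version suffix.
    Assumes simple "append s" pluralization."""
    return "{}s_v{}".format(doc_id, version)
```

Centralizing the name construction like this would also help with the naming inconsistency noted in the TODO.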
Get a list of LimsSubreadSet records:
http://login14-biofx01:9200/lims_subreadsets_v2/_search?size=1000&from=0
Get a LimsSubreadSet by SubreadSet UUID (the primary key):
http://login14-biofx01:9200/lims_subreadsets_v2/lims_subreadset_v2/160ee152-e37d-415e-abbe-9980fd575c68
The general form is:
{index}/{doc-type}/{primary-id}
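As a sketch, the general form can be assembled and fetched from Python. The hostname and example UUID are taken from the URLs above; urllib is used only for illustration.

```python
import json
from urllib.request import urlopen

# The dev ES instance noted above.
ES_BASE = "http://login14-biofx01:9200"


def doc_url(index, doc_type, primary_id):
    """Build a document URL of the general form {index}/{doc-type}/{primary-id}."""
    return "{}/{}/{}/{}".format(ES_BASE, index, doc_type, primary_id)


def get_doc(index, doc_type, primary_id):
    """Fetch a single document by primary key (requires network access to ES)."""
    with urlopen(doc_url(index, doc_type, primary_id)) as f:
        return json.load(f)
```

For example, `doc_url("lims_subreadsets_v2", "lims_subreadset_v2", "<uuid>")` reproduces the LimsSubreadSet lookup URL shown above.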
Hostnames (all on port 8081):
- smrtlink-alpha
- smrtlink-alpha-nightly
- smrtlink-beta (most heavily used)
- smrtlink-beta-incremental (useful for testing a recent build)
- smrtlink-beta-nightly (useful for testing)
- smrtlink-bihourly (useful for testing)
- smrtlink-release (?)
- smrtlink-siv
- smrtlink-siv-alpha (?)
up on login14-bifx01, ES on port 9200, kibana on 8099
-
Use ssh tunnel if VPN isn't working
ssh -L 8099:localhost:8099 mkocher@login14-biofx01
- Open browser locally to localhost:8099
- Using ES 2.3.2 pulled from here. Extract and run bin/elasticsearch (don't run this on the cluster without setting the config to single node)
- Using the Kibana build for your OS from here. Extract and run bin/kibana
Somewhat noteworthy changes:
- changed the ES config to use a single node
- changed the kibana default port to 8099
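For reference, the two changes above correspond roughly to the following config lines. This is a sketch, not the exact files in use: `server.port` is the standard Kibana setting, and binding ES to localhost is one way to keep a 2.x instance from joining other nodes; check the ES 2.x and Kibana docs for your versions.

```yaml
# config/kibana.yml -- move Kibana off its default port
server.port: 8099

# config/elasticsearch.yml -- keep this instance single-node by binding
# only to localhost so it does not discover/join other cluster nodes
network.host: "localhost"
```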