NGB Genome Browser Docker End-to-End Demo Script

The NGB Genome Browser is a web-based NGS data viewer with structural variation (SV) visualization capabilities. This gist provides end-to-end Docker installation instructions and a demo script; this end-to-end version also covers downloading the sample VCF and BAM files.

Note, these instructions are derived from the following sources:

Note, you can also use the hosted version of the NGB Genome Browser at NGB Genome Browser (Hosted), and you can view example visualizations in the paper Prioritisation of Structural Variant Calls in Cancer Genomes.

Build Docker Source Packages and binaries

This section is an abridged version of the NGB Quick Start.

Prerequisite: Docker must already be installed.

Validate that Docker is installed:

# Validate you have docker installed
docker --version
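`docker --version` only proves that the Docker client is installed; the build and run steps below also need a running Docker daemon that your user can reach. A minimal sketch of a stricter check, using a hypothetical helper named `check_docker` (not part of NGB or Docker), is:

```shell
# Hypothetical helper (not part of NGB or Docker): verify the Docker CLI and daemon.
check_docker() {
  # The CLI must be on PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  # `docker info` only succeeds if the daemon is running and reachable
  # by the current user.
  if ! docker info >/dev/null 2>&1; then
    echo "docker daemon not reachable (is it running? do you need sudo?)" >&2
    return 2
  fi
  echo "docker OK: $(docker --version)"
}
```

Run `check_docker` before the build; exit status 1 means the CLI is missing, 2 means the daemon is not reachable.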

Next, clone the NGB sources

# Get NGB sources
git clone https://github.com/epam/NGB.git
cd NGB

Build the source packages and binaries (this takes about 15 minutes):

# Build source packages and binaries
./gradlew buildDocker

Once the build finishes, the images should appear in your local Docker image list:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ngb                 latest              8491b3ab993e        5 days ago          548MB
ubuntu              16.04               2a4cca5ac898        10 days ago         111MB
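To fail fast in a script when the build did not produce the image, you could check for the `ngb:latest` tag shown in the listing above. This is a sketch using a hypothetical helper, `ngb_image_built`; it relies on `docker image inspect` exiting with a non-zero status when the image does not exist locally.

```shell
# Hypothetical helper (not part of NGB): check that the image built by
# ./gradlew buildDocker exists locally. Defaults to the ngb:latest tag.
ngb_image_built() {
  image="${1:-ngb:latest}"
  # `docker image inspect` returns a non-zero status for unknown images.
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "found image $image"
  else
    echo "image $image not found; re-run ./gradlew buildDocker" >&2
    return 1
  fi
}
```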

Download NGB Demo Data

You can use your own data, but to get yourself up and running faster you can utilize the EPAM NGB Demo Data or EPAM NGB Demo Data (TAR) and scripts. The key script that you can follow is demo_data_init.sh. Below is an abridged version of the script.

Set up a folder structure such as:

/my_folder/ngs
  /genomes
  /demo

Download the GRCh38 human reference genome files, then the demo data:

cd /my_folder/ngs/genomes/
wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.gtf.gz
wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.fa.gz
wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.domains.bed

cd /my_folder/ngs/demo/
wget http://ngb.opensource.epam.com/distr/data/demo/ngb_demo_data.tar.gz

Decompress Reference Genomes

cd /my_folder/ngs/genomes/
gzip -d Homo_sapiens.GRCh38.fa.gz
gzip -d Homo_sapiens.GRCh38.gtf.gz

Decompress demo data

cd /my_folder/ngs/demo/
tar -zxvf ngb_demo_data.tar.gz
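The download and decompression steps above can be wrapped so they are safe to re-run if a large download is interrupted. This is a sketch under these assumptions: `wget`, `gzip`, and `tar` are installed on the host, the URLs are the ones listed above, and `fetch_ngb_demo_data` is a hypothetical helper, not part of NGB's own `demo_data_init.sh`.

```shell
# Hypothetical helper (not part of NGB): fetch and unpack the GRCh38 files
# and the NGB demo data under a root folder (default: /my_folder/ngs).
NGB_DATA_URL="http://ngb.opensource.epam.com/distr/data"

fetch_ngb_demo_data() {
  root="${1:-/my_folder/ngs}"
  mkdir -p "$root/genomes" "$root/demo" || return 1

  for f in Homo_sapiens.GRCh38.gtf.gz Homo_sapiens.GRCh38.fa.gz Homo_sapiens.GRCh38.domains.bed; do
    # Skip a .gz download whose decompressed copy already exists.
    case "$f" in
      *.gz)
        if [ -f "$root/genomes/${f%.gz}" ]; then
          continue
        fi
        ;;
    esac
    # -c resumes a partially downloaded file; -P sets the target folder.
    wget -c -P "$root/genomes" "$NGB_DATA_URL/genome/grch38/$f" || return 1
  done
  wget -c -P "$root/demo" "$NGB_DATA_URL/demo/ngb_demo_data.tar.gz" || return 1

  # Decompress the FASTA and GTF unless already done (the BED is not compressed).
  for f in Homo_sapiens.GRCh38.fa Homo_sapiens.GRCh38.gtf; do
    if [ ! -f "$root/genomes/$f" ]; then
      gzip -d "$root/genomes/$f.gz" || return 1
    fi
  done

  # Unpack the demo BAM/VCF files into the demo folder.
  tar -zxvf "$root/demo/ngb_demo_data.tar.gz" -C "$root/demo"
}
```

Call it as `fetch_ngb_demo_data` for the default layout, or pass another root folder as the first argument.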

For more information on genome sequencing, refer to Genome Sequencing in a Nutshell.

Spin up EPAM NGB Docker Instance

Now that you have built the Docker image, start a container from it. The -v option mounts an existing host folder into the container, so the data you downloaded is visible inside it. The commands below start the container and open a shell inside it.

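# Start the container, mounting the host folder /my_folder/ngs at /ngs
docker run -p 8080:8080 -d --name ngbcore -v /my_folder/ngs:/ngs ngb:latest

# Confirm the container is running (this also lists its container ID)
docker ps -a

# Open a bash shell inside the container (docker exec attaches directly; no SSH is involved)
docker exec -it ngbcore bash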
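Running `docker run --name ngbcore` a second time fails because the container name is already taken. A sketch of a re-runnable start step, using a hypothetical helper `start_ngb`, starts the existing container when there is one and otherwise creates it with the same options as the `docker run` command above:

```shell
# Hypothetical helper (not part of NGB): start the ngbcore container,
# creating it on first use. $1 is the host data folder (default /my_folder/ngs).
start_ngb() {
  data_dir="${1:-/my_folder/ngs}"
  if docker container inspect ngbcore >/dev/null 2>&1; then
    # The container already exists: start it (a no-op if it is running).
    docker start ngbcore
  else
    # First run: create it with the same port mapping and volume mount.
    docker run -p 8080:8080 -d --name ngbcore -v "$data_dir":/ngs ngb:latest
  fi
}
```

An existing container keeps the volume mount it was created with, so to change the data folder, remove it first with `docker rm -f ngbcore`.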

Once the container is running, you can open the NGB Genome Browser at http://localhost:8080/catgenome/.
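The server inside the container needs some time to start after `docker run` returns. Rather than guessing, you can poll the URL. A minimal sketch, assuming `curl` is installed on the host; `wait_for_ngb` is a hypothetical helper and its defaults (60 attempts, 2 seconds apart) are arbitrary choices:

```shell
# Hypothetical helper (not part of NGB): poll the NGB URL until it responds.
# $1 = URL, $2 = number of attempts (2 seconds apart).
wait_for_ngb() {
  url="${1:-http://localhost:8080/catgenome/}"
  tries="${2:-60}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: fail on HTTP errors (4xx/5xx); -s: no progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "NGB is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "NGB did not respond at $url after $tries attempts" >&2
  return 1
}
```

Run `wait_for_ngb` right after the `docker run` command; it returns 0 once the page answers and 1 if it never does.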

Important: although the NGB Genome Browser starts, it cannot access the genome sequences, VCF, or BAM files until you register them with the ngb command-line tool.

Update your Docker Instance

You will likely also need to update your Docker container, whether for security patches or to install tools such as vim. From the shell inside the container, run:

apt-get update
apt-get install vim
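The same update can also be run from the host without opening an interactive shell. A sketch, assuming the container is named ngbcore (as in the `docker run` command above) and that its image provides apt-get, as the commands above already assume; `update_ngb_container` is a hypothetical helper:

```shell
# Hypothetical helper (not part of NGB): run apt-get inside the ngbcore
# container from the host. Package names are passed as arguments.
update_ngb_container() {
  # sh -c receives the package names as "$@" (the extra "sh" fills $0).
  # -y and DEBIAN_FRONTEND=noninteractive avoid interactive prompts.
  docker exec ngbcore sh -c \
    'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y "$@"' \
    sh "$@"
}
```

For example, `update_ngb_container vim` installs vim in the running container.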

Register your genome sequences

As noted in previous sections, the NGB Genome Browser launches without any reference data to work with. The next steps register that data; they are derived from Add reference genome to NGB server.

As noted earlier, these steps assume the genome data is in /my_folder/ngs/genomes/ and the sample datasets are in /my_folder/ngs/demo/ on the host. Adjust the paths below if your locations differ.

If you are not already in a shell inside the container, open one:

# Open a bash shell inside the container
docker exec -it ngbcore bash

Important: the host keeps the reference genome in /my_folder/ngs/genomes/ and the sample datasets in /my_folder/ngs/demo/, but inside the container the same files appear under /ngs/genomes and /ngs/demo. This mapping comes from the -v option you passed to docker run when you started the container.

The next set of commands will add the GRCh38 reference genome:

# Set paths (as seen inside the container; the files were downloaded to /my_folder/ngs/genomes/ on the host)
export GRCH38_SEQ_PATH=/ngs/genomes/Homo_sapiens.GRCh38.fa
export GRCH38_GENES_PATH=/ngs/genomes/Homo_sapiens.GRCh38.gtf

# Register GRCh38 reference sequence
ngb reg_ref $GRCH38_SEQ_PATH -n GRCh38 -t

# Register GRCh38 genes annotation GTF file
ngb reg_file GRCh38 $GRCH38_GENES_PATH -n GRCh38_Genes -t

# Associate genes annotation with a reference sequence
ngb add_genes GRCh38 GRCh38_Genes

# Create a dataset mapped to a newly created genome
ngb reg_dataset GRCh38 GRCh38_Dataset GRCh38_Genes -t
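The four commands above can be grouped so that a wrong path is caught before anything is registered and the sequence stops at the first failing step. This is a sketch to run inside the container; `register_grch38` is a hypothetical helper, it assumes `ngb` exits with a non-zero status when a command fails, and its default paths point at the files downloaded to /my_folder/ngs/genomes (seen as /ngs/genomes inside the container) unless GRCH38_SEQ_PATH and GRCH38_GENES_PATH are set.

```shell
# Hypothetical helper (not part of NGB): run the four GRCh38 registration
# commands, stopping at the first failure (assumes ngb returns non-zero on errors).
register_grch38() {
  seq="${GRCH38_SEQ_PATH:-/ngs/genomes/Homo_sapiens.GRCh38.fa}"
  genes="${GRCH38_GENES_PATH:-/ngs/genomes/Homo_sapiens.GRCh38.gtf}"
  # Catch path mistakes (e.g. a wrong -v mount) before calling ngb.
  for f in "$seq" "$genes"; do
    if [ ! -f "$f" ]; then
      echo "missing $f; check the -v mount and the paths above" >&2
      return 1
    fi
  done
  ngb reg_ref "$seq" -n GRCh38 -t &&
    ngb reg_file GRCh38 "$genes" -n GRCh38_Genes -t &&
    ngb add_genes GRCh38 GRCh38_Genes &&
    ngb reg_dataset GRCh38 GRCh38_Dataset GRCh38_Genes -t
}
```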

Once these commands complete, go back to the NGB Genome Browser at http://localhost:8080/catgenome/ to view the GRCh38 reference genome and its gene annotations.

Register your BAM / VCF files

In the next steps, you will add a sample BAM file and VCF file.

# Create a root-level dataset named FGFR3-TACC-Fusion-Sample that will contain the samples
# This step assumes that the GRCh38 reference genome is already registered
ngb reg_dataset GRCh38 FGFR3-TACC-Fusion-Sample

# Assumes all files are located in the /ngs/demo folder
# and share the base name FGFR3-TACC-Fusion (.bam, .bam.bai, .vcf)
export NGB_FILES_TEMPLATE=/ngs/demo/FGFR3-TACC-Fusion

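# Add the BAM file (paired with its .bai index) and the VCF file to the dataset;
# the BAM?BAI argument is quoted so the shell does not treat ? as a glob character
ngb add_dataset FGFR3-TACC-Fusion-Sample "${NGB_FILES_TEMPLATE}.bam?${NGB_FILES_TEMPLATE}.bam.bai" "${NGB_FILES_TEMPLATE}.vcf"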
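The two sample-registration commands above can likewise be guarded by a file check, since a missing .bai index is an easy mistake to make. A sketch to run inside the container, using a hypothetical helper `register_fusion_sample` whose argument is the shared path prefix (default /ngs/demo/FGFR3-TACC-Fusion, as above):

```shell
# Hypothetical helper (not part of NGB): register the FGFR3-TACC fusion
# sample after checking that the BAM, its .bai index and the VCF exist.
register_fusion_sample() {
  base="${1:-/ngs/demo/FGFR3-TACC-Fusion}"
  for f in "$base.bam" "$base.bam.bai" "$base.vcf"; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2
      return 1
    fi
  done
  # The BAM?BAI pair is quoted so the shell never treats ? as a glob character.
  ngb reg_dataset GRCh38 FGFR3-TACC-Fusion-Sample &&
    ngb add_dataset FGFR3-TACC-Fusion-Sample "$base.bam?$base.bam.bai" "$base.vcf"
}
```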

Now open the browser at http://localhost:8080/catgenome/ to check out both the GRCh38 genome and the FGFR3-TACC-Fusion-Sample dataset.

Demo Script

The demo script below is an abridged version of the Filling in the installed NGB with data documentation.

Review the Datasets

Clicking FGFR3-TACC-Fusion-Sample in the Datasets panel on the right selects the fusion sample VCF and BAM files.

Review the Variants

Click the Variants tab at the top right and you will see a duplication involving TACC3 and FGFR3 at position 1728519. Once you click the duplication, you can see the Reference, BAM, VCF, and Gene panels.

Reorder the Reference, Gene, VCF, and BAM panels

To make things easier, drag and drop the various dialogs so they are in order of Reference, Gene, VCF, and BAM.

To reproduce the view in the screenshots, also do the following:

Under Reference, click on General (Forward Strand) so you can also include the Translation
Under BAM, click on Color mode and choose By Pair Orientation

Hover over the different dialogs

Hovering your mouse over each panel reveals a lot of useful information. For example, the screenshot below shows details about the duplication.

Show Actions

In this example, if you left-click the duplication in the VCF panel, a menu appears with "Show Info" and "Show pair in split screen" options.

Clicking "Show pair in split screen" shows the duplicated regions side by side.

And by clicking on "Show Info", you can open up the Gene Visualizer.

dennyglee/NGB-Genome-Browser-Docker-E2E-Script.md