The NGB Genome Browser is a web-based NGS data viewer with structural variation (SV) visualization capabilities. This gist provides end-to-end (e2e) Docker installation instructions and a demo script, including the download of the sample VCF and BAM files.
Note, these instructions are derived from the following sources:
Note, you can use the hosted version of the NGB Genome Browser at NGB Genome Browser (Hosted). You can also view the visualizations in the paper Prioritisation of Structural Variant Calls in Cancer Genomes.
This section is an abridged version of the NGB Quick Start.
Prerequisite: ensure you have already installed Docker.
- Let's validate you have docker installed
# Validate you have docker installed
docker --version
- Next, clone the NGB sources
# Get NGB sources
git clone https://github.com/epam/NGB.git
cd NGB
- Build source packages and binaries (takes about 15 minutes to finish)
# Build source packages and binaries
./gradlew buildDocker
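Before cloning and building, it can help to confirm that the tools used in the steps above are actually on your PATH. The following is a minimal sketch: `require_cmd` is a hypothetical helper name (not part of NGB or Docker), and it only checks for `docker` and `git` because those are the commands invoked above. The build itself may need other tools that this sketch does not check.

```shell
#!/bin/sh
# require_cmd is a hypothetical helper (not part of NGB or Docker).
# It succeeds if the named command is found on PATH and fails otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Check the tools used in the steps above and collect the missing ones.
missing=""
for tool in docker git; do
    require_cmd "$tool" 2>/dev/null || missing="$missing $tool"
done

if [ -n "$missing" ]; then
    echo "Install these before continuing:$missing"
else
    echo "docker and git found; ready to clone and build"
fi
```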
Once installed, the images should be part of your local repository:
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
ngb          latest   8491b3ab993e   5 days ago    548MB
ubuntu       16.04    2a4cca5ac898   10 days ago   111MB
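If you would rather script this check than read the table, the sketch below filters the output of `docker images --format`. `image_listed` is a hypothetical helper name, and the `ngb:latest` tag is taken from the listing above; adjust it if your build produced a different tag.

```shell
#!/bin/sh
# image_listed is a hypothetical helper: it reads "repository:tag" lines on
# stdin and succeeds only if one of them exactly equals the first argument.
image_listed() {
    grep -Fxq -- "$1"
}

# Only query Docker when the CLI is available. Errors (for example a stopped
# daemon) are silenced and simply result in "not found".
if command -v docker >/dev/null 2>&1; then
    if docker images --format '{{.Repository}}:{{.Tag}}' 2>/dev/null | image_listed "ngb:latest"; then
        echo "ngb:latest is present"
    else
        echo "ngb:latest not found; re-run ./gradlew buildDocker" >&2
    fi
fi
```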
You can use your own data, but to get yourself up and running faster you can utilize the EPAM NGB Demo Data or EPAM NGB Demo Data (TAR) and scripts. The key script that you can follow is demo_data_init.sh. Below is an abridged version of the script.
- Set up your folder structure, for example:
/my_folder/ngs
    /genomes
    /demo
- Let's download the GRCh38 human reference genome and the demo data:
cd /my_folder/ngs/genomes/
/my_folder/ngs/genomes/$ wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.gtf.gz
/my_folder/ngs/genomes/$ wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.fa.gz
/my_folder/ngs/genomes/$ wget http://ngb.opensource.epam.com/distr/data/genome/grch38/Homo_sapiens.GRCh38.domains.bed
cd /my_folder/ngs/demo/
/my_folder/ngs/demo/$ wget http://ngb.opensource.epam.com/distr/data/demo/ngb_demo_data.tar.gz
- Decompress Reference Genomes
cd /my_folder/ngs/genomes/
/my_folder/ngs/genomes/$ gzip -d Homo_sapiens.GRCh38.fa.gz
/my_folder/ngs/genomes/$ gzip -d Homo_sapiens.GRCh38.gtf.gz
- Decompress demo data
cd /my_folder/ngs/demo/
/my_folder/ngs/demo/$ tar -zxvf ngb_demo_data.tar.gz
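The folder setup, download, and decompression steps above can be collected into one re-runnable script. This is only a sketch under stated assumptions: `NGS_ROOT`, `make_layout`, `fetch_into`, and `prepare_data` are hypothetical names introduced here, the URLs are the ones listed above, and the script assumes `wget`, `gzip`, and `tar` are installed.

```shell
#!/bin/sh
# Hypothetical names: NGS_ROOT, make_layout, fetch_into, prepare_data.
NGS_ROOT="${NGS_ROOT:-/my_folder/ngs}"
BASE_URL="http://ngb.opensource.epam.com/distr/data"

# Create the genomes/ and demo/ folders under the given root.
make_layout() {
    mkdir -p "$1/genomes" "$1/demo"
}

# Download URL $2 into directory $1, unless the file (or its already
# decompressed form without the .gz suffix) is there from an earlier run.
fetch_into() (
    dir=$1
    name=$(basename "$2")
    if [ -e "$dir/$name" ] || [ -e "$dir/${name%.gz}" ]; then
        echo "skipping $name: already present in $dir"
    else
        wget -P "$dir" "$2"
    fi
)

# Run every step: layout, downloads, decompression.
prepare_data() {
    make_layout "$NGS_ROOT" || return 1
    for f in Homo_sapiens.GRCh38.gtf.gz Homo_sapiens.GRCh38.fa.gz \
             Homo_sapiens.GRCh38.domains.bed; do
        fetch_into "$NGS_ROOT/genomes" "$BASE_URL/genome/grch38/$f" || return 1
    done
    fetch_into "$NGS_ROOT/demo" "$BASE_URL/demo/ngb_demo_data.tar.gz" || return 1
    # Decompress whatever .gz files are still present in genomes/.
    for gz in "$NGS_ROOT"/genomes/*.gz; do
        if [ -e "$gz" ]; then
            gzip -d "$gz" || return 1
        fi
    done
    tar -zxf "$NGS_ROOT/demo/ngb_demo_data.tar.gz" -C "$NGS_ROOT/demo"
}

# Nothing is downloaded unless the script is called with the argument "run".
if [ "${1:-}" = "run" ]; then
    prepare_data
fi
```

Saved under a name of your choice (for example `prepare_ngb_data.sh`, a hypothetical filename), it could be invoked as `NGS_ROOT=/my_folder/ngs sh prepare_ngb_data.sh run`. Files that are already present are skipped, so an interrupted run can be repeated.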
For more information on genome sequencing, please refer to Genome Sequencing in a Nutshell
Now that you have built the Docker image, let's start a container from it. Note that the -v option of docker run mounts an existing folder from the host into the container. The instructions below spin up the container and open a shell inside it.
# Spin up the container, mounting the host folder /my_folder/ngs at /ngs
docker run -p 8080:8080 -d --name ngbcore -v /my_folder/ngs:/ngs ngb:latest
# Confirm the container is running
docker ps -a
# Open a shell inside the container (it was named ngbcore above)
docker exec -it ngbcore bash
Note, you will be able to open up the NGB Genome Browser at http://localhost:8080/catgenome/.
Important, while you can start the NGB Genome Browser, it cannot access the genome sequences / VCF / BAM files until you register them with ngb.
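The web application may need some time to come up after `docker run` returns, so the URL above might not answer immediately. A minimal sketch for polling it from the host follows; `wait_for_url` is a hypothetical helper, it assumes `curl` is installed, and the default attempt count and delay are arbitrary choices rather than values recommended by NGB.

```shell
#!/bin/sh
# wait_for_url is a hypothetical helper: poll a URL with curl until it answers
# with a successful HTTP status, or give up after a number of attempts.
#   $1 = URL
#   $2 = maximum attempts (default 30)
#   $3 = seconds to sleep between attempts (default 2)
wait_for_url() {
    wait_url=$1
    wait_attempts=${2:-30}
    wait_delay=${3:-2}
    wait_i=1
    while [ "$wait_i" -le "$wait_attempts" ]; do
        if curl -fsS --max-time 5 -o /dev/null "$wait_url" 2>/dev/null; then
            echo "up after $wait_i attempt(s): $wait_url"
            return 0
        fi
        # Sleep only if another attempt is still to come.
        if [ "$wait_i" -lt "$wait_attempts" ]; then
            sleep "$wait_delay"
        fi
        wait_i=$((wait_i + 1))
    done
    echo "gave up after $wait_attempts attempt(s): $wait_url" >&2
    return 1
}

# Example call (left commented out so that sourcing this file does not block):
# wait_for_url http://localhost:8080/catgenome/ && echo "NGB is reachable"
```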
You will likely also need to update your Docker instance, ranging from security patches to software such as vim. Once you are logged in, run the following commands:
apt-get update
apt-get install vim
As noted in previous sections, while you can launch NGB Genome Browser, there are no reference datasets for it to work with. The next steps register them; they are derived from Add reference genome to NGB server.
As noted in a previous section, the assumption is that the genome data has been downloaded to /my_folder/ngs/genomes and the sample datasets to /my_folder/ngs/demo/. Please adjust your scripts to match your own locations.
As noted in a previous section, you should already have a shell open inside your Docker container:
# Open a shell inside the container named ngbcore
docker exec -it ngbcore bash
Important to note, while the host machine has the reference genome in /my_folder/ngs/genomes/ and the sample datasets in /my_folder/ngs/demo/, from the context of the Docker container the reference genome is in /ngs/genomes and the sample datasets are in /ngs/demo. This was configured with the -v parameter when you spun up the container in one of the previous steps.
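To make the mapping concrete, the sketch below translates a host path into the path the same file has inside the container. `to_container_path`, `HOST_ROOT`, and `CONTAINER_ROOT` are hypothetical names; the two roots mirror the -v /my_folder/ngs:/ngs argument used earlier and should be changed if you mounted a different folder.

```shell
#!/bin/sh
# Mount mapping from the docker run command: host root -> container root.
HOST_ROOT="/my_folder/ngs"
CONTAINER_ROOT="/ngs"

# to_container_path is a hypothetical helper: translate a host path under
# HOST_ROOT into the path the same file has inside the container.
to_container_path() {
    case "$1" in
        "$HOST_ROOT")
            printf '%s\n' "$CONTAINER_ROOT" ;;
        "$HOST_ROOT"/*)
            # Strip the host prefix and prepend the container root.
            printf '%s\n' "$CONTAINER_ROOT${1#"$HOST_ROOT"}" ;;
        *)
            echo "not under $HOST_ROOT: $1" >&2
            return 1 ;;
    esac
}

to_container_path /my_folder/ngs/genomes/Homo_sapiens.GRCh38.fa
# prints /ngs/genomes/Homo_sapiens.GRCh38.fa
to_container_path /my_folder/ngs/demo
# prints /ngs/demo
```

Paths outside the mounted folder are rejected, which reflects the fact that the container cannot see them at all.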
The next set of commands will add the GRCh38 reference genome:
# Set paths (the files were downloaded directly into the genomes folder above)
export GRCH38_SEQ_PATH=/ngs/genomes/Homo_sapiens.GRCh38.fa
export GRCH38_GENES_PATH=/ngs/genomes/Homo_sapiens.GRCh38.gtf
# Register GRCh38 reference sequence
ngb reg_ref $GRCH38_SEQ_PATH -n GRCh38 -t
# Register GRCh38 genes annotation GTF file
ngb reg_file GRCh38 $GRCH38_GENES_PATH -n GRCh38_Genes -t
# Associate genes annotation with a reference sequence
ngb add_genes GRCh38 GRCh38_Genes
# Create a dataset mapped to a newly created genome
ngb reg_dataset GRCh38 GRCh38_Dataset GRCh38_Genes -t
Now that you have executed the above commands, you can go back to NGB Genome Browser at http://localhost:8080/catgenome/ to view the reference genome.
In these next steps, you will add a sample set of BAM and VCF files.
# Create a root-level dataset named FGFR3-TACC-Fusion-Sample that will contain the samples
# This step assumes that the GRCh38 reference genome is already registered
ngb reg_dataset GRCh38 FGFR3-TACC-Fusion-Sample
# Assuming that all files are located in the /ngs/demo folder
# and that all files share the same path prefix and differ only by extension
export NGB_FILES_TEMPLATE=/ngs/demo/FGFR3-TACC-Fusion
# Add BAM/VCF files to the sample dataset
# (quoted so the shell does not treat "?" as a glob character)
ngb add_dataset FGFR3-TACC-Fusion-Sample "${NGB_FILES_TEMPLATE}.bam?${NGB_FILES_TEMPLATE}.bam.bai" "${NGB_FILES_TEMPLATE}.vcf"
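Because the BAM file, its index, and the VCF must all exist at the shared prefix, a quick check before registering can save a confusing failure. The sketch below is a dry run: it only prints the command it would execute. `bam_with_index` and `missing_inputs` are hypothetical helpers, and the "file?index" form is copied from the command above rather than taken from the NGB CLI reference.

```shell
#!/bin/sh
# bam_with_index is a hypothetical helper: given a path prefix, print the
# "<prefix>.bam?<prefix>.bam.bai" argument in the form used by the command above.
bam_with_index() {
    printf '%s.bam?%s.bam.bai\n' "$1" "$1"
}

# missing_inputs is a hypothetical helper: print each expected file for a
# prefix that does not exist, and fail if any of them is missing.
missing_inputs() {
    inputs_status=0
    for ext in bam bam.bai vcf; do
        if [ ! -f "$1.$ext" ]; then
            echo "missing: $1.$ext"
            inputs_status=1
        fi
    done
    return "$inputs_status"
}

prefix=/ngs/demo/FGFR3-TACC-Fusion
if missing_inputs "$prefix"; then
    # Dry run: print the command instead of executing it.
    echo ngb add_dataset FGFR3-TACC-Fusion-Sample \
        "$(bam_with_index "$prefix")" "$prefix.vcf"
fi
```

If the check reports missing files, look at where the demo archive actually placed them and adjust the prefix accordingly.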
Now you can open your browser to check out both the GRCh38 genome and the FGFR3-TACC-Fusion-Sample dataset at http://localhost:8080/catgenome/.
The demo script below is an abridged version of the Filling in the installed NGB with data documentation.
By clicking the FGFR3-TACC-Fusion-Sample within the Datasets dialog on the right, you are choosing the Fusion sample VCF and BAM files.
Click on the top right Variants tab and you will see a duplication for TACC3, FGFR3 at position 1728519. Once you click on the duplication, you can see the dialogs for Reference, BAM, VCF, and Gene.
To make things easier, drag and drop the various dialogs so they are in order of Reference, Gene, VCF, and BAM.
Note, to get the same view as these screenshots, perform the following as well:
- Under Reference, click on General (Forward Strand) so you can also include the Translation
- Under BAM, click on Color mode and choose By Pair Orientation
There is a lot of useful information available if you hover your mouse over each of the dialogs. For example, the screenshot below provides information about the duplication.
In this example, if you left click on the duplication in the VCF dialog, you will get a menu with "Show Info" and "Show pair in split screen".
By clicking on "Show pair in split screen", you can see both ends of the duplication side by side.
And by clicking on "Show Info", you can open up the Gene Visualizer.