Cyriac Kandoth ckandoth

@ckandoth
ckandoth / ngs_test_data.sh
Last active February 17, 2025 09:10
Create test data for CI/CD of a FASTQ to gVCF bioinformatics pipeline
# GOAL: Create test data for CI/CD of a germline variant calling pipeline (FASTQ to gVCF)
# Steps below were performed on Ubuntu 24.04, but should be reproducible on any Linux distro
# Download and install micromamba under your home directory, then log out:
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
# Log back in to add micromamba to your $PATH, and then use it to install the tools we need:
micromamba create -y -n bio -c conda-forge -c bioconda htslib==1.21 samtools==1.21 bcftools==1.21 picard-slim==3.3.0
micromamba activate bio
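Once the environment is active, a quick check that the tools resolve on `$PATH` can catch a failed install early. This is a minimal sketch, not part of the original gist: the `check_tools` helper is hypothetical, and the command names `bgzip`, `tabix`, and `picard` are assumptions about what the `htslib` and `picard-slim` packages put on `$PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is on $PATH.
# It only prints a status per tool and does not fail on a missing one.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\tfound\n' "$tool"
    else
      printf '%s\tMISSING\n' "$tool"
    fi
  done
}

# Assumed command names for the packages installed above
check_tools samtools bcftools bgzip tabix picard
```

Any line marked `MISSING` suggests the `bio` environment is not active in the current shell, or that the package provides a differently named command.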
ckandoth / base_image_benchmark.md
Last active January 24, 2022 16:47
Compare the speed of containerized bwa-mem using various base images

On a Linux VM or workstation with docker installed, fetch the GRCh38 FASTA, its index, and a pair of FASTQs:

wget -P /hot/ref https://storage.googleapis.com/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa{,.fai}
wget -P /hot/reads/test https://storage.googleapis.com/data.cyri.ac/test_L001_R{1,2}_001.fastq.gz
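The two `wget` commands above rely on bash brace expansion to name several files at once. As a rough sketch (the variable names are illustrative, not from the gist, and a bash shell is assumed), the expansions can be previewed before anything is downloaded:

```shell
#!/usr/bin/env bash
# Preview what the brace patterns expand to; nothing is downloaded here.
ref_base="https://storage.googleapis.com/genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa"

# {,.fai} yields the bare name and the name with .fai appended
ref_urls=( "${ref_base}"{,.fai} )

# R{1,2} yields the R1 and R2 mates of the read pair
read_urls=( https://storage.googleapis.com/data.cyri.ac/test_L001_R{1,2}_001.fastq.gz )

printf '%s\n' "${ref_urls[@]}" "${read_urls[@]}"
```

Each pattern expands to two URLs, so the pair of commands fetches four files in total.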

If you are on a Slurm cluster, here is an example of wrapping a docker run command in an sbatch request:

sbatch --chdir=/hot --output=ref/std.out --error=ref/std.err --nodes=1 --ntasks-per-node=1 --cpus-per-task=8 --mem=30G --time=4:00:00 --wrap="docker run --help"
ckandoth / single_machine_slurm_on_ubuntu.md
Last active May 19, 2024 07:49
Install Slurm 19.05 on a standalone machine running Ubuntu 20.04

Use apt to install the necessary packages:

sudo apt install -y slurm-wlm slurm-wlm-doc

Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:

  1. Set your machine's hostname in SlurmctldHost and NodeName.
  2. Set CPUs as appropriate, and optionally Sockets, CoresPerSocket, and ThreadsPerCore. Use the lscpu command to find what you have.
  3. Set RealMemory to the number of megabytes you want to allocate to Slurm jobs.
  4. Set StateSaveLocation to /var/spool/slurm-llnl.
  5. Set ProctrackType to linuxproc because processes are less likely to escape Slurm control on a single machine config.
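The values for the first three steps can be collected from the machine itself. The following is a sketch under stated assumptions, not part of the original instructions: the variable names are illustrative, and reserving 10% of RAM for the operating system is a hypothetical choice, so set RealMemory to whatever suits your workload.

```shell
#!/usr/bin/env bash
# Gather candidate values for the configurator form on the local machine.

# Short hostname, for SlurmctldHost and NodeName
node_name="$(uname -n | cut -d. -f1)"

# Logical CPUs available to this shell, for CPUs
cpus="$(nproc)"

# Total RAM in megabytes (MemTotal in /proc/meminfo is reported in kB)
total_mb="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"

# Hypothetical policy: offer 90% of RAM to Slurm jobs, keep the rest for the OS
real_memory=$(( total_mb * 9 / 10 ))

printf 'SlurmctldHost=%s\n' "$node_name"
printf 'NodeName=%s CPUs=%s RealMemory=%s\n' "$node_name" "$cpus" "$real_memory"

# Optional detail for Sockets, CoresPerSocket, and ThreadsPerCore, if lscpu is present
if command -v lscpu >/dev/null 2>&1; then
  lscpu | grep -E '^(Socket|Core|Thread)' || true
fi
```

After Slurm is installed, `slurmd -C` prints the hardware configuration as Slurm detects it, which is a useful cross-check against these numbers.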
ckandoth / ensembl_vep_102_with_offline_cache.md
Last active February 8, 2025 04:16
Install Ensembl's VEP v102 with local cache for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

Instead of the official instructions, we will use conda to install VEP and its dependencies. If you don't already have conda, install it into $HOME/miniconda3 as follows:

curl -sL https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -o /tmp/miniconda.sh
sh /tmp/miniconda.sh -bfp $HOME/miniconda3

Add the conda bin folder to your $PATH so that all installed tools are accessible from the command line. You can also add this to your ~/.bashrc.
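A minimal sketch of that step, assuming the `$HOME/miniconda3` install prefix used above (the `conda_bin` variable is illustrative). The `case` guard keeps the folder from being added twice when the snippet is sourced repeatedly:

```shell
# Assumes conda was installed into $HOME/miniconda3 as shown above
conda_bin="$HOME/miniconda3/bin"

# Prepend the folder only if it is not already on $PATH
case ":$PATH:" in
  *":$conda_bin:"*) ;;                       # already present, nothing to do
  *) export PATH="$conda_bin:$PATH" ;;
esac

# To persist this across logins, the same export could be appended to ~/.bashrc:
# echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bashrc
```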

ckandoth / ensembl_vep_95_with_offline_cache.md
Last active October 4, 2022 21:49
Install Ensembl's VEP v95 with various caches for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

To follow these instructions, we'll assume you have these packaged essentials installed:

## For Debian/Ubuntu system admins ##
sudo apt-get install -y build-essential git libncurses-dev

## For RHEL/CentOS system admins ##
sudo yum groupinstall -y 'Development Tools'
sudo yum install -y git ncurses-devel
ckandoth / build_perl_5_24.txt
Created May 15, 2017 17:33
Download, install, and build Perl 5.24 in a local folder
# To follow these instructions, we'll assume your system admins have already installed these essentials:
# Debian/Ubuntu system admins should run:
sudo apt-get install -y build-essential libexpat1-dev libssl-dev libmysqlclient-dev libxml2-dev
# RHEL/CentOS system admins should run:
sudo yum groupinstall -y 'Development Tools'
sudo yum install -y expat-devel openssl-devel mysql-devel libxml2-devel
# Create a folder where you want to install different Perls, and cd into it:
# Note that it doesn't need to be your home folder. Put it wherever you want to maintain such software:
export PERL_BASE="$HOME/perl"
ckandoth / ensembl_vep_88_with_offline_cache.md
Last active May 15, 2018 16:17
Install Ensembl's VEP v88 with various caches for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

To follow these instructions, we'll assume you have these packaged essentials installed:

## For Debian/Ubuntu system admins ##
sudo apt-get install -y build-essential git libncurses-dev

## For RHEL/CentOS system admins ##
sudo yum groupinstall -y 'Development Tools'
sudo yum install -y git ncurses-devel
ckandoth / build_python_2_7.txt
Created November 29, 2016 22:12
Download, install, and build Python 2.7.10 in a local folder
# Create a folder where you want to install different Pythons, and cd into it:
# Note that it doesn't need to be your home folder. Put it wherever you want to maintain such software:
export PYTHON_BASE="$HOME/python"
mkdir -p "$PYTHON_BASE"
cd "$PYTHON_BASE"
# Download source tarball into a subfolder named src, and untar:
curl --create-dirs -L -o src/Python-2.7.10.tgz https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
cd src
tar -zxf Python-2.7.10.tgz
ckandoth / ensembl_vep_86_with_offline_cache.md
Last active July 15, 2021 16:26
Install Ensembl's VEP v86 with various caches for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

To follow these instructions, we'll assume you have these packaged essentials installed:

## For Debian/Ubuntu system admins ##
sudo apt-get install -y build-essential git libncurses-dev

## For RHEL/CentOS system admins ##
sudo yum groupinstall -y 'Development Tools'
sudo yum install -y git ncurses-devel
ckandoth / ensembl_vep_85_with_offline_cache.md
Last active May 15, 2018 08:50
Install Ensembl's VEP v85 with various caches for running offline

Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

To follow these instructions, we'll assume you have these packaged essentials installed:

## For Debian/Ubuntu system admins ##
sudo apt-get install -y build-essential git libncurses-dev

## For RHEL/CentOS system admins ##
sudo yum groupinstall -y 'Development Tools'
sudo yum install -y git ncurses-devel