This bash script is based on the Impute-First GitHub repository and its preprint.
Inputs:
- Reference genome
- HGSVC2 Reference panel
- PLINK genetic map
- NovaSeq HG002 sequencing reads
#!/bin/bash
# Usage: deinterleave_fastq.sh < interleaved.fastq f.fastq r.fastq [compress]
#
# Deinterleaves a FASTQ file of paired reads into two FASTQ
# files specified on the command line. Optionally gzip-compresses the output
# FASTQ files using pigz if the 3rd command-line argument is the word "compress"
#
# Can deinterleave 100 million paired reads (200 million total
# reads; a 43 GB file) in memory (/dev/shm) in 4m15s (255s)
#
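The body of the shell script is not reproduced here. The following Python sketch illustrates what deinterleaving involves; it is not the script itself. It reads four-line FASTQ records from standard input and writes them alternately to the two output files. It assumes mates strictly alternate (read 1, read 2, read 1, ...). The names `fastq_records`, `deinterleave` and `main` are hypothetical. Python's single-threaded `gzip` module stands in for pigz, so compressed output will be much slower than the timing quoted above.

```python
import gzip
import sys
from itertools import islice
from typing import IO, Iterator, List


def fastq_records(handle: IO[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines: header, sequence, '+', quality."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def deinterleave(src: IO[str], out_r1: IO[str], out_r2: IO[str]) -> int:
    """Send odd-numbered records to out_r1 and even-numbered ones to out_r2.

    Returns the number of read pairs written. Assumes mates strictly alternate.
    """
    records = fastq_records(src)
    pairs = 0
    for r1 in records:
        r2 = next(records, None)  # the mate is the very next record
        if r2 is None:
            raise ValueError("odd number of records: last read has no mate")
        out_r1.writelines(r1)
        out_r2.writelines(r2)
        pairs += 1
    return pairs


def main(argv: List[str]) -> None:
    # Same argument order as the shell script: f.fastq r.fastq [compress]
    compress = len(argv) > 3 and argv[3] == "compress"
    opener = gzip.open if compress else open
    with opener(argv[1], "wt") as f1, opener(argv[2], "wt") as f2:
        pairs = deinterleave(sys.stdin, f1, f2)
    print(f"{pairs} read pairs written", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: deinterleave.py f.fastq r.fastq [compress] < interleaved.fastq", file=sys.stderr)
    else:
        main(sys.argv)
```

For example: `python deinterleave.py f.fastq.gz r.fastq.gz compress < interleaved.fastq`.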
#!/usr/bin/env python
"""
extract_reads.py
Created by Tim Stuart
"""
import pysam
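Only the header and the `pysam` import of `extract_reads.py` survive here. Judging by the file name, the script probably extracts reads whose names appear in a list. Assuming that, a streaming version could look like the sketch below. The helpers `load_names`, `extract_by_name` and `write_matching` are hypothetical and are not Tim Stuart's code. The `pysam` calls used (`AlignmentFile` with `template=`, `fetch(until_eof=True)`, `write`) are standard pysam API.

```python
from typing import Any, Iterable, Iterator, Set


def load_names(lines: Iterable[str]) -> Set[str]:
    """Read one read name per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in lines if line.strip()}


def extract_by_name(reads: Iterable[Any], wanted: Set[str]) -> Iterator[Any]:
    """Yield reads whose query_name is in `wanted` (pysam AlignedSegment or any look-alike)."""
    for read in reads:
        if read.query_name in wanted:
            yield read


def write_matching(bam_path: str, out_path: str, wanted: Set[str]) -> int:
    """Copy matching reads from bam_path into a new BAM at out_path; return how many were written."""
    import pysam  # imported here so the pure-Python helpers above work without pysam

    written = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam, \
            pysam.AlignmentFile(out_path, "wb", template=bam) as out:
        # until_eof=True streams every record, including unmapped ones, without needing an index
        for read in extract_by_name(bam.fetch(until_eof=True), wanted):
            out.write(read)
            written += 1
    return written
```

A single pass with a set lookup keeps memory proportional to the number of wanted names. pysam also provides `IndexedReads`, which builds an in-memory name index of the whole file. That may pay off when many separate lookups are made against the same BAM.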
#!/usr/bin/env python
## Calculate N50 from a FASTA file
## N50 = the contig length L such that contigs of length >= L together make up at least half of the total assembly length
import commands  # Python 2 only; removed in Python 3 (use subprocess.getoutput there)
import sys
import os
from itertools import groupby
import numpy
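To compute N50, sort the contig lengths in decreasing order and add them up until the running total reaches half of the assembly length. The length of the contig that crosses that threshold is the N50. The sketch below implements that definition in plain Python 3. It assumes an uncompressed FASTA file, and `fasta_lengths` and `n50` are illustrative names, not functions from the original script.

```python
import sys
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the longest-first running total first reaches half the total."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding total / 2
            return length
    raise AssertionError("unreachable: the running total always reaches the full total")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(n50(fasta_lengths(handle)))
```

Take contigs of lengths 2, 3, 4, 5 and 6 (total 20). The running total is 6 after the longest contig and 11 after the next, which passes 10, so N50 = 5.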
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes
Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 or
GRCh38, and may be useful for other situations as well.
$ mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix "${BAM%.*}.median.regions.bed.gz" {1..22} | python depth_mean_stddev.py
"""
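The estimate described in the docstring works in three steps. Take the non-zero depths and find their mode. Keep only the values strictly between 0 and twice that mode. Report the mean and standard deviation of what remains. The sketch below is not the original script's body, and it rests on these assumptions:

- The input is the four-column mosdepth regions BED (chrom, start, end, depth) streamed from `tabix`.
- Depths are rounded to whole numbers only when finding the mode.
- The standard deviation is the population one.

`depths_from_bed` and `coverage_mean_stddev` are hypothetical names.

```python
import statistics
from collections import Counter
from typing import Iterable, Iterator, Tuple


def depths_from_bed(lines: Iterable[str]) -> Iterator[float]:
    """Yield the depth (4th column) from mosdepth regions BED lines."""
    for line in lines:
        if line.strip():
            yield float(line.split("\t")[3])


def coverage_mean_stddev(depths: Iterable[float]) -> Tuple[float, float]:
    """Mean and population std. dev. of depths in the open interval (0, 2 * non-zero mode)."""
    nonzero = [d for d in depths if d > 0]
    if not nonzero:
        raise ValueError("no windows with non-zero coverage")
    # Assumption: depths are binned to integers so that the mode is well defined
    mode = Counter(round(d) for d in nonzero).most_common(1)[0][0]
    kept = [d for d in nonzero if d < 2 * mode]
    if not kept:
        raise ValueError("no depths below twice the mode")
    return statistics.mean(kept), statistics.pstdev(kept)
```

In the pipeline shown in the docstring, a script would call `coverage_mean_stddev(depths_from_bed(sys.stdin))` on the `tabix` output. Trimming at twice the mode discards high-coverage outliers, such as collapsed repeats, without needing a fixed depth cutoff.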