This bash script is based on the Impute-First GitHub repository and its preprint.
Inputs:
- Reference genome
- HGSVC2 reference panel
- PLINK genetic map
- NovaSeq HG002 sequencing reads
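The PLINK genetic map supplies recombination distances between markers. Imputation tools typically need a genetic position (in cM) for each variant and obtain it by linear interpolation between map markers. The sketch below shows that interpolation. It assumes a whitespace-delimited PLINK map with four columns (chromosome, marker ID, position in cM, base-pair position). The helper names `read_plink_map` and `interpolate_cm` are hypothetical, clamping outside the map range is a simplification, and how Impute-First itself consumes the map is not shown here.

```python
import bisect
from typing import Iterable, List, Tuple


def read_plink_map(lines: Iterable[str], chrom: str) -> Tuple[List[int], List[float]]:
    """Collect sorted (bp, cM) pairs for one chromosome from PLINK-format map lines.

    Assumes four whitespace-separated columns: chrom, marker ID, cM, bp.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != chrom:
            continue  # skip other chromosomes and malformed lines
        points.append((int(fields[3]), float(fields[2])))
    points.sort()
    return [bp for bp, _ in points], [cm for _, cm in points]


def interpolate_cm(bps: List[int], cms: List[float], pos: int) -> float:
    """Linearly interpolate the genetic position of `pos`; clamp outside the map."""
    if not bps:
        raise ValueError("empty genetic map")
    if pos <= bps[0]:
        return cms[0]
    if pos >= bps[-1]:
        return cms[-1]
    # bisect_right gives i with bps[i - 1] <= pos < bps[i], so x1 > x0 below
    i = bisect.bisect_right(bps, pos)
    x0, x1 = bps[i - 1], bps[i]
    y0, y1 = cms[i - 1], cms[i]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)
```

Chromosome names must match between the map and the variants (for example `1` versus `chr1`), otherwise no markers are collected.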
```python
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes

Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 or
GRCh38, and may be useful for other situations as well.

$ mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix "${BAM%.*}.median.regions.bed.gz" {1..22} | python depth_mean_stddev.py
```
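The docstring states the estimator in one sentence: find the mode of the non-zero coverage values, keep positions whose coverage lies strictly between 0 and twice that mode, and report the mean and standard deviation of what remains. The sketch below implements that reading on mosdepth region lines (`chrom`, `start`, `end`, `depth`) and weights each window by its length so the statistics are per position. Rounding depths to integers before taking the mode, and the function name `depth_mean_stddev`, are assumptions for illustration. The real script's body is not shown above.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def depth_mean_stddev(lines: Iterable[str]) -> Tuple[float, float]:
    """Per-position mean and stddev of depth over windows with 0 < depth < 2 * mode.

    Each input line is a mosdepth regions.bed record: chrom, start, end, depth.
    """
    windows: List[Tuple[int, float]] = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.split("\t")
        start, end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        if end > start:
            windows.append((end - start, depth))

    # Non-zero mode (assumption: depths rounded to integers): the depth covering most bp.
    bp_per_depth = Counter()
    for length, depth in windows:
        key = round(depth)
        if key > 0:
            bp_per_depth[key] += length
    if not bp_per_depth:
        raise ValueError("no windows with non-zero coverage")
    mode = bp_per_depth.most_common(1)[0][0]

    # Keep windows strictly inside (0, 2 * mode), then compute length-weighted moments.
    kept = [(length, d) for length, d in windows if 0 < d < 2 * mode]
    total = sum(length for length, _ in kept)
    mean = sum(length * d for length, d in kept) / total
    variance = sum(length * (d - mean) ** 2 for length, d in kept) / total
    return mean, math.sqrt(variance)
```

Fed from the `tabix` command in the docstring, it would be called as `depth_mean_stddev(sys.stdin)`. The upper cutoff at twice the mode drops collapsed repeats and other high-coverage outliers before the moments are computed.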
```python
#!/usr/bin/env python
## Calculate N50 from a FASTA file
## N50 = the contig length L such that contigs of length >= L together contain
## at least half of the total assembly length
import commands  # Python 2 only (removed in Python 3)
import sys
import os
from itertools import groupby
import numpy
```
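N50 is easiest to see in code: sort contig lengths from longest to shortest and walk down the list until the running total reaches half of the assembly size. The contig length at that point is the N50. This Python 3 sketch reuses `itertools.groupby` for FASTA parsing, as the original imports suggest, but `fasta_lengths` and `n50` are hypothetical names and the code does not reproduce the original script.

```python
from itertools import groupby
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each FASTA record.

    groupby splits the stream into alternating runs of header and sequence
    lines; each sequence run belongs to the header run before it.
    Records with no sequence lines are skipped.
    """
    for is_header, group in groupby(lines, key=lambda line: line.startswith(">")):
        if not is_header:
            yield sum(len(line.strip()) for line in group)


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that contigs of length >= L hold >= half the bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached: the full sum always reaches half
```

For contig lengths 2 through 10 (54 bp in total), the three longest contigs (10 + 9 + 8 = 27 bp) reach half the assembly, so the N50 is 8. Counting contigs instead of bases, as the original comment did, would give the median length (6) instead.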
```python
#! /usr/bin/env python
"""
extract_reads.py
Created by Tim Stuart
"""
import pysam
```
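The preview shows only that `extract_reads.py` is built on pysam. The sketch below assumes its job is to pull a set of reads out of a BAM file by read name, which is an assumption because the rest of the script is not shown. `select_reads` and `extract_reads` are hypothetical names. `pysam.AlignmentFile`, `fetch(until_eof=True)` and the `template=` argument are part of pysam's API.

```python
from typing import Any, Iterable, Iterator, Set


def select_reads(reads: Iterable[Any], wanted: Set[str]) -> Iterator[Any]:
    """Yield records whose query_name is in `wanted` (every alignment of that read)."""
    for read in reads:
        if read.query_name in wanted:
            yield read


def extract_reads(in_bam: str, out_bam: str, wanted: Set[str]) -> int:
    """Copy alignments of the named reads from in_bam to out_bam; return the count."""
    import pysam  # imported lazily so select_reads works without pysam installed

    written = 0
    with pysam.AlignmentFile(in_bam, "rb") as src, pysam.AlignmentFile(
        out_bam, "wb", template=src
    ) as dst:
        # until_eof=True streams every record, including unmapped ones, without an index
        for read in select_reads(src.fetch(until_eof=True), wanted):
            dst.write(read)
            written += 1
    return written
```

A single linear pass with a set lookup keeps memory proportional to the number of requested names rather than the size of the BAM file.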
```bash
#!/bin/bash
# Usage: deinterleave_fastq.sh < interleaved.fastq f.fastq r.fastq [compress]
#
# Deinterleaves a FASTQ file of paired reads into two FASTQ
# files specified on the command line. Optionally gzip-compresses the output
# FASTQ files using pigz if the 3rd command-line argument is the word "compress".
#
# Can deinterleave 100 million paired reads (200 million total
# reads; a 43 GB file) in memory (/dev/shm) in 4m15s (255s).
#
```