Skip to content

Instantly share code, notes, and snippets.

@sinamajidian
Forked from williamrowell/depth_mean_stddev.py
Created October 10, 2020 07:27
Show Gist options
  • Save sinamajidian/2db6fd56124a041d54028804a5218e9c to your computer and use it in GitHub Desktop.
Save sinamajidian/2db6fd56124a041d54028804a5218e9c to your computer and use it in GitHub Desktop.
Calculate the mean and standard deviation of a coverage depth distribution as if it were normally distributed.
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes
Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 and
GRCh38, and may be useful for other situations as well.
$ bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
"""
__author__ = "William Rowell"
__version__ = "0.2.0"
__license__ = "MIT"
import sys
import numpy as np
from scipy.stats import norm, mode
def nonzero_mode(values):
"""Return mode of values > 0"""
return mode(values[values > 0])[0]
def mu_sigma(values):
"""Return mu, sigma of values from 1 to 2*mode"""
return norm.fit(values[(values > 0) & (values < (2*nonzero_mode(values)))])
def main():
"""For a mosdepth bed file provided to stdin, determine the mean and stddev as
if the data were normal, and report these values.
Usage: tabix ${BAM%.*}.median.regions.bed.gz `seq 1 22` | python depth_mean_stddev.py
"""
# pull coverage depth into an array
values = np.loadtxt(sys.stdin, delimiter='\t', usecols=3, dtype=np.float64)
print(mu_sigma(values))
if __name__ == "__main__":
"""$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
> (mean, stddev)
"""
main()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment