Skip to content

Instantly share code, notes, and snippets.

View jelber2's full-sized avatar

Jean Elbers jelber2

View GitHub Profile
@jelber2
jelber2 / shredBAM-pairs.jl
Last active August 22, 2024 13:22
shreds a BAM file into non-overlapping pairs keeping the alignment positions of each shred
# Revisions 1 & 2 use the same read name read1 and read2, this
# revision 3 appends _1 and _2 to read1 and read2, respectively.
#
# Basic idea is to take a long read alignment file that, then shred the long reads into shreds, keeping their alignment information
# output hi-c-like read-pairs, interleaved in a BAM file!
#
# So for example,
# ----------------------- 1long read
# 1r1 2r1 3r1 3r2 2r2 1r2
# results in 1read1, 1read2
@jelber2
jelber2 / long-reads-shredded-2-hi-c-pairs.jl
Last active July 15, 2024 09:46
Takes reads processed by shredBAM.jl and then makes hi-c pairs
# revision 2, now supports multi-core processing
# Basic idea is to take a long read alignment file that has been shredded with shredBAM.jl, then
# output hi-c-like read-pairs
#
# So for example,
# ----------------------- 1long read
# 1r1 2r1 3r1 3r2 2r2 1r2
# results in 1read1, 1read2
# 2read1, 2read2
# 3read1, 3read2
@jelber2
jelber2 / Estimating_effect_of_error-correction_on_read_phase_switches_via_HG002_hapmers.jl
Last active May 28, 2024 10:24
Using Nanopore Reads from HG002, estimate the phase switching with hapmers and readmers
# Get HG002
# wget https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz
# gunzip hg002v1.0.1.fasta.gz
#
# Use seqtk to get maternal and paternal sequences
# https://github.com/lh3/seqtk
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "MATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.maternal.fasta &
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "PATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.paternal.fasta &
#
# use samtools to index haplotypes
@jelber2
jelber2 / Plot_Animated_Nanopore_Quality_x_Length_2D_Contours.jl
Last active July 3, 2024 09:16
Makes an animated plot of read quality and read length density contours over time for Nanopore Dorado basecalled, indexed BAM files
# new in revision 2, multithreaded support - up to twice as fast as single-threaded, revision 1
# invoke with:
# julia --threads 16 Plot_Animated_Nanopore_Quality_x_Length_2D_Contours.jl --input_file test.bam --fps 0.7 --output_file test.gif
# but need a lot of threads (up to 16, above 16 threads, get saturation in cores vs. time plot)
using XAM
using ArgParse
using DataFrames
@jelber2
jelber2 / PlotQualityHistogram.jl
Last active May 8, 2024 12:42
Plots Quality Score Density Distribution per Hour of a Sequencing Run
# note: the Dorado BAM file produced by basecalling POD5 files needs to be indexed with "samtools index input_file"
# note2: something seems really strange about the qualtiy scores, I have checked several times, but they seem correct
# I am not sure why they are reporting that for example reads with Q38 average quality scores.
# EDIT: this is fixed, see https://www.biostars.org/p/295932/#295936 for better formula for average Phred Score
# tested with julia-1.10.3 and XAM.jl-0.40
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
@jelber2
jelber2 / changeBAMquality.jl
Created April 30, 2024 08:56
Converts quality scores in BAM alignments to a single ASCII Phred Score you desire
# note: outputs to STDOUT a SAM file without a header
# note2: remove auxillary tags and information
# note3: ignores RNEXT and PNEXT from BAM file (puts * and 0 respectivelv)
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia changeBAMquality.jl input.bam ? > test.sam.noheader
#
# to add a header from the original BAM file that had its qualities changed
# and add on a PG line
# $ ASCII_CHARACTER="?"
# $ INPUT=input.bam
@jelber2
jelber2 / PlotSequenceTime.jl
Last active March 26, 2024 10:04
Plot changes in read length distribution of Nanopore Dorado called bases
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
using StatsBase
using Plots.PlotMeasures
function parse_commandline()
@jelber2
jelber2 / stitch-fasta.jl
Last active March 7, 2024 09:45
Stitch FASTA reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fasta.jl chr20.herro.fasta.Q30.recal.shred.fasta > chr20.herro.fasta.Q30.recal.fasta
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fasta_file(filename::String)
@jelber2
jelber2 / stitch-fastq.jl
Last active March 7, 2024 09:45
Stitch FASTQ reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2 , DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fastq.jl chr20.herro.fasta.Q30.recal.shred.fastq > chr20.herro.fasta.Q30.recal.fastq
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fastq_file(filename::String)
@jelber2
jelber2 / shredBAM.jl
Last active March 11, 2024 13:09
Shred alignments in a BAM file (output is a SAM file without header) to make long read alignments into short-read-like
# previous versions allowed to generate directly from BAM, but there seemed to have been some troubles
# between versions of XAM, so this version is working so far for its intended purpose on https://github.com/brendanofallon/jovian
# also, this version is rather fast
#
#
# note: ignores quality scores at the moment - fixed in revision#4
# note2: outputs to STDOUT a SAM file without a header
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia shredBAM.jl input.bam 300 > test.sam.noheader
#