# Revisions 1 & 2 use the same read names, read1 and read2; this
# revision (3) appends _1 and _2 to read1 and read2, respectively.
#
# Basic idea is to take a long-read alignment file, shred the long reads into shreds while keeping their alignment information, and
# output Hi-C-like read pairs, interleaved in a BAM file!
#
# So for example,
# ----------------------- 1long read
# 1r1 2r1 3r1 3r2 2r2 1r2
# results in 1read1, 1read2
# revision 2, now supports multi-core processing
# Basic idea is to take a long-read alignment file that has been shredded with shredBAM.jl, then
# output Hi-C-like read pairs
#
# So for example,
# ----------------------- 1long read
# 1r1 2r1 3r1 3r2 2r2 1r2
# results in 1read1, 1read2
# 2read1, 2read2
# 3read1, 3read2
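
The pairing in the diagram above can be sketched in a few lines of plain Julia. This is only an illustration of the outside-in pairing that the example suggests (shred k mated with shred n + 1 - k); `pair_shreds` is a hypothetical helper, not a function from the gist, and it works on shred labels rather than on real BAM records.

```julia
# Hypothetical helper: pair the shreds of one long read from the outside in,
# so that shred k is mated with shred (n + 1 - k).
function pair_shreds(shreds::Vector{String})
    n = length(shreds)
    pairs = Tuple{String,String}[]
    for k in 1:div(n, 2)
        push!(pairs, (shreds[k], shreds[n + 1 - k]))
    end
    return pairs  # with an odd n, the middle shred is left unpaired
end

# Labels taken from the diagram in the header comment.
shreds = ["1r1", "2r1", "3r1", "3r2", "2r2", "1r2"]
for (k, (r1, r2)) in enumerate(pair_shreds(shreds))
    println(k, "read1 = ", r1, ", ", k, "read2 = ", r2)
end
```

Whether the real script drops or keeps an unpaired middle shred is not stated in the header, so the sketch simply leaves it out.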
# Get HG002
# wget https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz
# gunzip hg002v1.0.1.fasta.gz
#
# Use seqtk to get maternal and paternal sequences
# https://github.com/lh3/seqtk
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "MATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.maternal.fasta &
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "PATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.paternal.fasta &
#
# Use samtools to index haplotypes
# New in revision 2: multithreaded support, up to twice as fast as the single-threaded revision 1
# invoke with:
# julia --threads 16 Plot_Animated_Nanopore_Quality_x_Length_2D_Contours.jl --input_file test.bam --fps 0.7 --output_file test.gif
# but this needs a lot of threads (up to 16; above 16 threads, the cores vs. time plot saturates)
using XAM
using ArgParse
using DataFrames
# note: the Dorado BAM file produced by basecalling POD5 files needs to be indexed with "samtools index input_file"
# note2: something seems really strange about the quality scores; I have checked several times, but they seem correct.
# I am not sure why it reports, for example, reads with Q38 average quality scores.
# EDIT: this is fixed, see https://www.biostars.org/p/295932/#295936 for a better formula for the average Phred score
# tested with julia-1.10.3 and XAM.jl-0.4.0
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
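
The EDIT note above points to a better formula for the average Phred score. A minimal sketch of one common approach follows, assuming the linked formula is the usual one of averaging error probabilities instead of averaging Phred values directly; `mean_phred` is a hypothetical name, and the script's own implementation may differ.

```julia
# Mean read quality computed via error probabilities rather than by
# averaging the Phred values directly (assumed formula, see lead-in).
function mean_phred(quals::AbstractVector{<:Real})
    isempty(quals) && throw(ArgumentError("no quality values"))
    # Phred Q -> error probability p = 10^(-Q/10), then average the p values
    mean_p = sum(q -> 10.0^(-q / 10), quals) / length(quals)
    # mean error probability -> Phred
    return -10 * log10(mean_p)
end

quals = [40, 40, 40, 10]
println("arithmetic mean of Phred values: ", sum(quals) / length(quals))
println("probability-based mean:          ", round(mean_phred(quals), digits = 2))
```

A single low-quality base pulls the probability-based mean well below the arithmetic mean, which is why naive averaging can report implausibly high read qualities.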
# note: outputs to STDOUT a SAM file without a header
# note2: removes auxiliary tags and information
# note3: ignores RNEXT and PNEXT from the BAM file (puts * and 0, respectively)
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia changeBAMquality.jl input.bam ? > test.sam.noheader
#
# to add a header from the original BAM file that had its qualities changed
# and add on a PG line
# $ ASCII_CHARACTER="?"
# $ INPUT=input.bam
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
using StatsBase
using Plots.PlotMeasures
function parse_commandline()
# tested on Julialang v1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fasta.jl chr20.herro.fasta.Q30.recal.shred.fasta > chr20.herro.fasta.Q30.recal.fasta
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fasta_file(filename::String)
# tested on Julialang v1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fastq.jl chr20.herro.fasta.Q30.recal.shred.fastq > chr20.herro.fasta.Q30.recal.fastq
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fastq_file(filename::String)
# previous versions allowed generating directly from BAM, but there seem to have been some troubles
# between versions of XAM, so this version is working so far for its intended purpose on https://github.com/brendanofallon/jovian
# also, this version is rather fast
#
#
# note: ignores quality scores at the moment (fixed in revision #4)
# note2: outputs to STDOUT a SAM file without a header
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia shredBAM.jl input.bam 300 > test.sam.noheader
#
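
The core of shredding, cutting a read into consecutive fixed-length pieces such as the 300 passed on the command line above, can be sketched in plain Julia. This is an illustration only: `shred_sequence` is a hypothetical helper that works on a bare ASCII sequence string, and it does not reproduce how shredBAM.jl carries over the alignment information for each shred.

```julia
# Hypothetical helper: cut an ASCII sequence into consecutive shreds of at
# most `shred_len` bases, returning (1-based offset, shred) tuples.
function shred_sequence(seq::AbstractString, shred_len::Integer)
    shred_len > 0 || throw(ArgumentError("shred_len must be positive"))
    shreds = Tuple{Int,String}[]
    for start in 1:shred_len:length(seq)
        stop = min(start + shred_len - 1, length(seq))
        push!(shreds, (start, String(seq[start:stop])))
    end
    return shreds  # the last shred may be shorter than shred_len
end

for (offset, shred) in shred_sequence("ACGTACGTAC", 4)
    println(offset, "\t", shred)
end
```

Keeping the offset alongside each shred is one simple way to relate a shred back to its position in the original read.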