module load bbtools/38.82
module load bcftools/1.14
bbduk.sh threads=8 \
in1=raw/170283.mate1.fastq.gz \
in2=raw/170283.mate2.fastq.gz \
out1=fastq/170283-trimmed.mate1.fastq.gz \
out2=fastq/170283-trimmed.mate2.fastq.gz \
# goes along with http://seqanswers.com/forums/showthread.php?p=220925#post220925
#
# assumes you have PBJelly, blasr, tabix, bcftools, samtools installed
# below I am using a machine with 70 cores on a single node; adjust the number of cores to match your machine
# The scripts below are obviously not designed for use on a cluster, but can be modified
#
#########################
# STEP 1 Combine the FASTQ files and remove the originals to save space
#########################
## first combine files and delete the originals to save space
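
The preview above is cut off before the commands for this step. As a minimal sketch of the idea, assuming gzip-compressed FASTQ inputs: concatenated gzip files form a valid multi-member gzip stream, so `cat` is enough to combine them. The function name `combine_fastq` and the `.partial` suffix are hypothetical and not taken from the original script. The originals are removed only after the combined file passes an integrity check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# combine_fastq OUT IN1 [IN2 ...]
# Concatenate gzip-compressed FASTQ files into OUT, verify the result,
# and only then delete the inputs. (Hypothetical helper, for illustration.)
combine_fastq() {
    local out=$1
    shift
    local tmp="$out.partial"

    # Concatenated gzip members are still a valid gzip stream.
    cat -- "$@" > "$tmp"

    # Verify the combined stream before touching the originals;
    # with `set -e`, a failed check aborts here and nothing is deleted.
    gzip -t < "$tmp"

    mv -- "$tmp" "$out"
    rm -- "$@"
}
```

Called as `combine_fastq all.fastq.gz part1.fastq.gz part2.fastq.gz`, this leaves only `all.fastq.gz` behind. OUT must not be one of the inputs.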
#! /bin/bash
set -e
# installing fasta-splitter.pl
## wget http://kirill-kryukov.com/study/tools/fasta-splitter/files/fasta-splitter-0.2.6.zip
## unzip fasta-splitter-0.2.6.zip
# assumes initial genome to be error-corrected by pilon is called
## genome.pilon-0.fasta
This was with https://zymo-files.s3.amazonaws.com/BioPool/ZymoBIOMICS.STD.refseq.v2.zip
RAW_SUP_Duplex pg_asm_1x_corrected_SUP_duplex pg_asm_2x_corrected_SUP_duplex pg_asm_3x_corrected_SUP_duplex
Bacillus_subtilis Bacillus_subtilis Bacillus_subtilis Bacillus_subtilis
# target bases: 4041255 # target bases: 4041255 # target bases: 4041255 # target bases: 4041255
# target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%)
1159311 reference bases covered by exactly one contig 3791080 reference bases covered by exactly one contig 3642732 reference bases covered by exa
# output from best commit #fcdfa97 (https://github.com/google/best), .summary_identity_stats.csv files using reads
# aligned to concatenated chr20_MATERNAL and chr20_PATERNAL from hg002v1.0.1.fasta.gz (https://github.com/marbl/HG002) (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz)
# using mm2-fast commit # 10bde16 using settings: --eqx --secondary=no -Y -c -ax map-ont -k 19 -w 13 -t 48
# or using these settings for Illumina NextSeq2000 reads: -t 48 --eqx --secondary=no -acx sr
#
# brutal_rewrite (br) commit # ad87f92 (https://github.com/natir/br) using settings: -k 19 -m graph
# kmer read filter (kmrf) commit # 36cad24 (https://github.com/natir/kmrf) using setting: -k 17
# peregrine-2021 (pg_asm) commit # 6698eb1 (https://github.com/cschin/peregrine-2021): using default settings
#
# herro (herro) commit # c41dc30 (https://github.com/lbcb-sci/herro) using defaults and model at time of commit
# previous versions allowed generating directly from BAM, but there seemed to be some trouble
# between versions of XAM; this version has worked so far for its intended purpose on https://github.com/brendanofallon/jovian
# also, this version is rather fast
#
#
# note: ignores quality scores at the moment - fixed in revision #4
# note2: outputs to STDOUT a SAM file without a header
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia shredBAM.jl input.bam 300 > test.sam.noheader
#
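
The body of `shredBAM.jl` is not shown in this preview, so the following is only a sketch of the general shredding idea suggested by the usage line, where `300` is presumably the piece length. It works on plain tab-separated `name<TAB>sequence` lines rather than BAM records, and the `_<index>` suffix added to each piece name is a hypothetical naming scheme, not necessarily the one the Julia script uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# shred_tsv LEN
# Read "name<TAB>sequence" lines on stdin and print one
# "name_<index><TAB>piece" line per piece of at most LEN bases.
# (Hypothetical helper; the naming scheme is an assumption.)
shred_tsv() {
    local len=$1
    awk -F'\t' -v OFS='\t' -v len="$len" '{
        n = 0
        # Walk the sequence in steps of len; the last piece may be shorter.
        for (i = 1; i <= length($2); i += len) {
            n++
            print $1 "_" n, substr($2, i, len)
        }
    }'
}
```

For example, `printf 'r1\tACGTACGTAC\n' | shred_tsv 4` yields three pieces named `r1_1`, `r1_2`, and `r1_3`, the last one two bases long.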
# tested on Julialang v1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fastq.jl chr20.herro.fasta.Q30.recal.shred.fastq > chr20.herro.fasta.Q30.recal.fastq
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fastq_file(filename::String)
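
The function body is cut off in this preview, so the stitching logic itself is not visible. As a rough sketch of what stitching shredded records back together can look like, assuming each piece is a `name_<index><TAB>sequence` line and pieces of the same read arrive in order (both assumptions, not taken from `stitch-fastq.jl`), the pieces can be grouped by their base name and concatenated. It ignores quality strings for brevity.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stitch_tsv
# Read "name_<index><TAB>piece" lines on stdin, strip the numeric suffix,
# and print one "name<TAB>sequence" line per original read, in order of
# first appearance. (Hypothetical helper; the suffix format is an assumption.)
stitch_tsv() {
    awk -F'\t' -v OFS='\t' '{
        name = $1
        sub(/_[0-9]+$/, "", name)          # drop the piece index
        if (!(name in seq)) order[++count] = name
        seq[name] = seq[name] $2           # append the piece in input order
    }
    END {
        for (i = 1; i <= count; i++) print order[i], seq[order[i]]
    }'
}
```

Remembering the order of first appearance keeps the output in the same read order as the input, which a plain associative array would not guarantee.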
# tested on Julialang v1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fasta.jl chr20.herro.fasta.Q30.recal.shred.fasta > chr20.herro.fasta.Q30.recal.fasta
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fasta_file(filename::String)
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
using StatsBase
using Plots.PlotMeasures
function parse_commandline()