Jean Elbers jelber2
@jelber2
jelber2 / Running-PBJelly-example-with-indel-correction-with-BBMap-no-Pilon.txt
Last active November 9, 2018 08:34
Running-PBJelly-example-with-indel-correction-with-BBMap-no-Pilon
# goes along with http://seqanswers.com/forums/showthread.php?p=220925#post220925
#
# assumes you have PBJelly, blasr, tabix, bcftools, samtools installed
# below I am using a machine with 70 cores on a single node; adjust the number of cores to match your machine
# The scripts below are obviously not designed for use with a cluster, but can be modified
#
#########################
# STEP 1 Combine the FASTQ files and remove the originals to save space
#########################
## first combine files and delete the originals to save space
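
A minimal sketch of this combine-then-delete step, not the gist's actual commands. The file names (run1.fastq, run2.fastq, combined.fastq) and the temporary directory are hypothetical and are created here only so the example is self-contained. Chaining with && means the originals are removed only if the concatenation succeeded.

```shell
# Hypothetical inputs, written to a scratch directory for illustration only.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/run1.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$workdir/run2.fastq"

# Concatenate the FASTQ files; delete the originals only if cat succeeded.
# The output name does not match the run*.fastq glob, so it is never an input.
cat "$workdir"/run*.fastq > "$workdir/combined.fastq" && rm "$workdir"/run*.fastq
```

For gzip-compressed FASTQ files the same pattern works on the .gz files directly, because concatenated gzip streams form a valid gzip file.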
jelber2 / pilon-runs-1-2.sh
Last active November 9, 2018 08:31
Runs Pilon twice on a file called genome.pilon-0.fasta
#! /bin/bash
set -e
# installing fasta-splitter.pl
## wget http://kirill-kryukov.com/study/tools/fasta-splitter/files/fasta-splitter-0.2.6.zip
## unzip fasta-splitter-0.2.6.zip
# assumes initial genome to be error-corrected by pilon is called
## genome.pilon-0.fasta

Adapter trimming with BBDuk 38.82

module load bbtools/38.82
module load bcftools/1.14
bbduk.sh threads=8 \
in1=raw/170283.mate1.fastq.gz \
in2=raw/170283.mate2.fastq.gz \
out1=fastq/170283-trimmed.mate1.fastq.gz \
out2=fastq/170283-trimmed.mate2.fastq.gz \
jelber2 / gist:451eec8c6b74617b8bf0532905f256c1
Last active October 25, 2023 09:51
Zymo_Mock_HMW_SUP_duplex_reads_with_peregrine_2021
This was with https://zymo-files.s3.amazonaws.com/BioPool/ZymoBIOMICS.STD.refseq.v2.zip
RAW_SUP_Duplex pg_asm_1x_corrected_SUP_duplex pg_asm_2x_corrected_SUP_duplex pg_asm_3x_corrected_SUP_duplex
Bacillus_subtilis Bacillus_subtilis Bacillus_subtilis Bacillus_subtilis
# target bases: 4041255 # target bases: 4041255 # target bases: 4041255 # target bases: 4041255
# target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%) # target bases overlapping regions: 4041255 (100.00%)
1159311 reference bases covered by exactly one contig 3791080 reference bases covered by exactly one contig 3642732 reference bases covered by exa
jelber2 / README.md
Last active October 18, 2023 13:41
MethPhaser

MethPhaser installation

Install micromamba or mamba or conda
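
Before installing anything, it may be worth checking whether one of these tools is already on the PATH. The helper below is a hypothetical sketch (the function name first_available is not from the MethPhaser instructions); it prints the first command from its argument list that the shell can find.

```shell
# Print the first command in the argument list that exists on PATH.
# Returns non-zero if none of them is found.
first_available() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Prefer micromamba, then mamba, then conda.
if manager=$(first_available micromamba mamba conda); then
  echo "Found package manager: $manager"
else
  echo "None of micromamba, mamba, or conda found; install one first."
fi
```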

# Install micromamba
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

You will then see something like this in a Bash shell (the parts marked "(type....)" are added as instructions):

jelber2 / README.txt
Last active February 29, 2024 10:25
Error_rates_in_Herro_and_corrected_Herro_reads_compared_to_NextSeq2000
# output from best commit #fcdfa97 (https://github.com/google/best), .summary_identity_stats.csv files using reads
# aligned to concatenated chr20_MATERNAL and chr20_PATERNAL from hg002v1.0.1.fasta.gz (https://github.com/marbl/HG002) (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz)
# using mm2-fast commit # 10bde16 using settings: --eqx --secondary=no -Y -c -ax map-ont -k 19 -w 13 -t 48
# or using these settings for Illumina NextSeq2000 reads: -t 48 --eqx --secondary=no -acx sr
#
# brutal_rewrite (br) commit # ad87f92 (https://github.com/natir/br) using settings: -k 19 -m graph
# kmer read filter (kmrf) commit # 36cad24 (https://github.com/natir/kmrf) using setting: -k 17
# peregrine-2021 (pg_asm) commit # 6698eb1 (https://github.com/cschin/peregrine-2021): using default settings
#
# herro (herro) commit # c41dc30 (https://github.com/lbcb-sci/herro) using defaults and model at time of commit
jelber2 / shredBAM.jl
Last active March 11, 2024 13:09
Shred alignments in a BAM file (output is a SAM file without header) to make long read alignments into short-read-like
# previous versions could generate output directly from a BAM file, but there seemed to be some trouble
# between versions of XAM; this version is working so far for its intended purpose on https://github.com/brendanofallon/jovian
# also, this version is rather fast
#
#
# note: ignores quality scores at the moment - fixed in revision#4
# note2: outputs to STDOUT a SAM file without a header
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia shredBAM.jl input.bam 300 > test.sam.noheader
#
jelber2 / stitch-fastq.jl
Last active March 7, 2024 09:45
Stitch FASTQ reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fastq.jl chr20.herro.fasta.Q30.recal.shred.fastq > chr20.herro.fasta.Q30.recal.fastq
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fastq_file(filename::String)
jelber2 / stitch-fasta.jl
Last active March 7, 2024 09:45
Stitch FASTA reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fasta.jl chr20.herro.fasta.Q30.recal.shred.fasta > chr20.herro.fasta.Q30.recal.fasta
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fasta_file(filename::String)
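
The stitching idea can be sketched in shell with awk. This is not the Julia implementation from the gist, only an illustration under stated assumptions: shredded records are assumed to be named originalname_start-end, to appear in order, and not to overlap. The function name stitch_fasta and the example read names are hypothetical.

```shell
# Stitch consecutive FASTA pieces that share a base name back into one record.
# Assumes headers look like ">name_start-end" and pieces are in order, no overlap.
stitch_fasta() {
  awk '
    /^>/ {
      name = substr($0, 2)
      sub(/_[0-9]+-[0-9]+$/, "", name)   # drop the assumed _start-end suffix
      if (name != current) {
        # A new read begins: flush the previous one, if any.
        if (current != "") print ">" current "\n" seq
        current = name
        seq = ""
      }
      next
    }
    { seq = seq $0 }                     # append sequence lines
    END { if (current != "") print ">" current "\n" seq }
  ' "$1"
}

# Usage with a tiny hypothetical input:
shredded=$(mktemp)
printf '>readA_0-3\nACGT\n>readA_4-7\nTTGA\n>readB_0-3\nGGCC\n' > "$shredded"
stitch_fasta "$shredded"
```

If the shredding was done with overlapping pieces, the overlap would have to be trimmed before concatenation; this sketch does not handle that case.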
jelber2 / PlotSequenceTime.jl
Last active March 26, 2024 10:04
Plot changes in read length distribution of Nanopore Dorado called bases
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
using StatsBase
using Plots.PlotMeasures
function parse_commandline()