Jean Elbers jelber2

MethPhaser installation

Install micromamba, mamba, or conda
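One of these managers may already be installed. A quick check like the one below avoids a second installation. This is a minimal sketch: find_conda_manager is a hypothetical helper name, and the preference order (micromamba, then mamba, then conda) is an assumption, not something the MethPhaser instructions specify.

```shell
# Hypothetical helper: print the first conda-compatible package manager found on PATH.
# Preference order (micromamba > mamba > conda) is an assumption.
find_conda_manager() {
  for m in micromamba mamba conda; do
    if command -v "$m" >/dev/null 2>&1; then
      printf '%s\n' "$m"
      return 0
    fi
  done
  return 1
}

if mgr=$(find_conda_manager); then
  echo "Using ${mgr}"
else
  echo "No micromamba/mamba/conda found on PATH; install micromamba first" >&2
fi
```

If nothing is found, the micromamba installer command that follows is the next step.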

# Install micromamba
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

You will then see something like this in a Bash shell (parts marked "(type....)" were added as instructions).

Adapter trimming with BBDuk 38.82

module load bbtools/38.82
module load bcftools/1.14
bbduk.sh threads=8 \
in1=raw/170283.mate1.fastq.gz \
in2=raw/170283.mate2.fastq.gz \
out1=fastq/170283-trimmed.mate1.fastq.gz \
out2=fastq/170283-trimmed.mate2.fastq.gz \

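The command shown stops after the file arguments; no adapter reference or trimming options are visible. The sketch below keeps the same raw/ and fastq/ layout and adds the adapter-trimming options from the BBDuk user guide (ktrim=r k=23 mink=11 hdist=1 tpe tbo). Assumptions: ref=adapters resolves to the adapter file bundled with BBTools (override it with ADAPTERS if your install differs), and trim_sample, DRY_RUN, and ADAPTERS are hypothetical names introduced here.

```shell
# Hypothetical helper: adapter-trim one paired-end sample named like raw/<id>.mate1.fastq.gz.
# Assumptions: ref=adapters uses the adapters bundled with BBTools (override with ADAPTERS);
# k/mink/hdist/tpe/tbo follow the BBDuk guide's adapter-trimming example.
# Set DRY_RUN=1 to print the command instead of running it.
trim_sample() {
  local id=$1
  set -- bbduk.sh threads=8 \
    in1="raw/${id}.mate1.fastq.gz" in2="raw/${id}.mate2.fastq.gz" \
    out1="fastq/${id}-trimmed.mate1.fastq.gz" out2="fastq/${id}-trimmed.mate2.fastq.gz" \
    ref="${ADAPTERS:-adapters}" ktrim=r k=23 mink=11 hdist=1 tpe tbo
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    mkdir -p fastq
    "$@"
  fi
}

# Trim every sample that has a mate1 file in raw/ (does nothing if none exist)
for f in raw/*.mate1.fastq.gz; do
  [ -e "$f" ] || continue
  trim_sample "$(basename "$f" .mate1.fastq.gz)"
done
```

Running DRY_RUN=1 trim_sample 170283 first is a cheap way to confirm the file names match the command above before launching BBDuk.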
	# output from best commit #fcdfa97 (https://github.com/google/best), .summary_identity_stats.csv files using reads
	# aligned to concatenated chr20_MATERNAL and chr20_PATERNAL from hg002v1.0.1.fasta.gz (https://github.com/marbl/HG002) (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz)
	# using mm2-fast commit # 10bde16 using settings: --eqx --secondary=no -Y -c -ax map-ont -k 19 -w 13 -t 48
	# or using these settings for Illumina NextSeq2000 reads: -t 48 --eqx --secondary=no -acx sr
	#
	# brutal_rewrite (br) commit # ad87f92 (https://github.com/natir/br) using settings: -k 19 -m graph
	# kmer read filter (kmrf) commit # 36cad24 (https://github.com/natir/kmrf) using setting: -k 17
	# peregrine-2021 (pg_asm) commit # 6698eb1 (https://github.com/cschin/peregrine-2021): using default settings
	#
	# herro (herro) commit # c41dc30 (https://github.com/lbcb-sci/herro) using defaults and model at time of commit
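The comments above describe aligning reads to a reference made of chr20_MATERNAL and chr20_PATERNAL from hg002v1.0.1.fasta.gz. A minimal sketch of building that two-haplotype reference with awk, assuming the file is already downloaded to the working directory and its record IDs are exactly the two names in the comments. extract_records, chr20_MAT_PAT.fasta, and reads.fastq.gz are hypothetical names; the alignment line reuses the ONT settings quoted above and assumes plain minimap2 accepts the same options as the mm2-fast build that was actually used.

```shell
# Hypothetical helper: copy only the FASTA records whose IDs (text after '>' up to the
# first whitespace) are listed as arguments. FASTA in on stdin, FASTA out on stdout.
extract_records() {
  awk -v names="$*" '
    BEGIN { n = split(names, ids, " "); for (i = 1; i <= n; i++) want[ids[i]] = 1 }
    /^>/  { name = substr($1, 2); keep = (name in want) }
    keep  { print }
  '
}

# Build the two-haplotype chr20 reference (gzip -dc reads both gzip and bgzip files).
if [ -e hg002v1.0.1.fasta.gz ]; then
  gzip -dc hg002v1.0.1.fasta.gz | extract_records chr20_MATERNAL chr20_PATERNAL > chr20_MAT_PAT.fasta
fi

# Then align with the ONT settings quoted above (reads.fastq.gz is a placeholder):
# minimap2 --eqx --secondary=no -Y -c -ax map-ont -k 19 -w 13 -t 48 \
#   chr20_MAT_PAT.fasta reads.fastq.gz > reads.chr20.sam
```

Matching on whole IDs through an awk array, rather than a regex, keeps similarly named contigs (for example a hypothetical chr20_MATERNAL_alt) from slipping in.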

	These results were obtained against the ZymoBIOMICS reference sequences from https://zymo-files.s3.amazonaws.com/BioPool/ZymoBIOMICS.STD.refseq.v2.zip

	Dataset                          Genome             # target bases   # target bases overlapping regions   Reference bases covered by exactly one contig
	RAW_SUP_Duplex                   Bacillus_subtilis  4041255          4041255 (100.00%)                    1159311
	pg_asm_1x_corrected_SUP_duplex   Bacillus_subtilis  4041255          4041255 (100.00%)                    3791080
	pg_asm_2x_corrected_SUP_duplex   Bacillus_subtilis  4041255          4041255 (100.00%)                    3642732
	pg_asm_3x_corrected_SUP_duplex   Bacillus_subtilis  4041255          4041255 (100.00%)                    …
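As a worked example of reading the last metric, the share of the 4,041,255 Bacillus_subtilis target bases covered by exactly one contig can be computed per dataset. This is a sketch that assumes the metric is read against the target-base total; pct_single_copy is a hypothetical helper, and the pg_asm_3x dataset is left out because its value is cut off in the source.

```shell
# Hypothetical helper: percentage of target bases covered by exactly one contig.
# $1 = bases covered by exactly one contig, $2 = total target bases
pct_single_copy() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * c / t }'
}

# Values for RAW_SUP_Duplex, pg_asm_1x and pg_asm_2x (pg_asm_3x is truncated)
for covered in 1159311 3791080 3642732; do
  pct_single_copy "$covered" 4041255
done
# prints 28.69, 93.81 and 90.14 on separate lines
```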

	#! /bin/bash

	set -e

	# installing fasta-splitter.pl
	## wget http://kirill-kryukov.com/study/tools/fasta-splitter/files/fasta-splitter-0.2.6.zip
	## unzip fasta-splitter-0.2.6.zip

	# assumes the initial genome to be error-corrected by Pilon is named
	# genome.pilon-0.fasta

	# goes along with http://seqanswers.com/forums/showthread.php?p=220925#post220925
	#
	# assumes you have PBJelly, blasr, tabix, bcftools, samtools installed
	# below I am using a machine with 70 cores on a single node; adjust the number of cores to match your machine
	# The scripts below are not designed for use on a cluster, but can be modified for one
	#
	#########################
	# STEP 1 Combine the FASTQ files and remove the originals to save space
	#########################
	## first combine files and delete the originals to save space