configuration, config/config.yaml:
num_samples: 60
max_genotypes: 2
genome_path: genomes
(see workflow/schema/config.schema.json for details)
genome path, genomes/:
- splitting fasta files
mkdir -p genomes; while IFS= read -r l; do if [[ $l =~ ^\>([^[:space:]]+) ]]; then F="genomes/${BASH_REMATCH[1]}.fasta"; rm -f "${F}"; echo "$F"; fi; echo "$l" >> "${F}"; done < bacteria.fasta
- results:
> ls -l genomes/
total 6156
-rw-r--r-- 64 dryak users 2344128 Nov 2 18:04 NZ_CP031133.1.fasta
-rw-r--r-- 81 dryak users 3913384 Nov 2 18:04 NZ_CP102358.1.fasta
-rw-r--r-- 58 dryak users 698 Nov 2 18:04 NZ_JABAHH010000080.1.fasta
-rw-r--r-- 55 dryak users 33207 Nov 2 18:04 NZ_KI391983.1.fasta
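The splitting loop above can also be written with awk, which tends to be easier to read and faster on large files. This is only a sketch, not part of the workflow: the function name split_fasta is hypothetical, and it assumes every record ID (the header text between ">" and the first whitespace) is usable as a file name.

```shell
# Hypothetical helper: write each record of a multi-FASTA file to
# <outdir>/<id>.fasta, where <id> is the header text between ">" and
# the first whitespace.
split_fasta() {
    local input="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)             # finish the previous record
            out = dir "/" substr($1, 2) ".fasta"  # drop the leading ">"
            print out                             # report the file being written
        }
        out != "" { print > out }                 # header and sequence lines
    ' "$input"
}

# Usage, with the paths used above:
# split_fasta bacteria.fasta genomes
```

As in the loop, a record whose ID repeats later in the input overwrites the earlier file.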
run snakemake (assuming we're in a working/ subdirectory next to the git clone):
snakemake --snakefile ../workflow/Snakefile --configfile config/config.yaml --use-conda --cores 1 all_mess
(with --use-conda, snakemake installs the dependencies automatically)
per-sample configuration file, mess_config.yml:
input_table_path: mess_sample1.tsv
sd_read_num: 0
sd_rep: 0
replicates: 1
community_name: sample1
seq_tech: illumina
read_status: paired
illumina_sequencing_system: HS20
illumina_read_len: 100
illumina_mean_frag_len: 200
illumina_sd_frag_len: 20
set_seed: 20
NCBI_key: your_ncbi_key
NCBI_email: your_ncbi_email
complete_assemblies: True
reference_assemblies: False
representative_assemblies: False
exclude_from_metagenomes: True
Genbank_assemblies: True
Refseq_assemblies: True
Rank_to_filter_by: False
seed: 1
bam: False
install the dependencies and run snakemake (assuming there's a git clone in MeSS; alternatively, one could use the mess run wrapper):
# install as per README
mamba create -n mess mess
mamba activate mess
# missing dependencies, as per MeSS's messenv.yml
mamba install art
mamba install seqkit
mamba install biopython
snakemake --snakefile ../../MeSS/mess/scripts/Snakefile --configfile mess_config.yml --use-conda --resources ncbi_requests=3 nb_simulation=2 parallel_cat=2 --cores 4 all_sim
Results will be in 'simreads/sample1-1_R1.fq.gz' etc.
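A quick sanity check on the output is to count the reads in each file. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name count_fastq_reads is hypothetical.

```shell
# Hypothetical helper: print the number of reads in a gzipped FASTQ file,
# assuming each record occupies exactly four lines.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Usage: print one count per output file.
# for f in simreads/*.fq.gz; do echo "$f: $(count_fastq_reads "$f")"; done
```

For paired reads, the R1 and R2 files of a sample should report the same count.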