walterst’s gists

walterst / filter_fastqV2.py

Last active August 8, 2017 09:26

Filter a fastq file to match target fastq labels, e.g. after stitching reads.

	#!/usr/bin/env python

	# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
	# index fastq to the PEAR-assembled subset fastq
	# Usage: python filter_fastq.py input_fastq target_fastq output_fastq

	from sys import argv

	from cogent.parse.fastq import MinimalFastqParser
	from qiime.util import gzip_open
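
The V2 preview imports `gzip_open` from `qiime.util`, which suggests it accepts gzip-compressed input. Below is a minimal Python 3 sketch of reading plain or gzipped fastq without the Python 2 PyCogent/QIIME 1 stack. It is not the gist's code. `open_maybe_gzip` and `parse_fastq` are hypothetical names. Compression is detected only by a `.gz` suffix, and records are assumed to be strict four-line fastq.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def open_maybe_gzip(path: str) -> TextIO:
    """Open a plain or gzip-compressed fastq file in text mode.

    Assumption: compressed files are recognised by a ".gz" suffix only.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def _next_line(lines: Iterator[str]) -> str:
    """Return the next line without its newline, failing on a truncated record."""
    line = next(lines, None)
    if line is None:
        raise ValueError("truncated fastq record")
    return line.rstrip("\r\n")


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, sequence, quality) from strict four-line fastq records.

    The leading '@' is removed from the label; wrapped (multi-line) fastq
    is not supported.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got %r" % header)
        seq = _next_line(it)
        plus = _next_line(it)
        qual = _next_line(it)
        if not plus.startswith("+"):
            raise ValueError("expected '+' separator for %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch for %r" % header)
        yield header[1:], seq, qual
```

For example, `parse_fastq(open_maybe_gzip("index.fastq.gz"))` (file name hypothetical) yields one tuple per read. Because the parser accepts any iterable of lines, it can be tested without touching the filesystem.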

walterst / filter_fastq.py

Last active November 2, 2017 11:16

Filters an input fastq to match labels in target fastq file.

	#!/usr/bin/env python

	# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
	# index fastq to the PEAR-assembled subset fastq
	# Usage: python filter_fastq.py input_fastq target_fastq output_fastq

	from sys import argv

	from cogent.parse.fastq import MinimalFastqParser
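
The filtering step itself can be sketched in Python 3 as follows. This illustrates the idea under stated assumptions and is not the gist's implementation. A query record is kept when the first whitespace-delimited token of its label (the read ID) appears among the target labels. Index reads and stitched reads often share the read ID but differ in the trailing comment (e.g. `1:N:0` vs `2:N:0`). `read_fastq`, `read_id`, and `filter_fastq` are hypothetical names.

```python
import sys
from typing import Iterable, Iterator, List, Set, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Minimal four-line fastq reader yielding (label, sequence, quality)."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not record and not line:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, _plus, qual = record
            yield header[1:], seq, qual  # drop the leading '@'
            record = []
    if record:
        raise ValueError("truncated fastq record")


def read_id(label: str) -> str:
    """Assumption: reads correspond when the first whitespace-delimited token matches."""
    parts = label.split()
    return parts[0] if parts else ""


def filter_fastq(query: Iterable[str], target: Iterable[str]) -> Iterator[str]:
    """Yield query records (as fastq text) whose read ID occurs in the target."""
    wanted: Set[str] = {read_id(label) for label, _, _ in read_fastq(target)}
    for label, seq, qual in read_fastq(query):
        if read_id(label) in wanted:
            yield "@%s\n%s\n+\n%s\n" % (label, seq, qual)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Same argument order as the usage line: input_fastq target_fastq output_fastq
    in_path, target_path, out_path = sys.argv[1:4]
    with open(in_path) as query, open(target_path) as target, open(out_path, "w") as out:
        out.writelines(filter_fastq(query, target))
```

The target IDs are held in a set, so memory grows with the size of the target file. The query is streamed record by record.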

walterst / add_taxa_to_fasta.py

Created January 27, 2017 19:23

Used to append taxonomy strings from a tab-delimited taxa file to a fasta file.

	#!/usr/bin/env python

	""" Usage:
	python add_taxa_to_fasta.py input_taxa_file input_fasta_file output_fasta
	"""

	from sys import argv

	from cogent.parse.fasta import MinimalFastaParser
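
Here is a minimal Python 3 sketch of the same idea. It assumes the taxa file has the sequence ID in the first tab-delimited column and the taxonomy string in the second, as in a QIIME-style taxonomy mapping. It also assumes the taxonomy is appended to the fasta label after a space. The gist's exact output format is not shown in the preview, so `load_taxa`, `parse_fasta`, and `add_taxa` are hypothetical. Labels without a taxonomy entry are left unchanged.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def load_taxa(lines: Iterable[str]) -> Dict[str, str]:
    """Map sequence ID -> taxonomy string from a tab-delimited file."""
    taxa: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) >= 2:
            taxa[fields[0]] = fields[1]
    return taxa


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs; wrapped sequence lines are joined."""
    label: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def add_taxa(fasta_lines: Iterable[str], taxa_lines: Iterable[str]) -> Iterator[str]:
    """Yield fasta records with the matching taxonomy appended to each label."""
    taxa = load_taxa(taxa_lines)
    for label, seq in parse_fasta(fasta_lines):
        parts = label.split()
        seq_id = parts[0] if parts else ""
        if seq_id in taxa:
            label = "%s %s" % (label, taxa[seq_id])
        yield ">%s\n%s\n" % (label, seq)
```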

walterst / collapse_rare_taxa.py

Last active December 8, 2016 12:24

Usage: python collapse_rare_taxa.py -i otu_table

	#!/usr/bin/env python

	__author__ = "William Walters"
	__copyright__ = "NA"
	__credits__ = ["William Walters"]
	__license__ = "GPL"
	__version__ = "1.0"
	__maintainer__ = "William Walters"
	__email__ = "[email protected]"
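
The preview stops at the module metadata, so it does not say what counts as "rare". Below is a minimal sketch under one possible interpretation, which is an assumption: a taxon is rare if its relative abundance stays below a threshold in every sample, and rare taxa are summed into a single `Other` row. The table is a plain dict of per-sample counts rather than a BIOM file. `collapse_rare_taxa`, `min_rel_abund`, and `other_label` are hypothetical names.

```python
from typing import Dict, List


def collapse_rare_taxa(
    table: Dict[str, List[int]],
    min_rel_abund: float = 0.01,
    other_label: str = "Other",
) -> Dict[str, List[int]]:
    """Sum taxa that never reach min_rel_abund in any sample into one row.

    table maps taxon name -> per-sample counts (all rows the same length).
    Assumption: 'rare' means the taxon's maximum per-sample relative
    abundance is below min_rel_abund. An existing row named other_label is
    overwritten if any taxon is collapsed.
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    if any(len(counts) != n_samples for counts in table.values()):
        raise ValueError("all taxa need one count per sample")
    totals = [sum(counts[i] for counts in table.values()) for i in range(n_samples)]
    collapsed: Dict[str, List[int]] = {}
    other = [0] * n_samples
    for taxon, counts in table.items():
        # Relative abundance per sample; empty samples (total 0) are ignored.
        rel = [c / t for c, t in zip(counts, totals) if t > 0]
        if rel and max(rel) >= min_rel_abund:
            collapsed[taxon] = list(counts)
        else:
            other = [o + c for o, c in zip(other, counts)]
    if any(other):
        collapsed[other_label] = other
    return collapsed
```

With the threshold tested against each sample's maximum, a taxon that is abundant in even one sample is kept. Using the mean across samples instead would be an equally defensible choice.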

walterst / filter_otu_mapping_from_otu_table.py

Last active March 2, 2017 06:01

(Written with QIIME 1.9.1 dependencies in place.) Finds the OTU IDs in a supplied OTU table and removes every entry in the supplied OTU mapping file whose ID does not match one of them, writing the result as a filtered OTU mapping file. The purpose is to backtrack to the unclustered read data while leaving out all reads that were filtered along the way.

	#!/usr/bin/env python

	__author__ = "William Walters"
	__copyright__ = "Copyright 2011"
	__credits__ = ["William Walters"]
	__license__ = "GPL"
	__version__ = "1.0"
	__maintainer__ = "William Walters"
	__email__ = "[email protected]"
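
Below is a minimal Python 3 sketch of the filtering step. It assumes the QIIME OTU mapping format: one OTU per line, with the OTU ID in the first tab-delimited field followed by the IDs of the sequences assigned to it. Extracting the OTU IDs from the OTU table (for example a BIOM file) is left out. `filter_otu_map` and `retained_read_ids` are hypothetical helpers. The second one collects the read IDs needed to backtrack to the unclustered data.

```python
from typing import Iterable, Iterator, Set


def filter_otu_map(otu_map_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield OTU map lines whose first tab-delimited field is a retained OTU ID."""
    for line in otu_map_lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] in keep_ids:
            yield "\t".join(fields) + "\n"


def retained_read_ids(otu_map_lines: Iterable[str]) -> Set[str]:
    """Collect the sequence IDs listed after the OTU ID on each line."""
    read_ids: Set[str] = set()
    for line in otu_map_lines:
        fields = line.rstrip("\r\n").split("\t")
        read_ids.update(f for f in fields[1:] if f)
    return read_ids
```

Chaining the two, `retained_read_ids(filter_otu_map(lines, ids))` gives the set of read IDs to pull from the unclustered reads.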

walterst / strip_primers_exclude.py

Created June 22, 2016 05:38

Searches a target fasta for the forward/reverse primers listed in a supplied QIIME-formatted mapping file, truncates each read inside the primer hit sites, and does not write a read if its primers are not found.

	#!/usr/bin/env python

	# USAGE: python strip_primers_exclude.py Mapping_file input_fasta output_fasta log_filename

	from sys import argv
	from string import upper
	from re import compile

	from cogent.parse.fasta import MinimalFastaParser
	from skbio.sequence import DNA
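
Below is a minimal Python 3 sketch of the primer search and truncation. It is not the gist's code. It assumes:

- the QIIME 1 mapping-file columns `LinkerPrimerSequence` and `ReversePrimer`, comma-separated when a sample has several primers;
- exact matches only, with IUPAC degenerate bases expanded into regex character classes;
- the reverse primer is searched as its reverse complement downstream of the forward hit;
- "truncates read inside of primer hit sites" means keeping only the region between the two primers.

All function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Pattern, Set, Tuple

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer: str) -> Pattern[str]:
    """Compile a (possibly degenerate) primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_from_mapping(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect forward/reverse primers, assuming QIIME 1 column names."""
    header: Optional[List[str]] = None
    fwd: Set[str] = set()
    rev: Set[str] = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#SampleID"):
            header = line[1:].split("\t")
        elif line and not line.startswith("#") and header is not None:
            row = dict(zip(header, line.split("\t")))
            fwd.update(p for p in row.get("LinkerPrimerSequence", "").split(",") if p)
            rev.update(p for p in row.get("ReversePrimer", "").split(",") if p)
    return sorted(fwd), sorted(rev)


def trim_between_primers(
    seq: str, fwd: List[Pattern[str]], rev_rc: List[Pattern[str]]
) -> Optional[str]:
    """Return the region between the first forward primer that matches and a
    downstream reverse-complemented reverse primer; None if either is missing."""
    seq = seq.upper()
    for rx in fwd:
        hit = rx.search(seq)
        if hit:
            start = hit.end()
            break
    else:
        return None
    for rx in rev_rc:
        hit = rx.search(seq, start)  # only look downstream of the forward hit
        if hit:
            return seq[start:hit.start()]
    return None
```

Build the matchers once per run: `[primer_regex(p) for p in fwd]` and `[primer_regex(reverse_complement(p)) for p in rev]`. Then call `trim_between_primers` per read, skipping the read when it returns `None`.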

walterst / workflow_genus_distances.txt

Last active June 10, 2016 18:15

Description of process and scripts used to count nucleotide differences within target genera

	We want to ask how different sequences are within certain genera. In this case, I was looking at the Prevotella,
	Bacteroides, and Porphyromonas genera within Bacteroidetes, and the distance between two sequences is the count of nucleotide
	differences divided by the length of the sequence considered.
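
The distance described above (nucleotide differences divided by the length considered) is the uncorrected p-distance. Below is a minimal Python 3 sketch. It assumes the sequences are already aligned to the same length and that "length considered" means the positions where neither sequence has a gap (`-` or `.`). The preview does not say how the workflow actually handled gaps. `p_distance` and `pairwise_distances` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, Tuple

GAP_CHARS = set("-.")


def p_distance(a: str, b: str) -> float:
    """Differences / compared positions for two aligned sequences.

    Assumption: columns with a gap in either sequence are not compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return diffs / compared


def pairwise_distances(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of sequence IDs."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(sorted(seqs), 2)}
```

For example, aligned `ACGT-A` vs `ACCT-T` has 5 compared columns and 2 differences, giving 0.4.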

	To do this, I used the 99% OTUs (16S only) from the SILVA 123 release, available here:
	http://www.arb-silva.de/no_cache/download/archive/qiime/

	We want to minimize the number of included sequences that are labeled as the target taxa but may be labeled erroneously: those that fall
	on other parts of the Bacteroidetes tree with other taxa rather than grouping with the target genus. My goal is to find a node within a
	Bacteroidetes tree whose descendants are all or mostly the target genus while retaining as many tips as possible that contain the

walterst / remove_short_reads.py

Created June 10, 2016 17:02

Specify an input fasta file and a minimum length, e.g. python remove_short_reads.py seqs.fna 1300 > trimmed_reads.fna

	#!/usr/bin/env python


	from sys import argv

	from cogent.parse.fasta import MinimalFastaParser

	min_len = int(argv[2])

	for label,seq in MinimalFastaParser(open(argv[1], "U")):
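
The preview ends at the loop header. Below is a self-contained Python 3 sketch of the same filter. It is not the gist's code, and it assumes reads of exactly the minimum length are kept. It uses plain text mode because Python 3 already translates newlines on reading, and the `"U"` mode flag used above was removed in Python 3.11. `parse_fasta` and `long_reads` are hypothetical names.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs; wrapped sequence lines are joined."""
    label: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)  # sequence text before the first header is ignored
    if label is not None:
        yield label, "".join(chunks)


def long_reads(lines: Iterable[str], min_len: int) -> Iterator[str]:
    """Yield fasta records whose sequence is at least min_len bases long."""
    for label, seq in parse_fasta(lines):
        if len(seq) >= min_len:
            yield ">%s\n%s\n" % (label, seq)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python remove_short_reads.py seqs.fna 1300 > trimmed_reads.fna
    fasta_path, min_len = sys.argv[1], int(sys.argv[2])
    with open(fasta_path) as handle:
        sys.stdout.writelines(long_reads(handle, min_len))
```

Output goes to standard output, matching the redirect in the usage example. Wrapped input sequences are written back on a single line.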

walterst / strip_primers_forward_only.py

Last active May 14, 2016 18:19

USAGE: python strip_primers.py Mapping_file input_fasta output_fasta log_filename (modified to search only for forward primers and to remove reads where the primer isn't found).

	#!/usr/bin/env python

	# USAGE: python strip_primers.py Mapping_file input_fasta output_fasta log_filename

	from sys import argv
	from string import upper
	from re import compile

	from cogent.parse.fasta import MinimalFastaParser
	from skbio.sequence import DNA
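
For the forward-only variant, here is a minimal Python 3 sketch. It uses a sliding-window comparison against IUPAC base sets instead of regular expressions, a swapped-in technique that may differ from what the gist does. Each read is trimmed after the first forward primer that matches exactly. Reads with no match are dropped, and the written/discarded counts are returned for logging. Assumptions: `N` or other ambiguity codes in the read never match, and `matches_at`, `trim_forward`, and `strip_forward` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Bases that each IUPAC code in a primer is allowed to match.
IUPAC_SETS: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(seq: str, primer: str, pos: int) -> bool:
    """True if primer matches seq exactly (IUPAC-aware) starting at pos."""
    window = seq[pos:pos + len(primer)]
    return len(window) == len(primer) and all(
        base in IUPAC_SETS[code] for base, code in zip(window, primer)
    )


def trim_forward(seq: str, primer: str) -> Optional[str]:
    """Return the read after the first forward-primer hit, or None if absent."""
    seq, primer = seq.upper(), primer.upper()
    for pos in range(len(seq) - len(primer) + 1):
        if matches_at(seq, primer, pos):
            return seq[pos + len(primer):]
    return None


def strip_forward(
    records: Iterable[Tuple[str, str]], primers: List[str]
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Trim each (label, seq) record; drop reads where no primer is found."""
    kept: List[Tuple[str, str]] = []
    counts = {"written": 0, "discarded": 0}
    for label, seq in records:
        for primer in primers:
            trimmed = trim_forward(seq, primer)
            if trimmed is not None:
                kept.append((label, trimmed))
                counts["written"] += 1
                break
        else:
            counts["discarded"] += 1  # no forward primer matched
    return kept, counts
```

The window scan is O(read length × primer length) per primer. That is acceptable for amplicon reads and avoids building regular expressions.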
