Greg Caporaso gregcaporaso

Tiny test data sequence collection for use with QIIME.

This is being compiled to address #582.

Using this data

You can see the output of a few commands by downloading this data and running the cmds.sh shell script from inside the unzipped directory.

Desired properties of the test data

Script for filtering OTUs that show up in negative control samples. This is a first pass at testing a process that was developed for the Student Microbiome Project. The effect of this filtering has not been investigated in detail, so use at your own risk.

This script works as follows:

1. Filter the input OTU table to contain only the control samples (as indicated by the -s parameter).
2. Compute the median or mean (specified with --abundance_f) abundance of each OTU in the control samples. Generate a list of OTUs where this value is >= the minimum abundance (specified with --min_abundance).
3. Filter the OTUs identified in Step 2 from the input OTU table.
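
The three steps above can be sketched in plain Python. This is a minimal illustration, not the script itself: the function name `filter_control_otus`, the dict-of-dicts table layout, and the example counts are all assumptions made for the sketch. Only the roles of the control sample list (-s), the summary function (--abundance_f), and the threshold (--min_abundance) come from the description.

```python
from statistics import mean, median


def filter_control_otus(otu_table, control_samples, abundance_f=median,
                        min_abundance=1):
    """Hypothetical helper: otu_table maps OTU id -> {sample id: count}."""
    flagged = set()
    for otu_id, counts in otu_table.items():
        # Step 1: look only at the control samples for this OTU.
        control_counts = [counts.get(s, 0) for s in control_samples]
        # Step 2: flag the OTU if its summary abundance in the controls
        # reaches the minimum abundance.
        if control_counts and abundance_f(control_counts) >= min_abundance:
            flagged.add(otu_id)
    # Step 3: drop the flagged OTUs from the full input table.
    filtered = {otu_id: counts for otu_id, counts in otu_table.items()
                if otu_id not in flagged}
    return filtered, flagged


# Made-up counts, for illustration only.
table = {
    "OTU1": {"ctl1": 10, "ctl2": 12, "ctl3": 8, "s1": 3},
    "OTU2": {"ctl1": 0, "ctl2": 0, "ctl3": 0, "s1": 40},
    "OTU3": {"ctl1": 0, "ctl2": 0, "ctl3": 9, "s1": 7},
}
controls = ["ctl1", "ctl2", "ctl3"]

by_median, flagged_median = filter_control_otus(table, controls, median, 2)
by_mean, flagged_mean = filter_control_otus(table, controls, mean, 2)
print(sorted(flagged_median))
print(sorted(flagged_mean))
```

In this toy table the choice of summary function matters: OTU3 appears in only one control, so its median control abundance is 0 while its mean is 3, and only the mean flags it at a threshold of 2.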

USAGE: extract_fastq_barcodes_from_header.py input_reads.fastq barcode_reads.fastq

	#!/usr/bin/env python
	# File created on 02 Jan 2013
	from __future__ import division

	__author__ = "Greg Caporaso"
	__copyright__ = "Copyright 2011, The QIIME project"
	__credits__ = ["Greg Caporaso"]
	__license__ = "GPL"
	__version__ = "1.6.0"
	__maintainer__ = "Greg Caporaso"

	### Most Popular Index Sequences
	### Columns: Sequence ReverseComplement HitCount
	.CCA.TCG CGA.TGG. 4190556 TCCAGTCG 2 TGTATGCG TCCAGTCG TACTTCGG TTCCTGCT TGCGATCT TTGACTCT TGCATAGT
	.ACT.CGG CCG.AGT. 3867426 TACTTCGG 3
	.CCAGTCG CGACTGG. 2761048 TCCAGTCG 2
	.GTA.GCG CGC.TAC. 2595270 TGTATGCG 1
	.ACTTCGG CCGAAGT. 2415896 TACTTCGG 3
	.GTATGCG CGCATAC. 1570629 TGTATGCG 1
	.CCA.TC. .GA.TGG. 589625 TCCAGTCG 2
	TCCA.TCG CGA.TGGA 564313 TCCAGTCG 2

	pick_otus:enable_rev_strand_match True
	pick_otus:max_accepts 1
	pick_otus:max_rejects 8
	pick_otus:stepwords 8
	pick_otus:word_length 8

	from glob import glob

	filepaths = glob('*txt')
	for filepath in filepaths:
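	    f = open(filepath,'U')
	    # tip: always open files for reading with mode 'U' rather
	    # than mode 'r'
	    # (Python 2 advice: in Python 3, 'r' already handles universal
	    # newlines, and mode 'U' was removed in Python 3.11)
	    ## Do whatever with the open file
	    f.close()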

	<?xml version="1.0" encoding="UTF-8"?>
	<kml xmlns="http://www.opengis.net/kml/2.2">
	<Document>
	<Placemark>
	<name>HCanyon10R1</name>
	<description>HCanyon10R1</description>
	<Point>
	<coordinates>-110.40590,+37.33006</coordinates>
	</Point>
	</Placemark>