Greg Caporaso gregcaporaso

🌱
View GitHub Profile
@gregcaporaso
gregcaporaso / README.md
Last active August 29, 2015 13:58
exploring using pandas for QIIME mapping files

Some exploration into how the pandas DataFrame could be a useful underlying representation for QIIME's metadata mapping files.
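
As a rough illustration of the idea, a tab-separated mapping file can be loaded into a DataFrame indexed by sample ID and then queried by metadata category. This is only a sketch: the three-sample file below is made up, and the column names follow the usual QIIME mapping file layout rather than anything specific to this gist.

```python
import io

import pandas as pd

# Hypothetical mapping file contents, invented for illustration only.
mapping_text = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n"
    "S1\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\tcontrol_mouse_1\n"
    "S2\tAACTCGTCGATG\tYATGCTGCCTCCCGTAGGAGT\tControl\tcontrol_mouse_2\n"
    "S3\tACAGACCACTCA\tYATGCTGCCTCCCGTAGGAGT\tFast\tfasting_mouse_1\n"
)

# Read every field as a string so barcodes and IDs are never coerced to numbers.
mapping = pd.read_csv(io.StringIO(mapping_text), sep="\t", dtype=str)

# Drop the leading '#' from the header and index the table by sample ID.
mapping = mapping.rename(columns={"#SampleID": "SampleID"}).set_index("SampleID")

# Typical metadata queries: sample IDs in one category value, counts per value.
control_ids = list(mapping.index[mapping["Treatment"] == "Control"])
samples_per_treatment = mapping.groupby("Treatment").size().to_dict()
```

With the table in this form, filtering samples and summarizing categories become one-line DataFrame operations.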

gregcaporaso / variability_v_diversity.py
Last active August 29, 2015 13:56
first pass script for comparing median diversity versus median variability for a given category
#!/usr/bin/env python
# File created on 26 Feb 2014
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2014, The QIIME Project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.8.0-dev"
__maintainer__ = "Greg Caporaso"
gregcaporaso / check_illumina_barcodes.py
Created December 3, 2013 16:32
Script for performing some basic testing on Illumina amplicon sequencing primers as described in: http://www.nature.com/ismej/journal/v6/n8/full/ismej20128a.html
#!/usr/bin/env python
# File created on 01 Dec 2011
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.3.0-dev"
__maintainer__ = "Greg Caporaso"
#!/usr/bin/env python
from sys import argv
from random import random
from cogent.parse.fastq import MinimalFastqParser
from cogent.draw.distribution_plots import generate_box_plots
from qiime.quality import ascii_to_phred33
from qiime.util import qiime_open
def fastq_quality_plots(fastq_records,
gregcaporaso / compare_pre_post_distances.py
Last active December 27, 2015 01:09
Code for comparing groups (e.g., treatment and control) of pre/post UniFrac distances to determine if one group's microbiomes are more stable than the other.
#!/usr/bin/env python
# Authors: Greg Caporaso, John Chase
# Questions: Contact [email protected]
# Step 1: Generate lists of pre/post sample ids on a per-individual basis
# qiime.group.extract_per_individual_states_from_sample_metadata
# will let you generate a dict of individual id to (pre sample-id, post sample-id)
# Step 2: Extract distances for pre/post sample ids
# qiime.parse.parse_distmat_to_dict
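
The preview above is cut off after Step 2. One way the two steps could fit together, followed by a group comparison, is sketched below. It assumes the distance matrix is available as a nested dict (`distances[sample_a][sample_b]`) and that each individual maps to a `(pre sample-id, post sample-id)` tuple. The helper names, the sample data, and the choice of a permutation test on the difference of medians are all assumptions made for illustration, not the gist's actual method.

```python
import random
from statistics import median


def pre_post_distances(individual_to_samples, distances):
    """Return one pre/post distance per individual."""
    return [distances[pre][post] for pre, post in individual_to_samples.values()]


def median_difference_p_value(group_a, group_b, permutations=999, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of medians."""
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if abs(median(shuffled_a) - median(shuffled_b)) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (permutations + 1)


# Hypothetical data: two individuals per group, distances invented for illustration.
distances = {
    "t1.pre": {"t1.post": 0.2}, "t2.pre": {"t2.post": 0.3},
    "c1.pre": {"c1.post": 0.6}, "c2.pre": {"c2.post": 0.7},
}
treatment = {"t1": ("t1.pre", "t1.post"), "t2": ("t2.pre", "t2.post")}
control = {"c1": ("c1.pre", "c1.post"), "c2": ("c2.pre", "c2.post")}

treatment_distances = pre_post_distances(treatment, distances)
control_distances = pre_post_distances(control, distances)
p_value = median_difference_p_value(treatment_distances, control_distances)
```

A smaller median pre/post distance in one group would suggest more stable microbiomes in that group; the p-value indicates how often a difference at least that large arises when group labels are shuffled.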
gregcaporaso / README.md
Created August 23, 2013 14:54
Example files used while developing pyqi's Getting Started tutorials.

These files were used while developing pyqi's Getting Started tutorials. See those documents for usage examples.

gregcaporaso / README.md
Last active December 21, 2015 02:39
Code and analysis notes for determining the taxonomic specificity of a set of sequences with associated taxonomy strings. This has been tested with the Greengenes 13_5 database. See README.md for usage instructions and some analysis notes.

Taxonomic specificity of sequences in Greengenes

Here I'm creating a hash of expected 515F/806R amplicons from the Greengenes OTUs (for a couple of different sizes of OTUs), and comparing the uniqueness of sequences with the number of different taxonomic identities at each level.

There are basically three categories of sequences:

  1. those that are unique, and therefore can only map to a single taxon
  2. those that are not unique, but still only map to a single taxon
  3. those that are not unique, and map to multiple taxa.
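
The three categories can be sketched in a few lines of Python. This is a minimal illustration, not the gist's code: it assumes the amplicons are available as `(sequence, taxonomy)` pairs, and the function name and example records are hypothetical.

```python
from collections import defaultdict


def categorize_sequences(records):
    """Map each distinct sequence to category 1, 2, or 3 as listed above."""
    taxa_by_sequence = defaultdict(list)
    for sequence, taxonomy in records:
        taxa_by_sequence[sequence].append(taxonomy)

    categories = {}
    for sequence, taxa in taxa_by_sequence.items():
        if len(taxa) == 1:
            categories[sequence] = 1  # unique sequence, single taxon
        elif len(set(taxa)) == 1:
            categories[sequence] = 2  # repeated sequence, still a single taxon
        else:
            categories[sequence] = 3  # repeated sequence, multiple taxa
    return categories


# Hypothetical records, invented for illustration.
records = [
    ("ACGT", "k__Bacteria; p__Firmicutes"),
    ("GGCC", "k__Bacteria; p__Proteobacteria"),
    ("GGCC", "k__Bacteria; p__Proteobacteria"),
    ("TTAA", "k__Bacteria; p__Firmicutes"),
    ("TTAA", "k__Bacteria; p__Bacteroidetes"),
]
categories = categorize_sequences(records)
```

To repeat the comparison at a different taxonomic level, the taxonomy strings would be truncated to that level before calling the function.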
gregcaporaso / reorg-dir-structure.py
Created August 9, 2013 20:40
Quick-and-dirty code to generate a shell script to re-organize the directory structure in the short-read-tax-assignment repo.
#!/usr/bin/env python
# Author: Greg Caporaso
from os.path import join, isdir
from glob import glob
base_in_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/multiple_assign_taxonomy_output/"
base_out_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/eval-pre-computed/"
gregcaporaso / generate_usearch_cmds.py
Last active December 20, 2015 06:09
code for converting blast "bl6" file to assignments (e.g., functional, taxonomic, etc).
#!/usr/bin/env python
from os.path import join
query_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/rep_set.fna"
reference_seqs_fp = "/data/gg_13_5_otus/rep_set/97_otus.fasta"
reference_tax_fp = "/data/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt"
input_biom_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/otu_table_mc2_no_pynast_failures.biom"
output_biom_fn = "otu_table_mc2_no_pynast_failures_w_taxa.biom"
output_dir = "/home/caporaso/analysis/short-read-tax-assignment/demo/eval-demo/usearch_v_97/"
gregcaporaso / uc_fast_params.txt
Created July 8, 2013 21:49
uclust-fast parameter settings (this is a valid QIIME parameters file, and is used in the [Illumina overview tutorial](http://qiime.org/tutorials/illumina_overview_tutorial.html)).
pick_otus:enable_rev_strand_match True
pick_otus:max_accepts 1
pick_otus:max_rejects 8
pick_otus:stepwords 8
pick_otus:word_length 8
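
Each line of the parameters file above has the form `script_name:parameter_name value`. A minimal stand-alone parser for that layout might look like the following; it is a sketch for illustration (the function name is hypothetical, values are kept as strings, and lines starting with `#` are assumed to be comments), not QIIME's own parser.

```python
def parse_parameters(lines):
    """Parse 'script:option value' lines into {script: {option: value}}."""
    params = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        key = fields[0]
        # An option given without a value is stored as None.
        value = fields[1].strip() if len(fields) > 1 else None
        script, option = key.split(":", 1)
        params.setdefault(script, {})[option] = value
    return params


uc_fast_lines = [
    "pick_otus:enable_rev_strand_match True",
    "pick_otus:max_accepts 1",
    "pick_otus:max_rejects 8",
    "pick_otus:stepwords 8",
    "pick_otus:word_length 8",
]
uc_fast_params = parse_parameters(uc_fast_lines)
```

Grouping options by script name makes it straightforward to look up everything that applies to a single step such as `pick_otus`.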