Greg Caporaso gregcaporaso

@gregcaporaso
gregcaporaso / Lecture10.ipynb
Last active December 11, 2015 20:39
IPython Notebooks to support Greg Caporaso's Spring 2013 Bioinformatics I class at Northern Arizona University: http://caporaso.us/teaching/courses/bio299_spring_2013/index.html This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/
@gregcaporaso
gregcaporaso / README.md
Last active December 11, 2015 04:18
Very small sequence collection for use in QIIME tests (under development).

Tiny test data sequence collection for use with QIIME.

This is being compiled to address #582.

Using this data

You can see the output of a few commands by downloading this data and running the cmds.sh shell script from inside the unzipped directory.

Desired properties of the test data

@gregcaporaso
gregcaporaso / partition_sequences.py
Created January 2, 2013 15:33
Given an input sequence file, splits the sequences randomly into n different files. This is useful for generating files to test computationally expensive analysis processes: the process can be run iteratively on each input sequence set, while also providing preliminary results based on random su…
#!/usr/bin/env python
# File created on 02 Jan 2013
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.6.0"
__maintainer__ = "Greg Caporaso"
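The script body is cut off in this listing, so the following is only a minimal sketch of the idea stated in the description (randomly assigning each sequence to one of n output sets), not the author's implementation. It assumes FASTA input, and `parse_fasta` and `partition_sequences` are hypothetical names chosen for illustration.

```python
import random


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


def partition_sequences(records, n, seed=None):
    """Randomly assign each record to one of n partitions (hypothetical helper)."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(n)]
    for record in records:
        partitions[rng.randrange(n)].append(record)
    return partitions


# Small in-memory example; a real script would read these lines from a file.
fasta_lines = ['>s1', 'ACGT', '>s2', 'GG', 'CC', '>s3', 'TTTT', '>s4', 'A']
records = list(parse_fasta(fasta_lines))
partitions = partition_sequences(records, n=2, seed=42)
for i, partition in enumerate(partitions):
    # Each partition would normally be written to its own output file.
    print(i, [header for header, _ in partition])
```

Because each sequence is assigned independently, the partitions are only approximately equal in size; shuffling the records and dealing them out in turn would be an alternative if equal sizes matter.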
@gregcaporaso
gregcaporaso / DemultiplexSummaryF1L1.txt
Created December 7, 2012 21:15
Very quick and dirty script to map some problematic barcodes from an EnGGen MiSeq run
### Most Popular Index Sequences
### Columns: Sequence ReverseComplement HitCount
.CCA.TCG CGA.TGG. 4190556 TCCAGTCG 2 TGTATGCG TCCAGTCG TACTTCGG TTCCTGCT TGCGATCT TTGACTCT TGCATAGT
.ACT.CGG CCG.AGT. 3867426 TACTTCGG 3
.CCAGTCG CGACTGG. 2761048 TCCAGTCG 2
.GTA.GCG CGC.TAC. 2595270 TGTATGCG 1
.ACTTCGG CCGAAGT. 2415896 TACTTCGG 3
.GTATGCG CGCATAC. 1570629 TGTATGCG 1
.CCA.TC. .GA.TGG. 589625 TCCAGTCG 2
TCCA.TCG CGA.TGGA 564313 TCCAGTCG 2
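The rows above pair an observed index sequence containing `.` characters with a full-length barcode. The gist does not say how that mapping was made; the sketch below shows one possible approach, assuming `.` marks a no-call position that is allowed to match any base. The function names are hypothetical, and the barcode list is copied from the first row of the table.

```python
# Complement table; '.' (assumed to be a no-call) stays '.'.
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', '.': '.'}

# Candidate barcodes as listed in the first row of the table above.
KNOWN_BARCODES = ['TGTATGCG', 'TCCAGTCG', 'TACTTCGG', 'TTCCTGCT',
                  'TGCGATCT', 'TTGACTCT', 'TGCATAGT']


def reverse_complement(seq):
    """Reverse-complement a sequence, preserving '.' positions."""
    return ''.join(COMPLEMENT[base] for base in reversed(seq))


def matches(observed, barcode):
    """True if every called base in observed agrees with barcode."""
    return len(observed) == len(barcode) and all(
        o == '.' or o == b for o, b in zip(observed, barcode))


def map_barcode(observed, barcodes=KNOWN_BARCODES):
    """Return every known barcode compatible with the observed sequence."""
    return [b for b in barcodes if matches(observed, b)]


print(reverse_complement('.CCA.TCG'))
print(map_barcode('.CCA.TCG'))
```

Returning a list rather than a single barcode keeps ambiguous cases visible: an observed sequence with several no-calls could be compatible with more than one barcode.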
@gregcaporaso
gregcaporaso / README.md
Created December 3, 2012 20:35
first pass at code for filtering OTUs that show up in negative control samples

Script for filtering OTUs that show up in negative control samples. This is a first pass at testing a process that was developed for the Student Microbiome Project. The effect of this filtering has not been investigated in detail, so use at your own risk.

This script works as follows:

  1. Filter input OTU table to contain only the control samples (as indicated by the -s parameter)
  2. Compute the median or mean (specified with --abundance_f) abundance of each OTU in the control samples. Generate a list of OTUs where this value is >= the minimum abundance (specified with --min_abundance).
  3. Filter the OTUs identified in Step 2 from the input OTU table.
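The three steps above can be sketched in plain Python. This is not the script's actual code: it assumes the OTU table is a dict mapping each OTU ID to a dict of per-sample counts, and `filter_control_otus` is a hypothetical name. The arguments loosely mirror the `-s`, `--abundance_f`, and `--min_abundance` options.

```python
from statistics import mean, median


def filter_control_otus(otu_table, control_samples, abundance_f, min_abundance):
    """Drop OTUs whose summarized abundance in the controls is >= min_abundance.

    otu_table: {otu_id: {sample_id: count}} (assumed layout, for illustration)
    abundance_f: a function such as statistics.median or statistics.mean
    """
    contaminants = set()
    for otu_id, counts in otu_table.items():
        # Step 1: restrict this OTU's counts to the control samples.
        control_counts = [counts.get(s, 0) for s in control_samples]
        # Step 2: flag the OTU if its summarized control abundance is high enough.
        if abundance_f(control_counts) >= min_abundance:
            contaminants.add(otu_id)
    # Step 3: remove the flagged OTUs from the full table.
    filtered = {otu_id: counts for otu_id, counts in otu_table.items()
                if otu_id not in contaminants}
    return filtered, contaminants


otu_table = {
    'OTU1': {'ctrl1': 10, 'ctrl2': 12, 'gut1': 3},
    'OTU2': {'ctrl1': 0, 'ctrl2': 0, 'gut1': 50},
    'OTU3': {'ctrl1': 0, 'ctrl2': 4, 'gut1': 7},
}
filtered, contaminants = filter_control_otus(
    otu_table, ['ctrl1', 'ctrl2'], abundance_f=mean, min_abundance=2)
print(sorted(contaminants))
print(sorted(filtered))
```

In this toy table the mean control abundances are 11, 0, and 2, so with a minimum of 2 both `OTU1` and `OTU3` are removed and only `OTU2` survives.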
@gregcaporaso
gregcaporaso / ucrss_fast_params.txt
Created December 2, 2012 17:17
Parameters file for running the subsampled OTU picking workflow in 'fast' mode
pick_otus:enable_rev_strand_match True
pick_otus:max_accepts 1
pick_otus:max_rejects 8
pick_otus:stepwords 8
pick_otus:word_length 8
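Each line of the parameters file above has the form `script_name:parameter_name value`. As a rough illustration of that layout, the sketch below reads such lines into a nested dict. `parse_params` is a hypothetical helper written for this example, not QIIME's own parser, and it leaves every value as a string.

```python
def parse_params(lines):
    """Parse 'script:param value' lines into {script: {param: value}}."""
    params = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        fields = line.split(None, 1)
        key = fields[0]
        value = fields[1] if len(fields) > 1 else None
        script, _, name = key.partition(':')
        params.setdefault(script, {})[name] = value
    return params


param_lines = [
    'pick_otus:enable_rev_strand_match True',
    'pick_otus:max_accepts 1',
    'pick_otus:max_rejects 8',
    'pick_otus:stepwords 8',
    'pick_otus:word_length 8',
]
params = parse_params(param_lines)
print(params['pick_otus']['max_rejects'])
```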
@gregcaporaso
gregcaporaso / glob_example.py
Created November 24, 2012 17:00
Example of using glob to compile a list of filepaths
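from glob import glob

# Collect every path in the current directory whose name ends in "txt"
filepaths = glob('*txt')
for filepath in filepaths:
    # tip: on Python 2, open files for reading with mode 'U' (universal
    # newlines) rather than mode 'r'. Python 3 handles universal newlines
    # by default, and mode 'U' was removed in Python 3.11, so a plain
    # open(filepath) is used here.
    with open(filepath) as f:
        ## Do whatever with the open file
        pass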
@gregcaporaso
gregcaporaso / jgc53_coordinates.kml
Created November 20, 2012 23:25
A small example of the output for Programming Assignment 3 (Greg Caporaso's Fall 2012 BIO 299 course)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name>HCanyon10R1</name>
<description>HCanyon10R1</description>
<Point>
<coordinates>-110.40590,+37.33006</coordinates>
</Point>
</Placemark>
@gregcaporaso
gregcaporaso / README.md
Created November 17, 2012 03:28
quick and dirty script to create a barcode read fastq file from a sequence read fastq file with barcodes in the headers

USAGE: extract_fastq_barcodes_from_header.py input_reads.fastq barcode_reads.fastq
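The script itself is not shown here, so the following is only a sketch of the general idea. It makes two loud assumptions that the gist does not confirm: that the barcode is the text after the last `:` in each header line, and that a placeholder quality string is acceptable, since the header carries no quality scores for the barcode. `barcode_records` is a hypothetical name.

```python
def barcode_records(fastq_lines):
    """Yield 4-line barcode fastq records built from sequence-read headers.

    Assumes (for illustration only) that the barcode follows the last ':'
    in the header line.
    """
    lines = [line.rstrip('\n') for line in fastq_lines]
    # fastq records are four lines each: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        header = lines[i]
        barcode = header.rsplit(':', 1)[-1]
        # Placeholder quality: 'I' repeated, since no real scores are available.
        yield [header, barcode, '+', 'I' * len(barcode)]


reads = [
    '@read1 1:N:0:ACGTAC', 'TTGGCCAA', '+', 'IIIIIIII',
    '@read2 1:N:0:GGTTAA', 'AACCGGTT', '+', 'IIIIIIII',
]
for record in barcode_records(reads):
    print('\n'.join(record))
```

Keeping the original header on each barcode record means the barcode file and the sequence file stay in the same order with matching identifiers.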

@gregcaporaso
gregcaporaso / Lecture20.ipynb
Last active October 12, 2015 07:47
IPython Notebook files used in Greg Caporaso's Fall 2012 BIO599 Computational Biology course. See the included README.md file for more details and licensing information.