download files from S3 using 3hub
unzip respective files (started w/ Bin001.zip)
run process_reads.py (now part of https://github.com/faircloth-lab/illumiprocessor/):
python ~/git/brant/seqcap/Assembly/process_reads.py
run velvetoptimiser:
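import numpy
# parse a space-delimited string of integers into a numpy integer array
s = '40 40 40 40 40'
sl = s.split()  # split() with no argument also tolerates repeated or trailing whitespace
si = [int(elem) for elem in sl]
sa = numpy.array(si)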
This is primarily directed towards preparing large amounts of UCE data for GenBank. However, parts of the following should work with most NGS data sets and other types of sequence data. Programs within phyluce are available from:
https://github.com/faircloth-lab/phyluce
Sequin will trim vector contamination, but it will not handle huge files (nor would you want it to try). The vector-screening steps below therefore attempt to replicate that process outside of Sequin.
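NCBI's own vector screen (VecScreen) is BLAST-based, searching against the UniVec database. As a much simpler illustration of what "trimming vector from the end of a sequence" means, here is a naive exact-match sketch. It is a swapped-in technique, not VecScreen and not the phyluce code; the function name `trim_vector_3prime` and the `min_overlap` threshold are hypothetical choices for this example.

```python
def trim_vector_3prime(seq, vector, min_overlap=8):
    """Remove `vector` (or a partial copy of its start) from the 3' end of `seq`.

    Naive exact-match sketch (hypothetical helper, not VecScreen):
    1. if a full copy of `vector` occurs in `seq`, cut at its first occurrence;
    2. otherwise find the longest suffix of `seq` (>= min_overlap bases) that
       equals a prefix of `vector`, and cut that suffix off.
    Matching is case-insensitive; the original case of `seq` is preserved.
    """
    if not vector:
        return seq
    # an overlap of 0 would make seq[:-0] return an empty string, so require >= 1
    min_overlap = max(1, min_overlap)
    seq_u, vec_u = seq.upper(), vector.upper()
    idx = seq_u.find(vec_u)
    if idx != -1:
        return seq[:idx]
    # the longest possible partial overlap is one base shorter than the vector
    for overlap in range(min(len(seq_u), len(vec_u) - 1), min_overlap - 1, -1):
        if seq_u.endswith(vec_u[:overlap]):
            return seq[:-overlap]
    return seq
```

Exact matching misses sequencing errors inside the vector, which is exactly why alignment-based screening is preferred for real submissions; the sketch only shows the trimming logic.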
These notes build from several excellent sources:
- http://sfg.stanford.edu/
- http://www.broadinstitute.org/gatk/guide/
- http://www.broadinstitute.org/gatk/guide/topic?name=best-practices
and assume you're working with GATK 2.2-16. These notes also assume
The following assumes you are converting BCL files containing PE100 reads with a 10 nt index read. You can let Casava demultiplex for you, or do it yourself later. You can adjust the values below if you are doing something different (e.g., shorter reads, longer indexes), but be careful.
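The read layout above (100 + 10 + 100 cycles) is what Casava's `configureBclToFastq.pl` needs to know via its `--use-bases-mask` option, where, as we understand it, `Y` marks read cycles and `I` marks index cycles. The helper below is a hypothetical sketch that builds such a mask string; marking the index as `Y` so it is written out as an extra read (for demultiplexing yourself later) is an assumption to verify against your Casava version. Also check the cycle counts in your run's `RunInfo.xml`, since they must match the mask.

```python
def bases_mask(read1=100, index=10, read2=100, keep_index_as_read=False):
    """Build a Casava-style bases-mask string such as 'Y100,I10,Y100'.

    Hypothetical helper. 'Y' = cycles kept as read sequence, 'I' = index
    cycles used for demultiplexing. With keep_index_as_read=True the index
    cycles are marked 'Y' instead (assumption: written out as a separate
    read so you can demultiplex later yourself).
    """
    index_code = "Y" if keep_index_as_read else "I"
    parts = [f"Y{read1}", f"{index_code}{index}"]
    if read2:
        parts.append(f"Y{read2}")
    return ",".join(parts)


def total_cycles(read1=100, index=10, read2=100):
    """Number of sequencing cycles the run must contain for this layout."""
    return read1 + index + read2
```

For PE100 with a 10 nt index, `bases_mask()` gives `Y100,I10,Y100` across 210 cycles.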
- You need a pretty beefy machine. Illumina recommends something with multiple cores and 48 GB of RAM, running CentOS 5. CentOS 6 also works just fine. See their recommendations here:
# start the instance:
ec2-run-instances --key /path/to/my/ec2-keypair ami-74f0061d --instance-type=c1.xlarge --block-device-mapping '/dev/sda2=ephemeral0' --block-device-mapping '/dev/sda3=ephemeral1'
# mount the ephemeral storage:
sudo su
mkdir /mnt/data
mount /dev/sda2 /mnt/data
from Bio.Nexus import Nexus
aln = Nexus.Nexus()
aln.read('my-properly-formatted-nexus-file.nex')
# assuming your partitions are defined in a charset block like:
#
# begin sets;
# charset bag2 = 1-186;
# charset bag3 = 187-483;
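The snippet stops before the partitioning step. To illustrate what splitting an alignment by `charset` ranges amounts to, here is a pure-Python sketch that does not use Biopython. It assumes simple contiguous `start-end` charsets like the ones shown, and `parse_charsets` / `split_alignment` are hypothetical names for this example.

```python
import re

# matches simple contiguous charsets such as "charset bag2 = 1-186;"
CHARSET_RE = re.compile(r"charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def parse_charsets(text):
    """Return {name: (start, end)} for each simple charset line in `text`.

    Coordinates are NEXUS-style: 1-based and inclusive.
    """
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in CHARSET_RE.finditer(text)}


def split_alignment(records, charsets):
    """Slice each aligned sequence in `records` ({taxon: seq}) by every charset.

    Returns {charset_name: {taxon: subsequence}}. The 1-based inclusive
    coordinates become Python's 0-based half-open slice seq[start - 1:end].
    """
    return {name: {taxon: seq[start - 1:end] for taxon, seq in records.items()}
            for name, (start, end) in charsets.items()}
```

Each inner dictionary could then be written out as its own per-locus alignment.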
#!/usr/bin/env python
# encoding: utf-8
"""
File: mpi_sate.py
Author: Brant Faircloth
Created by Brant Faircloth on 04 May 2012 15:05 PDT (-0700)
Copyright (c) 2012 Brant C. Faircloth. All rights reserved.
Description:
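import os
import tempfile
from mpi4py import MPI

comm = MPI.COMM_WORLD   # communicator spanning all launched processes
size = comm.Get_size()  # total number of processes
rank = comm.Get_rank()  # this process's id, 0..size-1
mode = MPI.MODE_RDONLY  # read-only access mode for MPI file I/O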
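A common way to use `rank` and `size` is to give each process its own share of the input files. The sketch below shows a round-robin split; it is an assumption about how work could be divided, not the actual logic of `mpi_sate.py`, and `files_for_rank` is a hypothetical name. It is kept free of `mpi4py` so the partitioning logic can be checked on its own.

```python
def files_for_rank(paths, rank, size):
    """Round-robin assignment of input files to one MPI rank (hypothetical helper).

    Rank r gets items r, r + size, r + 2*size, ... of the sorted list, so
    every file is handled by exactly one rank and the order is deterministic
    across processes.
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError("need size >= 1 and 0 <= rank < size")
    return sorted(paths)[rank::size]
```

In the script, `rank` and `size` would come from `comm.Get_rank()` and `comm.Get_size()`, and each process would then align only the files returned for its rank. Sorting first matters because every rank must see the same ordering.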
start up ARDAgent (on remote machine via ssh):
sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart -activate \
  -configure -users bcf -access -on -restart -agent -privs -all -allowAccessFor -specifiedUsers
start tunnel (from local to remote):
ssh -i keyfile -NfL 1202:127.0.0.1:5900 <user>@<remote-host>
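In the tunnel command, `-L 1202:127.0.0.1:5900` forwards local port 1202 over SSH to port 5900 (the VNC port used by Screen Sharing/ARD) as seen from the remote machine itself; `-N` skips running a remote command and `-f` backgrounds ssh. The hypothetical helper below just pulls such a forward spec apart to make the mapping explicit; it handles the 3-field and 4-field (bind address) forms and ignores IPv6 bracket syntax.

```python
def parse_local_forward(spec):
    """Split an ssh -L spec into its parts (hypothetical helper).

    Accepts 'local_port:host:host_port' or
    'bind_address:local_port:host:host_port'. `host` is resolved from the
    remote end of the tunnel, so 127.0.0.1 means the remote machine itself.
    """
    fields = spec.split(":")
    if len(fields) == 3:
        bind, (local_port, host, host_port) = None, fields
    elif len(fields) == 4:
        bind, local_port, host, host_port = fields
    else:
        raise ValueError(f"unexpected forward spec: {spec!r}")
    return {"bind": bind, "local_port": int(local_port),
            "host": host, "host_port": int(host_port)}
```

For the command above, the result says: connect locally to port 1202, arrive at 127.0.0.1:5900 on the remote host.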
connect w/ (on local machine):