coleoguy’s gists

coleoguy / rosalind 2

Created August 24, 2013 19:52

	foo <- gsub("T", "U", as.character(read.table("rosalind_rna.txt")[[1]]))
	paste(foo)

coleoguy / rosalind 3

Created August 24, 2013 19:55

	foo <- unlist(strsplit(as.character(read.table("rosalind_revc.txt")[[1]]),""))
	DNA <- c("A", "C", "G", "T")
	rcDNA <- c("T", "G", "C", "A")
	foo2 <- vector(length = length(foo))
	for(i in 1:4){
	foo2[grep(DNA[i], unlist(foo))] <- rcDNA[i]
	}
	paste(foo2[length(foo2):1], collapse = "", sep = "")

coleoguy / gist:6335059

Created August 25, 2013 17:11

	data <- as.numeric(read.table("rosalind_iprb.txt"))
	k<-data[1]
	m<-data[2]
	n<-data[3]
	t<-sum(data)
	# Sire is AA Dam is anything
	t.prob <- k/(t)
	# Sire is Aa Dam is AA, Aa, aa
	t.prob <- t.prob + (((m/(t)) * (k/(t-1))) + #Aa x AA
	((m/(t)) * ((m-1)/(t-1)) * .75) + #Aa x Aa

coleoguy / gist:6335185

Created August 25, 2013 17:35

calculate hamming distance between to aligned sequences

	dna <- read.table("rosalind_hamm.txt")
	dna1 <- unlist(strsplit(as.character(dna[1,1]),""))
	dna2 <- unlist(strsplit(as.character(dna[2,1]),""))
	ham.dist <- 0
	for(i in 1:length(dna1)){
	if(dna1[i] != dna2[i]){
	ham.dist <- ham.dist +1
	}
	}
	cat(ham.dist)

coleoguy / mortalrabbits

Last active December 21, 2015 18:39

	lifespan <- 18
	months <- 85
	pop <- vector(mode="numeric", length=months) # this will hold the population each month as we build it up
	pop[1:2] <- 1 # rabbits take a month to mature so we go ahead and fill in month 1 and 2
	for(i in 3:months){ # this loop will just build our population month by month
	if(i-(lifespan+1) < 0){ # if we are early on none have died yet this if statement takes care of that
	pop[i] <- pop[i-1]+pop[i-2]-0
	}
	if(i-(lifespan+1) == 0){ # this one takes care of the first rabbit dying
	pop[i] <- pop[i-1]+pop[i-2]-1

coleoguy / dstat.R

Last active June 22, 2018 20:53

This file contains functions implementing ABBA BABA type tests for gene flow between populations. The three functions included are CalcD CalcPartD and CalcPopD. CalcD: This is the original version of the ABBA BABA test and calculates the D-statistic as found in Durand et al 2011 page 2240 CalcPopD: This version of the test is able to use many se…

	## ABBA BABA tests
	## Heath Blackmon
	## [email protected]
	## 28 July 2013
	## This file has algorithms found in:

	## Durand, Eric Y., et al. "Testing for ancient admixture between
	## closely related populations." Molecular biology and evolution
	## 28.8 (2011): 2239-2252.

coleoguy / retro-gff3parser.py

Last active January 3, 2016 20:29

Parse the castaneum gff file to extract all genes with multiple exons. I'd like to make this robust at some point to repeat this process with any genome gff3 file however in the interest of finishing this project and getting it written up this semester the current version is only tested on the castaneum genome (version 3) gff3 file.

	# the gff3 file that was used as input for this script was downloaded from:
	# ftp://ftp.ensemblgenomes.org/pub/metazoa/release-21/gff3/tribolium_castaneum
	# on January 10, 2014

	import re
	## This script is just to parse out the part of the gff file that we
	## are interested in, (i.e. lines describing exons or protein coding genes


	## Here we specify the name of the gff3

coleoguy / retro-EEfinder.py

Last active January 3, 2016 22:09

This python script takes a list of exons from multiple exon genes as well as fast files for each chromosome in a genome and it constructs a fasta file where each sequence is 60bp in length (last 30bp of one exon and the first 30bp of the next). These can then be used to search the genome for retroduplication events of genes. By limiting our selv…

	from Bio import SeqIO # to deal with the fast files

	# lets pull in our exon table
	datafile = open('exons.csv', 'r')
	data = []
	for row in datafile:
	data.append(row.strip().split(','))

	# now lets start the process of creating all of our exon-exon sequences
	# by first creating a list that has the name we want to assign and the LG and

coleoguy / retro-lgexonFinder.R

Last active January 4, 2016 09:28

creates a fasta file with 300bp chunks from the start of all exons larger than 300 bp

	setwd("~/Desktop/retrogenes/data/")
	gff <- read.csv("exons.csv", header = F, as.is=T) #This is a parsed GFF3 file that contains the lines for exons

	# lets get big ones >300 bp

	gff <- gff[gff[, 3] - gff[, 2] > 300, ]
	setwd("~/Desktop/retrogenes/data/chromosomes") #This is the directory that contains the chromosomes of the genome

	library(ape)
	seq <- name <- list()

coleoguy / retro-blast.commands

Created January 24, 2014 22:45

these are the command line arguments to create a local blast database from the new genome assembly (in our case T. confusum), and then blast the large exon file to create what we will call a look up table.

	# to construct database from an assembly in the file
	# conf.scaf.fa run this in the terminal
	# it will create a blast database call confscaf

	makeblastdb -in conf.scaf.fa -dbtype nucl -out confscaf

	# next we want to blast our large exons against this db
	blastn -query=/outputs/300+bpExons.fa -db=/blastDB/confscaf -outfmt='6 qseqid qstart sseqid sstart qlen length pident' -max_target_seqs=2 -out=info

	# this produces the lookup table that we can then use to analyze the degree of synteny conservation

Heath Blackmon coleoguy