Skip to content

Instantly share code, notes, and snippets.

View coleoguy's full-sized avatar

Heath Blackmon coleoguy

View GitHub Profile
library(shiny)
# I like this color pallete much more than the base options
library(viridis)
shinyServer(function(input, output) {
# this is the actual brownian motion simulation
x <- reactive({
# insure reproducability
set.seed <- input$seed.val
replicate(input$reps, cumsum(rnorm(input$gens)))
})
## plotting simulated data
require("vcd")
colnames(simresult) <- c("Root", "XY", "XO", "Xyp")
simresult <- as.data.frame(simresult)
colors <- c("black","red","green", "orange") #This sets up a vector of colors
ternaryplot( #This is the actual plotting of our data
simresult[,2:4], #Here I provide the file and columns to plot
pch = 20, #This is choosing the shape of data points (simple circle here)
cex = .5, #This is the size of the data points
col = colors[as.numeric(simresult$Root)], #Telling it to color the points
@coleoguy
coleoguy / gist:1011aea531c2707f1da6
Created September 5, 2015 21:35
non-phylogenetic approach to simulation
## non-phylogenetic approach to simulating data
simresult <- matrix(1,200,4)
set.seed(4)
simresult[,2] <- sample(1:498, size=200)
for(i in 1:200){
simresult[i,3] <- sample(1:(499 - simresult[i,2]), size=1)
simresult[i,4] <- sample(1:(500 - (simresult[i,2] + simresult[i,3])), size=1)
simresult[i,1] <- sample(2:4, size=1, prob = c(simresult[i,2],
simresult[i,3],
@coleoguy
coleoguy / gist:818679ace1eebb95ab21
Created September 5, 2015 21:34
simulate with phylogenetic model
## phylogenetic approach to simulating data
library(geiger)
tree <- sim.bdtree(b=1, d=.4, stop=c("taxa", "time"), n=200, seed=0, extinct=T)
tree <- drop.extinct(tree)
q <- list(rbind(c(-.06, .05, .01), c(.09, -.1, .01), c(.01, .01, -.02)))
simresult <- matrix(1,200,4)
for(i in 1:200){
sim.root <- sample(1:3, prob=c(.55,.35,.1), size=1)
phylosim <- sim.char(phy=tree, model="discrete", par=q,
root=sim.root)
@coleoguy
coleoguy / wordTOcodons
Created April 10, 2014 18:06
sentence to codons
## simple function to take a line of text and return a version of
## this text that has been encoded into DNA. This function makes
## use of a couple of codons that are normally stop codons but in
## certain organisms have been coopted to code for unusual amino
## acids. This is how we are able to expand the alphabet to include
## O and U
wordTOcodons <- function(x){
table <- matrix(c("START", 'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
@coleoguy
coleoguy / gist:10136883
Last active September 9, 2015 20:37
just a simple sliding window function
slideFunct <- function(data, window, step){
total <- length(data)
spots <- seq(from = 1, to = (total - window + 1), by = step)
result <- vector(length = length(spots))
for(i in 1:length(spots)){
result[i] <- mean(data[spots[i]:(spots[i] + window - 1)])
}
return(result)
}
data <- c(runif(100000, min=0, max=.1),runif(100000, min=.05, max=.1),runif(10000, min=.05, max=1), runif(100000, min=0, max=.2))
@coleoguy
coleoguy / retro-blast.commands
Created January 24, 2014 22:45
these are the command line arguments to create a local blast database from the new genome assembly (in our case T. confusum), and then blast the large exon file to create what we will call a look up table.
# to construct database from an assembly in the file
# conf.scaf.fa run this in the terminal
# it will create a blast database call confscaf
makeblastdb -in conf.scaf.fa -dbtype nucl -out confscaf
# next we want to blast our large exons against this db
blastn -query=/outputs/300+bpExons.fa -db=/blastDB/confscaf -outfmt='6 qseqid qstart sseqid sstart qlen length pident' -max_target_seqs=2 -out=info
# this produces the lookup table that we can then use to analyze the degree of synteny conservation
@coleoguy
coleoguy / retro-lgexonFinder.R
Last active January 4, 2016 09:28
creates a fasta file with 300bp chunks from the start of all exons larger than 300 bp
setwd("~/Desktop/retrogenes/data/")
gff <- read.csv("exons.csv", header = F, as.is=T) #This is a parsed GFF3 file that contains the lines for exons
# lets get big ones >300 bp
gff <- gff[gff[, 3] - gff[, 2] > 300, ]
setwd("~/Desktop/retrogenes/data/chromosomes") #This is the directory that contains the chromosomes of the genome
library(ape)
seq <- name <- list()
@coleoguy
coleoguy / retro-EEfinder.py
Last active January 3, 2016 22:09
This python script takes a list of exons from multiple exon genes as well as fast files for each chromosome in a genome and it constructs a fasta file where each sequence is 60bp in length (last 30bp of one exon and the first 30bp of the next). These can then be used to search the genome for retroduplication events of genes. By limiting our selv…
from Bio import SeqIO # to deal with the fast files
# lets pull in our exon table
datafile = open('exons.csv', 'r')
data = []
for row in datafile:
data.append(row.strip().split(','))
# now lets start the process of creating all of our exon-exon sequences
# by first creating a list that has the name we want to assign and the LG and