Stephen Turner stephenturner
@stephenturner
stephenturner / mm9symbols.txt
Last active December 17, 2015 01:09
Official gene symbols obtained from the mm9 UCSC GTF using the command: cat genes.gtf | awk '{print $10}' | sed 's/[";]//g' | uniq | sort | uniq > mm9symbols.txt
0610005C13Rik
0610007C21Rik
0610007L01Rik
0610007N19Rik
0610007P08Rik
0610007P14Rik
0610007P22Rik
0610008F07Rik
0610009B14Rik
0610009B22Rik
stephenturner / hg19symbols.txt
Created May 6, 2013 16:28
Official gene symbols obtained from the hg19 UCSC GTF using the command: cat genes.gtf | awk '{print $10}' | sed 's/[";]//g' | uniq | sort | uniq > hg19symbols.txt
1/2-SBSRNA4
A1BG
A1BG-AS1
A1CF
A2LD1
A2M
A2ML1
A2MP1
A4GALT
A4GNT
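The extraction command in the description above can be tried on a small input. The following is a minimal sketch, not the author's exact pipeline: it collapses `uniq | sort | uniq` into `sort -u`, which should give the same list in ordinary locales, and it assumes `gene_id` is the first attribute of every record so that its value is whitespace-separated field 10. The function names and the three GTF records are made up for illustration.

```shell
# Sketch: print the unique, sorted gene symbols from a GTF stream on stdin.
# Assumption: gene_id is the first attribute, so its value is whitespace field 10.
gtf_symbols() {
  awk '{print $10}' | sed 's/[";]//g' | sort -u
}

# Made-up three-record GTF excerpt (not taken from the real hg19 file).
sample_gtf() {
  printf 'chr19\tunknown\texon\t100\t200\t.\t-\t.\tgene_id "A1BG"; transcript_id "tx1";\n'
  printf 'chr19\tunknown\tCDS\t120\t180\t.\t-\t.\tgene_id "A1BG"; transcript_id "tx1";\n'
  printf 'chr12\tunknown\texon\t300\t400\t.\t-\t.\tgene_id "A2M"; transcript_id "tx2";\n'
}

# Two records share a gene, so only two symbols are printed.
sample_gtf | gtf_symbols
```

On a real file the call would be `gtf_symbols < genes.gtf > hg19symbols.txt`.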
stephenturner / get-uscode-wc.sh
Last active June 12, 2021 21:43
Get a rough word count of the entire US Code of Law
# Replace the parallel stuff with xargs if you don't have GNU parallel, or just pipe to 'sh'.
# Use 'wget' on Linux, 'curl -O' on macOS.
for i in $(seq -f "%02.f" 1 51); do echo "curl -O http://uscode.house.gov/download/pls/Title_$i.ZIP"; done | parallel
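# Unzip every archive. (With --dry-run, parallel only prints the unzip commands and extracts nothing, so the word count below would find no text files.)
find *ZIP | parallel unzip {}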
cat *txt | wc
## Results:
# cat *txt | wc
# 6546729 45920853 338309328
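The first comment in this script suggests swapping GNU parallel for xargs. A minimal sketch of that variant follows, assuming an xargs that supports `-P` (a GNU/BSD extension, not POSIX). The function names `title_urls` and `fetch_titles` are hypothetical, and the URL pattern is copied from the script without checking that the server still serves it.

```shell
# Print one download URL per title number, zero-padded to two digits.
title_urls() {
  seq -f "%02.f" "$1" "$2" | sed 's|.*|http://uscode.house.gov/download/pls/Title_&.ZIP|'
}

# Hypothetical helper: fetch the URLs four at a time with xargs instead of GNU parallel.
# Defined here but not called, so running this block does not touch the network.
fetch_titles() {
  title_urls "$1" "$2" | xargs -n 1 -P 4 curl -O
}

# Preview the first three URLs.
title_urls 1 3
```

Calling `fetch_titles 1 51` would then stand in for the `for` loop piped to `parallel`.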
chr start stop ID
chr1 4773206 4785739 Mrpl15
chr1 8361475 9299878 Sntg1
chr1 10993465 11303682 Prex2
chr1 11414105 11975901 A830018L16Rik
chr1 12692430 12860371 Sulf1
chr1 13113457 13127163 Prdm14
chr1 16228674 16520112 Stau2
chr1 18115191 18145902 Crisp4
chr1 19208914 19238734 Tfap2b
stephenturner / qq-with-ymax.r
Created April 5, 2013 19:22
Create a QQ plot where you can specify the upper limit of the y-axis.
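qq <- function(pvector, ymax=NA, ...) {
  if (!is.numeric(pvector)) stop("D'oh! P value vector is not numeric.")
  # Keep only valid p-values strictly between 0 and 1.
  pvector <- pvector[!is.na(pvector) & pvector<1 & pvector>0]
  # Observed and expected -log10(p), both ordered from largest to smallest.
  o <- -log10(sort(pvector, decreasing=FALSE))
  e <- -log10(ppoints(length(pvector)))
  # Use the data maximum when ymax is missing, non-numeric, or too small to show every point.
  if (!is.numeric(ymax) || is.na(ymax) || ymax<max(o)) ymax <- max(o)
  plot(e, o, pch=19, cex=1, xlab=expression(Expected~~-log[10](italic(p))), ylab=expression(Observed~~-log[10](italic(p))), xlim=c(0,max(e)), ylim=c(0,ymax), ...)
  # Reference line y = x: where the points fall if the p-values are uniform.
  abline(0, 1, col="red")
}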
stephenturner / intro-bioinfo-microarray.r
Created April 4, 2013 14:49
Code from UVA cell bio 8401 intro bioinformatics lab. Demonstrates some basic features of R, then dives into a microarray analysis of Affy genechips using limma.
## Introduction to R and Bioconductor for Cell Biology 8401
# Introduction to R -------------------------------------------------------
# Introduce R, RStudio, layout, workspace, etc.
# R is a glorified calculator
# Start this in console, move to editor, use run button
2+2
5*4
2^3
stephenturner / progress-bar-r.r
Created February 20, 2013 20:54
Creates a text progress bar in R
niter <- 100
pb <- txtProgressBar(min=0, max=niter, style=3)
for (i in 1:niter) {
  Sys.sleep(0.025) # Do something here besides sleep!
  setTxtProgressBar(pb, i)
}
close(pb) # Close the bar so later console output starts on a new line
library(genefilter)
library(mouse4302.db)
# do stuff here to load your affybatch, run rma.
# affybatch <- ReadAffy(filenames)
# eset_orig <- rma(affybatch)
## Continue using the genefilter package to:
## 1. Remove probes that aren't annotated with an entrez ID,
## 2. If multiple probes map to the same gene, keep the one with the largest IQR (most reliably detected)
chr7 109521280 109521409 SNORA3_1
chr11 6619808 6619940 SNORA5_2
chr19 8725594 8725660 SNORD31_3
chr19 8725341 8725409 SNORD30_4
chr4 3835079 3835146 snoU54_5
chr7 111076060 111076227 snoU97_6
chr15 32241853 32241923 SNORD123_7
chr19 8725866 8725991 Snord22_8
chr11 6620319 6620454 Snora5c_9
chr19 8725092 8725156 SNORD29_10
stephenturner / problem_wrapped.fa
Created December 10, 2012 17:58
problem_wrapped.fa
>scaffold46 3.1
TTCAGTGACATCACCCTCTAAAGAATTCTCTCCCACCGATAAATTCTCCAACTTTGATAGGAGTAGGATGCTCTCGGGTA
TCTTGCCAGTAATTTGGTTATAAGATATGTTCAATATCCTTAATGTGTGTCTGGTGCACCATNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN