Skip to content

Instantly share code, notes, and snippets.

@sashaphanes
sashaphanes / t.test.p.values.R
Last active December 16, 2015 06:19
getting p-values from two dataframes
#read MAOA and MAOB human RNAseq data and use t.test by regions
setwd("/home/mydirectory/work.dir/allen.brain/human")
maoa = read.csv("combinedA.csv")
maob = read.csv("combinedB.csv")
test.result = mapply(t.test, maoa, maob)
#store p-values by brain region:
#p.values = mapply(function(x, y) t.test(x,y)$p.value, maoa, maob)
#store p-values in a column with regions in a separate column:
zcat human.Hela.ALDH.ethanol.RNAseq/features.stat/human.Hela.ALDH.ethanol.RNAseq.RPKM.limma.stat.txt.gz | head -n +4
ChrGene D.C1 D.C3 D.E1 D.E2 D.E3 S.C1 S.C3 S.E1 S.E2 S.E3 D.C1.qNorm D.C3.qNorm D.E1.qNorm D.E2.qNorm D.E3.qNorm S.C1.qNorm S.C3.qNorm S.E1.qNorm S.E2.qNorm S.E3.qNorm DEvsSE.logFC DEvsSE.AveExpr DEvsSE.t DEvsSE.P.Value DEvsSE.adj.P.Val DEvsSE.B DEvsSE.qvalue SEvsSC.logFCSEvsSC.AveExpr SEvsSC.t SEvsSC.P.Value SEvsSC.adj.P.Val SEvsSC.B SEvsSC.qvalue DEvsDC.logFC DEvsDC.AveExpr DEvsDC.t DEvsDC.P.Value DEvsDC.adj.P.Val DEvsDC.B DEvsDC.qvalue DCvsSC.logFC DCvsSC.AveExpr DCvsSC.t DCvsSC.P.Value DCvsSC.adj.P.Val DCvsSC.B DCvsSC.qvalue
chr12.ALDH2 406.3 358.9 455.3 443 481 9.1 3.3 4.3 2.8 4.8 8.5 8.448 8.862 8.74 8.862 3.041 2.892 2.691 2.218 2.41 6.38172000660375 5.66652337604155 35.9972945401694 2.12746190437488e-12 2.34297379528806e-08 16.3325155031259 1.49547550388356e-08 -0.527099724237586 5.66652337604155 -2.65931570380224 0.0229117862569256 0.999962251464947 -4.35286289202634 0.999962251464947
@sashaphanes
sashaphanes / RPKMalt.R
Created March 11, 2013 19:37
same as RPKM.R
>dataFile = "fileOfInterest.txt.gz"
>rawdata = read.delim(gzfile(dataFile), sep="\t", header=T)
>rownames(rawdata) = paste(rawdata[,"Chr"], rawdata[,"GeneSym"], sep = ".")
>data.RPKM = rawdata[, grep("RPKM", colnames(rawdata))]
>rowMax = do.call("pmax", c(data.RPKM[,1:ncol(data.RPKM)],na.rm=TRUE))
>data.RPKM.rowMax.gt1 = data.RPKM[rowMax >1,]
>data.RPKM.rowMax.gt1[is.na(data.RPKM.rowMax.gt1)] = 0
>data.RPKM.log2 = log2(data.RPKM.rowMax.gt1 + 1)
>RNAseq.RPKM.log.m = as.matrix(data.RPKM.log2)
>library(limma)
@sashaphanes
sashaphanes / rpkm.R
Created March 11, 2013 19:34
Exclude genes without a value over 2rpkm and select only columns marching string "rpkm"
source("http://bioconductor.org/biocLite.R")
biocLite("preprocessCore")
library(preprocessCore)
data <- read.delim(gzfile("fileOfInterest.txt.gz"), sep="\t")
data <- data[,grep("RPKM", colnames(data))]
filtered <- data[!apply(data, 1, function(x){ all(x <2) }),]
qn <- normalize.quantiles(as.matrix(filtered), copy = TRUE)]
qn <- as.data.frame(qn)
colnames(qn) <- colnames(filtered)
rownames(qn) <- rownames(filtered)
@sashaphanes
sashaphanes / gist:3939065
Last active October 11, 2015 23:48
PEWPS
#!/usr/bin/perl -sw
while (<>) {
foreach
$_ =~ s/,/\t/;
print "$_\n" if (eof);
}
user@localhost:~/work.dir/PerlScripts>cat ~/work.dir/Sasha/Expression\ Data/Homo_sapiens/aba_maoa/Columns.csv | perl replacer.pl
syntax error at replacer.pl line 5, near "$_ =~"
@sashaphanes
sashaphanes / addZeroes.pl
Created October 22, 2012 17:02
add zeroes to the front
#!/usr/bin/perl -s
#Q19. Write a simple perl statement to add leading 0s for the integers which is less than 10000.
#Assume the integers are in the range of 1 to 9999.
while (<>) {
$_ =~ s/([0-9]+)/sprintf('%05d',$1)/ge;
}
@sashaphanes
sashaphanes / color2basespace.pl
Created October 17, 2012 21:11
convert colorspace from SOLiD to basespace using STDIN
#!/usr/bin/perl -sw
while (<>) {
if(!(/^[ACGT0123NX\.]+$/)){
print;
next;
}
while(/[ACGT][0123]/){
s/A0/AA/;s/C0/CC/;s/G0/GG/;s/T0/TT/;
s/A1/AC/;s/C1/CA/;s/G1/GT/;s/T1/TG/;
@sashaphanes
sashaphanes / txtToFastq.pl
Created October 17, 2012 21:11
convert .txt files to .fastq format
#!/usr/bin/perl -sw
# sample inputs:
#ILLUMINA-A93428 66 1 1 7848 1267 CGATGT 1 ATGTTGGCTGGCGGTGAAATAAATCTCAAACGTACCTGTTACAAAATTGTGTTAATCCTTTCAGATTCGCAG ggggggggggggggcfdffdgggfefffffcfcffcccacBeddddded_cffefgggggBBBBBBBBBBBB chrX.fa 122287709 R 40C21GAC7 269 Y
# output for sample input:
#@ILLUMINA-A93428:66:1:1:7848:1267:#CGATGT/1
#ATGTTGGCTGGCGGTGAAATAAATCTCAAACGTACCTGTTACAAAATTGTGTTAATCCTTTCAGATTCGCAG
#+ILLUMINA-A93428:66:1:1:7848:1267:#CGATGT/1
#ggggggggggggggcfdffdgggfefffffcfcffcccacBeddddded_cffefgggggBBBBBBBBBBBB
@sashaphanes
sashaphanes / printUniqueStrings.pl
Created October 15, 2012 20:09
print all different strings found in a list once each, remove "
#!/usr/bin/perl -s
require "getopts.pl";
use vars qw ($opt_f $opt_d $opt_h $opt_H);
&Getopts('f:d:hH:');
my $executable = "printUniqueStrings.pl";
$| = 1;
my $usage = qq{
@sashaphanes
sashaphanes / gen.CSV.perTissue.sh
Created October 15, 2012 19:04
Streamline getting expression data for a particular array of genes in a particular brain tissues
#This is a shellscript intended to streamline the process of checking an initial array of genes against
#1) a file storing relevant column numbers for a particular neural tissue,
#2) a file storing expression data regarding that tissue and those genes
#!/usr/bin/perl
use strict;
#use warnings;
require "getopts.pl";