Skip to content

Instantly share code, notes, and snippets.

View crazyhottommy's full-sized avatar
🎯
Focusing

Ming Tang crazyhottommy

🎯
Focusing
View GitHub Profile
#2014 MSU NGS R basics tutorial
#http://angus.readthedocs.org/en/2014/R_Introductory_tutorial_2014.html
#https://github.com/jrherr/quick_basic_R_tutorial/blob/master/R_tutorial.md
#pick one language, and learn it well!
#pick up a dataset, play with it!
#object-oriented programming
#functional programming
#linux commands basics
#http://software-carpentry.org/v5/novice/shell/index.html
# practise, practise, practise, google, google, google and you will get it :)
pwd # print working directory
cd # change directory
sudo # super user privilege
chmod 775 # change the privileges http://en.wikipedia.org/wiki/Chmod
git clone # version control! get to know git and github! http://git-scm.com/
sudo bash # bad habit
with open ("C:/Users/Tang Ming/Desktop/anotation.txt", "r") as annotation:
anotation_dict = {}
for line in annotation:
line = line.split()
if line: #test whether it is an empty line
anotation_dict[line[0]]=line[1:]
else:
continue
# really should not parse the fasta file by myself. there are
library(Biobase)
library(GEOquery)
# load series and platform data from GEO
gset <- getGEO("GSE34412", GSEMatrix =TRUE)
gset<- gset[[1]]
# make proper column names to match toptable
# This code was modified from the tss plot code. It can plot any other ChIP-seq signal
# at other genomic positions in addtion to tss. In this case, it is the HRE. HIF1a ChIP-seq data
# is available, peaks were called by MACS, generated a bed file. the middle point
# of each peak is used as the center of the plot (you can also use summit of the peak from the exel file
# generated from MACS. HREs at promoters are not included
# 04/10/13
def TSS_Profile(ifile1,ifile2):
'''read in three files, ifile1 is the sortedbamfile prepared by samtool
ifile2 is the promoters (upstream 5kb of TSS) bed file with five columns: chr, start ,end, name and strand'''
library(gplots)
getwd()
setwd("/home/tommy/")
d<- read.table("co_up_or_down_uniq.txt", header=T)
# heatmap.2 works only with matrix, convert the dataframe to matrix
m<-as.matrix(d[,2:3])
rownames(m)<- d$genes # add the gene names as the row lable
png(filename = "co_regulated.png", width=400, height = 800) #save the heatmap to a png or a pdf by pdf(filename=...)
def TSS_Profile(ifile1,ifile2):
'''read in three files, ifile1 is the sortedbamfile prepared by samtool
ifile2 is the promoters (upstream 5kb of TSS) bed file with five columns: chr, start ,end, name and strand'''
import HTSeq
import numpy
import itertools
sortedbamfile=HTSeq.BAM_Reader(ifile1)
promoters = open(ifile2)
@crazyhottommy
crazyhottommy / shTet1.r
Last active December 30, 2015 16:19
GEOquery
# read GEO data sets from NCBI by GEOquery
setwd("/home/tommy/Tet1")# set the working directory
library(Biobase)
library(GEOquery)
# only set the GSEMatrix to FALSE can it be parsed for later use of function like
# Meta(gse)
gse<- getGEO('GSE26830', GSEMatrix=FALSE, destdir=".")
@crazyhottommy
crazyhottommy / DESeq.r
Created November 1, 2013 21:51
basic work flow of DESeq
setwd("/home/tommy/scripts")
library("DESeq")
countsTable<- read.delim("All_counts_nozero_1pseudocount_with_header.txt", header=TRUE)
rownames(countsTable)<- countsTable$Gene
countsTable<- countsTable[,-1]
head(countsTable)
conds<- factor(c("alpha","beta","alpha","beta","alpha","beta"))
cds<- newCountDataSet(countsTable, conds)
@crazyhottommy
crazyhottommy / alpha_beta_DEG.r
Last active December 27, 2015 04:59
Differential gene expression analysis for RNA-seq after HTSeq-count from the paper "Epigenomic plasticity enables human pancreatic α to β cell reprogramming"
library(limma)
library(edgeR)
x<-read.delim('counts.csv',skip=0, sep="\t", check.names=FALSE)
counts <- x[,c('a1','a2','a3','b1','b2','b3')]
keep <- apply(counts, 1, max) >= 0
x <- x[keep,]
counts <- counts[keep,]
design <- matrix(data=c(1,1,1,0,0,0,0,0,0,1,1,1), nrow=6, ncol=2, dimnames = list(c(), c('alpha','beta')))