Ming Tang crazyhottommy

🎯

Focusing

Director of Bioinformatics. Care about reproducible research and open science

3.3k followers · 44 following

AstraZeneca
Waltham, MA
http://divingintogeneticsandgenomics.com
@tangming2005
https://tommytang.bio.link
@chatomics

View GitHub Profile

Recently created

Least recently created

Recently updated

Least recently updated

crazyhottommy / ngs2014_R_basics.r

Last active August 29, 2015 14:05

	#2014 MSU NGS R basics tutorial
	#http://angus.readthedocs.org/en/2014/R_Introductory_tutorial_2014.html
	#https://github.com/jrherr/quick_basic_R_tutorial/blob/master/R_tutorial.md

	#pick one language, and learn it well!

	#pick up a dataset, play with it!

	#object-oriented programming
	#functional programming

crazyhottommy / ngs2014_linux_basics.sh

Last active October 13, 2016 18:01

linux basics

	#linux commands basics
	#http://software-carpentry.org/v5/novice/shell/index.html
	# practise, practise, practise, google, google, google and you will get it :)

	pwd # print working directory
	cd # change directory
	sudo # super user privilege
	chmod 775 # change the privileges http://en.wikipedia.org/wiki/Chmod
	git clone # version control! get to know git and github! http://git-scm.com/
	sudo bash # bad habit

crazyhottommy / change_fasta_header.py

Last active August 29, 2015 14:04

	with open ("C:/Users/Tang Ming/Desktop/anotation.txt", "r") as annotation:
	anotation_dict = {}
	for line in annotation:
	line = line.split()
	if line: #test whether it is an empty line
	anotation_dict[line[0]]=line[1:]
	else:
	continue

	# really should not parse the fasta file by myself. there are

crazyhottommy / GEOquery_gene_expression.r

Created July 4, 2014 20:03

	library(Biobase)
	library(GEOquery)


	# load series and platform data from GEO

	gset <- getGEO("GSE34412", GSEMatrix =TRUE)
	gset<- gset[[1]]

	# make proper column names to match toptable

crazyhottommy / Tet1_CpG.py

Last active January 2, 2016 20:29

	# This code was modified from the tss plot code. It can plot any other ChIP-seq signal
	# at other genomic positions in addtion to tss. In this case, it is the HRE. HIF1a ChIP-seq data
	# is available, peaks were called by MACS, generated a bed file. the middle point
	# of each peak is used as the center of the plot (you can also use summit of the peak from the exel file
	# generated from MACS. HREs at promoters are not included
	# 04/10/13

	def TSS_Profile(ifile1,ifile2):
	'''read in three files, ifile1 is the sortedbamfile prepared by samtool
	ifile2 is the promoters (upstream 5kb of TSS) bed file with five columns: chr, start ,end, name and strand'''

crazyhottommy / co_up_down_genes_heatmap.r

Last active March 11, 2021 10:24

	library(gplots)
	getwd()
	setwd("/home/tommy/")
	d<- read.table("co_up_or_down_uniq.txt", header=T)

	# heatmap.2 works only with matrix, convert the dataframe to matrix
	m<-as.matrix(d[,2:3])
	rownames(m)<- d$genes # add the gene names as the row lable

	png(filename = "co_regulated.png", width=400, height = 800) #save the heatmap to a png or a pdf by pdf(filename=...)

crazyhottommy / Tet1_CpG_promoters.py

Last active December 31, 2015 02:29

	def TSS_Profile(ifile1,ifile2):
	'''read in three files, ifile1 is the sortedbamfile prepared by samtool
	ifile2 is the promoters (upstream 5kb of TSS) bed file with five columns: chr, start ,end, name and strand'''

	import HTSeq
	import numpy
	import itertools

	sortedbamfile=HTSeq.BAM_Reader(ifile1)
	promoters = open(ifile2)

crazyhottommy / shTet1.r

Last active December 30, 2015 16:19

GEOquery

	# read GEO data sets from NCBI by GEOquery

	setwd("/home/tommy/Tet1")# set the working directory
	library(Biobase)
	library(GEOquery)

	# only set the GSEMatrix to FALSE can it be parsed for later use of function like
	# Meta(gse)
	gse<- getGEO('GSE26830', GSEMatrix=FALSE, destdir=".")

crazyhottommy / DESeq.r

Created November 1, 2013 21:51

basic work flow of DESeq

	setwd("/home/tommy/scripts")
	library("DESeq")
	countsTable<- read.delim("All_counts_nozero_1pseudocount_with_header.txt", header=TRUE)
	rownames(countsTable)<- countsTable$Gene
	countsTable<- countsTable[,-1]

	head(countsTable)
	conds<- factor(c("alpha","beta","alpha","beta","alpha","beta"))

	cds<- newCountDataSet(countsTable, conds)

crazyhottommy / alpha_beta_DEG.r

Last active December 27, 2015 04:59

Differential gene expression analysis for RNA-seq after HTSeq-count from the paper "Epigenomic plasticity enables human pancreatic α to β cell reprogramming"

	library(limma)
	library(edgeR)

	x<-read.delim('counts.csv',skip=0, sep="\t", check.names=FALSE)
	counts <- x[,c('a1','a2','a3','b1','b2','b3')]
	keep <- apply(counts, 1, max) >= 0
	x <- x[keep,]
	counts <- counts[keep,]
	design <- matrix(data=c(1,1,1,0,0,0,0,0,0,1,1,1), nrow=6, ncol=2, dimnames = list(c(), c('alpha','beta')))

Newer Older