Ming Tang crazyhottommy

Director of Bioinformatics. Care about reproducible research and open science

crazyhottommy / gene_sets_hypergeometric_test.py

Last active March 8, 2019 07:12

	#! /usr/bin/env python

	import sys
	import scipy.stats as stats

	# The result is two p-values:
	# 1. the probability that, by random chance, the number of genes with both condition A and B is <= your observed number
	# 2. the probability that, by random chance, the number of genes with both condition A and B is >= your observed number
	# The second p-value is probably what you want.
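
The two tails described in the comments above can be sketched without scipy. This is only an illustration using the standard library: the function name `hypergeom_tails` and its argument names are made up here and do not come from the gist. With scipy, the same numbers should come from `stats.hypergeom.cdf(overlap, total, n_a, n_b)` and `stats.hypergeom.sf(overlap - 1, total, n_a, n_b)`.

```python
from math import comb


def hypergeom_tails(overlap, total, n_a, n_b):
    """Return (P(X <= overlap), P(X >= overlap)) for a hypergeometric X.

    total   : number of genes in the universe
    n_a     : genes with condition A
    n_b     : genes with condition B
    overlap : genes with both conditions (the observed value)
    """
    denom = comb(total, n_b)
    lowest = max(0, n_a + n_b - total)   # smallest possible overlap
    highest = min(n_a, n_b)              # largest possible overlap
    pmf = {
        k: comb(n_a, k) * comb(total - n_a, n_b - k) / denom
        for k in range(lowest, highest + 1)
    }
    p_le = sum(p for k, p in pmf.items() if k <= overlap)
    p_ge = sum(p for k, p in pmf.items() if k >= overlap)
    return p_le, p_ge


# Toy numbers, chosen only for illustration.
p_le, p_ge = hypergeom_tails(overlap=3, total=10, n_a=5, n_b=5)
print(p_le, p_ge)
```

Both tails include the observed value itself, so they add up to more than 1.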

crazyhottommy / Entrez_Direct.sh

Last active August 29, 2015 14:14

	# search PubMed for records matching "glioblastoma enhancer"
	$ esearch -db pubmed -query "glioblastoma enhancer"
	<ENTREZ_DIRECT>
	<Db>pubmed</Db>
	<WebEnv>NCID_1_539964707_130.14.18.34_9001_1422280320_2091337226_0MetA0_S_MegaStore_F_1</WebEnv>
	<QueryKey>1</QueryKey>
	<Count>97</Count>
	<Step>1</Step>
	</ENTREZ_DIRECT>
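
The XML printed by `esearch` above is small enough to read with the Python standard library. A minimal sketch, assuming the output has been captured as a string; the helper name `summarize_esearch` is hypothetical and not part of Entrez Direct.

```python
import xml.etree.ElementTree as ET

# The esearch output shown above, pasted in as a string.
esearch_xml = """<ENTREZ_DIRECT>
<Db>pubmed</Db>
<WebEnv>NCID_1_539964707_130.14.18.34_9001_1422280320_2091337226_0MetA0_S_MegaStore_F_1</WebEnv>
<QueryKey>1</QueryKey>
<Count>97</Count>
<Step>1</Step>
</ENTREZ_DIRECT>"""


def summarize_esearch(xml_text):
    """Pull the database name, hit count and query key out of esearch XML."""
    root = ET.fromstring(xml_text)
    return {
        "db": root.findtext("Db"),
        "count": int(root.findtext("Count")),
        "query_key": int(root.findtext("QueryKey")),
    }


summary = summarize_esearch(esearch_xml)
print(summary)
```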

crazyhottommy / batch_convert_faidx.sh

Last active August 29, 2015 14:16

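	#! /usr/bin/env bash
	# put the coordinates in a bed file (chr, start, end) and pass it as the first argument
	# note: BED starts are 0-based while samtools regions are 1-based;
	# use $((start + 1)) below if the input is strict BED

	infile=$1
	# the trailing _ swallows any extra BED columns
	while read -r chr start end _
	do
		samtools faidx ref.fasta "$chr:$start-$end" >> test.fa
	done < "$infile"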
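
The loop above turns each BED row into a `chr:start-end` region string. Here is a small sketch of that conversion on its own, assuming the input is strict BED (0-based start, end not included), which means the start is shifted by one for samtools. The function name `bed_to_regions` is made up for this example.

```python
def bed_to_regions(lines):
    """Convert BED rows (0-based, half-open) to 1-based 'chr:start-end' strings."""
    regions = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC track/browser headers.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append(f"{chrom}:{start + 1}-{end}")
    return regions


# Two toy rows; extra columns after the third are ignored.
regions = bed_to_regions(["chr1\t100\t200\tpeak1", "chr2\t0\t50"])
print(regions)
```

As far as I know, `samtools faidx` accepts several regions in one call, so the resulting list could be passed as extra arguments instead of starting samtools once per row.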

crazyhottommy / Pvalue_FDR_multiple_test.r

Last active August 29, 2015 14:17

	### This part is from the edX online Harvard course
	## HarvardX: PH525.3x Advanced Statistics for the Life Sciences, week1

	library(devtools)
	install_github("genomicsclass/GSE5859Subset")

	library(GSE5859Subset)
	data(GSE5859Subset)
	dim(geneExpression)

crazyhottommy / benchmark_shuf.sh

Last active August 29, 2015 14:17

	# create a test file
	$ time seq 1 10000000 > ten_million.txt
	seq 1 10000000 > ten_million.txt 3.51s user 0.13s system 99% cpu 3.663 total

	# it is a "big" file with size of 109M
	$ ls -lh ten_million.txt
	-rw-r--r-- 1 Tammy staff 109M Mar 22 20:49 ten_million.txt

	$ man gshuf
	# randomly select 1000 lines from it
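
For comparison, picking 1000 random lines can also be done in one pass with reservoir sampling. This is a generic sketch, not a description of how `gshuf` works internally. The function name `reservoir_sample` is hypothetical, and a smaller stream of 100,000 numbers stands in for the ten-million-line file.

```python
import random


def reservoir_sample(lines, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


# A generator plays the role of the file; a fixed seed keeps the run repeatable.
stream = (str(n) for n in range(1, 100001))
picked = reservoir_sample(stream, 1000, random.Random(0))
print(len(picked))
```

Only the 1000 kept lines are held in memory, however long the input is.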

crazyhottommy / simulation.r

Last active August 29, 2015 14:19

	## Overview
	# The central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and variance, is approximately normally distributed, regardless of the underlying distribution.

	# I am going to draw 40 numbers from an exponential distribution 1000 times, calculate the mean
	# of each draw (giving 1000 means), and use this simulation to check whether the
	# distribution of the means is approximately normal.

	## start simulation
	# number of simulations, sample size and lambda
	nosim <- 1000
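
The same experiment can be sketched in Python with the standard library. The 1000 simulations and the sample size of 40 come from the comments above. The rate `lam = 0.2` is an assumption, because the preview stops before lambda is set. The function name `simulate_means` and the seed are also made up for this example.

```python
import random
import statistics


def simulate_means(nosim=1000, n=40, lam=0.2, seed=1):
    """Draw n exponential values nosim times and return the mean of each draw."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(lam) for _ in range(n)])
        for _ in range(nosim)
    ]


means = simulate_means()
center = statistics.fmean(means)   # theory: 1 / lam
spread = statistics.stdev(means)   # theory: (1 / lam) / sqrt(n)
print(center, spread)
```

If the CLT holds here, `center` should be close to 1/lam = 5 and `spread` close to 5/sqrt(40), about 0.79. A histogram of `means` should look roughly bell-shaped.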

crazyhottommy / breakdancer_filter.r

Created April 24, 2015 22:28

	options(stringsAsFactors = FALSE)
	library(gdata)
	library(parallel)
	# pattern is a regular expression, not a shell glob
	files = list.files(path = 'ctx/', pattern = '\\.bd$')
	meta = read.csv("WGS.coverage.csv")

	mclapply(files, function(f) {
		dat = read.delim(sprintf('ctx/%s', f), comment.char = '#', header = FALSE, as.is = TRUE)[, -(12:14)]
		message(sprintf("File: %s, Dim: (%s)", f, paste(dim(dat), collapse = ",")))

crazyhottommy / get_promoter_seq.r

Last active September 16, 2023 10:42

	## get all the promoter sequences for human hg19 genome
	## Author: Ming Tang (Tommy)
	## Date: 04/30/2015

	## load the libraries
	library(GenomicRanges)
	library(BSgenome.Hsapiens.UCSC.hg19)
	BSgenome.Hsapiens.UCSC.hg19
	# or
	Hsapiens

crazyhottommy / IBash_Notebook_reproducible_research.json

Last active August 29, 2015 14:20

	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### I am going to demonstrate how to use the IPython notebook bash_kernel to do reproducible research.\n",
	"I can do command line in the notebook and take notes along the way.\n",
	"Let's go to the directory first."
	]

crazyhottommy / heatmap_ChIP-seq.r

Created May 5, 2015 14:19

	# This R script generates a TF or histone modification heatmap
	# at certain genomic features (TSS, enhancers) from ChIP-seq data.
	# The input matrix comes from the Homer software. As an alternative to R, use cluster3 to cluster and visualize with Java TreeView.
	# Generate the matrix with Homer: annotatePeaks.pl peak_file.txt hg19 -size 6000 -hist 10 -ghist -d TF1/ > outputfile_matrix.txt
	# see several posts for heatmap:
	# http://davetang.org/muse/2010/12/06/making-a-heatmap-with-r/
	# http://www.r-bloggers.com/r-using-rcolorbrewer-to-colour-your-figures-in-r/
	# 08/20/13 by Tommy Tang

	# it is such a simple script but took me several days to get it to work...I mean the desired