Ming Tang crazyhottommy

Pandoc is a very useful tool for converting between common document formats.

First, install pandoc on macOS with Homebrew:

brew install pandoc

Pandoc requires pdflatex to convert documents to PDF.

Install MacTeX, which provides pdflatex:
download the installer package and double-click it to run the installation.
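
Once both tools are installed, a conversion is a single pandoc call. The sketch below wraps that call in Python. The helper names `build_pandoc_command` and `convert` and the file names `notes.md` and `notes.pdf` are hypothetical, and the sketch assumes a pandoc version that accepts `--pdf-engine` (pandoc 2.0 or later).

```python
import shutil
import subprocess


def build_pandoc_command(source, target, pdf_engine="pdflatex"):
    """Return the argument list for a pandoc conversion (hypothetical helper)."""
    cmd = ["pandoc", source, "-o", target]
    # Only PDF output needs a LaTeX engine; other formats are converted directly.
    if target.lower().endswith(".pdf"):
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def convert(source, target):
    """Run pandoc, failing early if a required executable is not on PATH."""
    cmd = build_pandoc_command(source, target)
    required = ["pandoc"]
    if target.lower().endswith(".pdf"):
        required.append("pdflatex")
    missing = [tool for tool in required if shutil.which(tool) is None]
    if missing:
        raise RuntimeError("not found on PATH: " + ", ".join(missing))
    subprocess.run(cmd, check=True)


# Show the command that would be run for a Markdown-to-PDF conversion.
print(build_pandoc_command("notes.md", "notes.pdf"))
```

Calling `convert("notes.md", "notes.pdf")` would then run the conversion, provided both executables are on the PATH.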

	human	mouse
	A1BG	A1bg
	A1CF	A1cf
	A2LD1	A2ld1
	A2M	A2m
	A4GALT	A4galt
	A4GNT	A4gnt
	AAAS	Aaas
	AACS	Aacs
	AADAC	Aadac

cat ref.gene.txt
chr1 1736 4272 DDX11L1 +
chr1 4224 19233 WASH7P -
chr1 4224 7502 LOC100288778 -
chr1 7231 7299 MIR6859-1 -
chr1 7231 7299 MIR6859-2 -
chr1 7231 7299 MIR6859-3 -
chr1 7231 7299 MIR6859-4 -

Remove the version suffix (the digits after the dot at the end of each gene ID) from the gene names (GENCODE v19):

sed -E 's/\.[0-9]+//' STAR_WT-30393468_htseq.cnt > WT_htseq.cnt
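
To make the effect of the substitution concrete, here is a rough Python equivalent applied to a few htseq-count style lines. The sample lines and counts are illustrative rather than taken from the real file, and `strip_version` is a hypothetical helper name. Like the `sed` expression without the `g` flag, it removes only the first `.digits` match on each line.

```python
import re

# Same pattern as the sed expression: a literal dot followed by digits.
VERSION = re.compile(r"\.[0-9]+")


def strip_version(line):
    """Remove the first '.<digits>' match on the line, as the sed command does."""
    return VERSION.sub("", line, count=1)


# Illustrative input lines (gene ID, tab, read count); not real data.
sample = [
    "ENSG00000223972.4\t0",
    "ENSG00000227232.4\t12",
    "__no_feature\t345",
]

for line in sample:
    print(strip_version(line))
```

Lines without a dot, such as the `__no_feature` summary row, pass through unchanged.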

Transcript-to-gene mapping file:

library(EnsDb.Hsapiens.v75)

	#!/bin/bash
	# function Extract for common file formats

	function extract {
	    if [ -z "$1" ]; then
	        # display usage if no parameters given
	        echo "Usage: extract <path/file_name>.<zip\|rar\|bz2\|gz\|tar\|tbz2\|tgz\|Z\|7z\|xz\|ex\|tar.bz2\|tar.gz\|tar.xz>"
	    else
	        if [ -f "$1" ] ; then
	            NAME=${1%.*}

	## http://stackoverflow.com/questions/19876505/boxplot-show-the-value-of-mean
	## plot adding mean value
	ggplot(NLR.tidy, aes(x = NLR, y = ratio_value, color = NLR, fill = NLR)) +
	    geom_point(position = position_jitterdodge(dodge.width = 0.9)) +
	    geom_boxplot(fill = "white", alpha = 0.1, outlier.colour = NA,
	                 position = position_dodge(width = 0.9)) +
	    coord_cartesian(ylim = c(-0.5, 15)) +
	    ## ggplot2 >= 3.3.0: use `fun` instead of `fun.y`, and after_stat(y) instead of ..y..
	    stat_summary(fun = mean, geom = "point", colour = "black", size = 3, show.legend = FALSE) +
	    stat_summary(fun = mean, colour = "red", geom = "text", show.legend = FALSE,
	                 vjust = -0.7, aes(label = round(after_stat(y), digits = 1)))

	aDict = {"B":"inputG1", "A":"inputG1", "C":"inputG2"}

	rule all:
	    input: ["C.bed", "A.bed", "B.bed"]

	def get_files(wildcards):
	    case = wildcards.case
	    control = aDict[case]
	    return [case + ".sorted.bam", control + ".sorted.bam"]
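
The input function above is plain Python, so the case-to-control lookup can be checked outside Snakemake. In this sketch, `SimpleNamespace` is only a stand-in for Snakemake's wildcards object (an assumption made for illustration); inside a workflow, Snakemake passes the real wildcards to the function.

```python
from types import SimpleNamespace

# Each case sample maps to the control (input) sample it is paired with.
aDict = {"B": "inputG1", "A": "inputG1", "C": "inputG2"}


def get_files(wildcards):
    """Return the case BAM and its matched control BAM."""
    case = wildcards.case
    control = aDict[case]
    return [case + ".sorted.bam", control + ".sorted.bam"]


# Stand-in for the wildcards Snakemake would derive from a target such as "A.bed".
for case in ["A", "B", "C"]:
    print(case, get_files(SimpleNamespace(case=case)))
```

A rule would typically reference the function as `input: get_files` with an output pattern such as `"{case}.bed"`. That rule is not shown in the snippet, so treat it as an assumption.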

	---
	title: "lncRNA_heatmap"
	author: "Ming Tang"
	date: "July 28, 2016"
	output: html_document
	---

	Read in the bigWig files for each mark. The bigWig files were generated from BAM files with deepTools.
	```{r}
	library(EnrichedHeatmap)

	## devtools::install_github("stephenturner/msigdf")
	library(msigdf)
	library(dplyr)
	library(clusterProfiler)

	c2 <- msigdf.human %>%
	filter(collection == "c2") %>% select(geneset, entrez) %>% as.data.frame

	data(geneList)
	de <- names(geneList)[1:100]


	Make a heatmap with a colored dendrogram using `ComplexHeatmap` and `dendsort`.
	See help [here](https://bioconductor.org/packages/release/bioc/vignettes/ComplexHeatmap/inst/doc/s2.single_heatmap.html)
	```r
	##### a make_hc function that accepts different distance measures and linkage methods
	make_hc <- function(x, distance_measure, linkage_method){
	    if (distance_measure == "pearson"){
	        ## cor() computes correlations between columns, so transpose x first
	        distance <- as.dist(1 - cor(t(x), method = "pearson"))
	        hc <- hclust(distance, method = linkage_method)