Ming Tang crazyhottommy

crazyhottommy / reverse_complement.py
Last active February 15, 2016 10:37
reverse_complement
# get the reverse-complement DNA sequence
def ReverseComplement1(seq):
    seq_dict = {'A':'T','T':'A','G':'C','C':'G'}
    return "".join([seq_dict[base] for base in reversed(seq)])
# make it more robust, lower case DNA
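The preview stops at the comment about lower-case DNA, so the author's own version is not shown here. A minimal sketch of one way to handle mixed-case input follows; the function name, the use of `str.maketrans` (Python 3), and the acceptance of `N` are assumptions, not the gist's code.

```python
# Hypothetical helper: not taken from the gist, shown only as an illustration.
# Translation table covering upper- and lower-case bases plus N (an assumption).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_any_case(seq):
    """Return the reverse complement of seq, preserving the case of each base."""
    unknown = set(seq) - set("ACGTNacgtn")
    if unknown:
        # Fail loudly instead of silently passing unexpected characters through.
        raise ValueError("unexpected characters: " + "".join(sorted(unknown)))
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]
```

Unlike a dictionary lookup keyed only on upper-case bases, this does not raise `KeyError` on soft-masked (lower-case) sequence.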
crazyhottommy / sam_to_bedgraph.py
Created October 31, 2013 18:46
sam_to_bedgraph_HTSeq
import HTSeq
alignment_file = HTSeq.SAM_Reader("SRR817000.sam")
# HTSeq also has a BAM_Reader function to handle the bam file
# initialize a Genomic Array (a class defined in the HTSeq package to deal with NGS data,
# it allows transparent access of the data through the GenomicInterval object)
# more reading http://www-huber.embl.de/users/anders/HTSeq/doc/genomic.html#genomic
coverage = HTSeq.GenomicArray("auto", stranded = True, typecode = 'i')
crazyhottommy / TSS_profile.py
Created October 20, 2013 13:22
ChIP-seq TSS
#! /usr/bin/env python
# group the genes according to expression level
# analyze RNA-seq data by counting tags for each gene using HTSeq.scripts.count or bedtools multicov
# it generates a file (K562_htseq_count.out.clean) with two columns: column 1 is the gene name, column 2 is
# the number of counts that mapped to all the exons of the same gene.
# compare the counts from different methods and visualize them in the IGV browser.
# the top 30%, medium 30% and low 30% gene names were obtained with the Linux command line
# sort -k2 -nrs K562_htseq_count.out.clean | wc -l
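The grouping step described above can also be sketched in Python. This is an illustration under stated assumptions, not the author's pipeline: the function name is hypothetical, the input is assumed to be the two-column gene/count file, and "medium" is assumed to mean the block of genes centred in the ranked list.

```python
# Hypothetical helper: name, arguments and the definition of "medium" are assumptions.
def split_by_expression(lines, percent=30):
    """Return (top, medium, low) gene-name lists ranked by count, descending."""
    counts = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        counts.append((fields[0], int(fields[1])))
    # Stable sort from highest to lowest count.
    counts.sort(key=lambda item: item[1], reverse=True)
    names = [name for name, _ in counts]
    n = len(names) * percent // 100        # genes per group
    mid_start = (len(names) - n) // 2      # centre the medium group
    top = names[:n]
    medium = names[mid_start:mid_start + n]
    low = names[len(names) - n:]
    return top, medium, low
```

With an open file handle, `split_by_expression(open("K562_htseq_count.out.clean"))` would return the three name lists.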
crazyhottommy / groupby.py
Created October 19, 2013 15:49
crazyhottommy's Gist
# this script reformats the tab delimited file like:
#FBgn00001 GO:0016301 [Name:****(annotation)]
#FBgn00002 GO:0016301 [Name:****(annotation)]
#FBgn00003 GO:0016301 [Name:****(annotation)]
#FBgn00004 GO:0003700 [Name:****(annotation)]
#FBgn00004 GO:0009651 [Name:****(annotation)]
#FBgn00004 GO:0006355 [Name:****(annotation)]
#FBgn00005 GO:0009556 [Name:****(annotation)]
#FBgn00005 GO:0005515 [Name:****(annotation)]
#FBgn00005 GO:0080019 [Name:****(annotation)]
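The preview ends before the target format is shown, so the sketch below only illustrates one plausible reformatting: collecting the GO terms of each gene onto a single record with `itertools.groupby`. The function name and the output shape are assumptions, and `groupby` only merges adjacent rows, so the input is assumed to be sorted by gene ID as in the sample above.

```python
from itertools import groupby

# Hypothetical helper: the output shape is an assumption, not the gist's format.
def group_go_terms(lines):
    """Yield (gene_id, [go_terms]) from tab-delimited lines sorted by gene ID."""
    # Split each non-empty line into its tab-separated fields.
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # Adjacent rows sharing the same first column are merged into one group.
    for gene_id, group in groupby(rows, key=lambda fields: fields[0]):
        yield gene_id, [fields[1] for fields in group]
```

Each result can then be written out as one line per gene, for example with `"\t".join([gene_id] + terms)`.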