@arq5x
Created December 4, 2013 15:52
Flattened CCDS
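# flatten_transcripts.py (the name the shell pipeline uses for this script)
# Reads BED lines from stdin, already grouped by the name in column 4, and
# merges the overlapping intervals within each group.
import sys

import pybedtools as pbt


def merge_gene(lines):
    # nms=True is passed through as `bedtools merge -nms`, which (in the
    # bedtools releases that support it) joins merged feature names with ";".
    tmp = pbt.BedTool(lines, from_string=True).merge(nms=True)
    print(tmp)


gene_lines = ''
prev_gene = None
for line in sys.stdin:
    fields = line.strip().split()
    curr_gene = fields[3]
    if curr_gene != prev_gene and prev_gene is not None:
        # The name changed: flush the finished group and start a new one.
        merge_gene(gene_lines)
        gene_lines = line
    else:
        gene_lines += line
    prev_gene = curr_gene

# Flush the final group (skipped when stdin was empty).
if gene_lines:
    merge_gene(gene_lines)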
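For readers without bedtools installed, here is a minimal pure-Python sketch of what the merge step does to one group of intervals. It is not the pybedtools implementation: `merge_intervals` is a hypothetical helper, and it assumes that overlapping or book-ended intervals are merged and that every contributing name is kept, joined with ";".

```python
def merge_intervals(records):
    """Merge overlapping or book-ended (chrom, start, end, name) records.

    Hypothetical helper for illustration only; the gist itself delegates
    this work to bedtools via pybedtools.
    """
    merged = []
    for chrom, start, end, name in sorted(records):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current interval and remember the extra name.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + [name])
        else:
            merged.append((chrom, start, end, [name]))
    # Join names with ";" so the output resembles the script's merged lines.
    return [(c, s, e, ";".join(names)) for c, s, e, names in merged]


if __name__ == "__main__":
    exons = [
        ("chr1", 100, 200, "A_1"),
        ("chr1", 150, 300, "A_2"),
        ("chr1", 400, 500, "A_1"),
    ]
    for row in merge_intervals(exons):
        print("\t".join(str(x) for x in row))
```

The first two example intervals overlap and collapse into one, while the third stays separate. The shell pipeline later splits the joined names on ";" and keeps only the first four columns.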
awk '{OFS="\t"; print $1,$2,$3,$5"_"$6,$7,$4}' hgncgenes.exons.bed | awk 'NR>1' > temp
cat temp | python flatten_transcripts.py | awk 'length($0) > 0' | tr ";" "\t" | cut -f 1-4 > flattened_genes.bed
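The first awk command can be hard to read at a glance, so here is a rough Python equivalent of that step alone. It is a sketch under assumptions: `reorder_columns` is a hypothetical name, every data line is assumed to have at least seven whitespace-separated fields, and the first line is assumed to be a header (which is what `awk 'NR>1'` discards).

```python
def reorder_columns(lines):
    """Mimic: awk '{OFS="\t"; print $1,$2,$3,$5"_"$6,$7,$4}' | awk 'NR>1'.

    Hypothetical helper; assumes at least seven fields per data line.
    """
    out = []
    for i, line in enumerate(lines):
        if i == 0:
            continue  # drop the first line, as awk 'NR>1' does
        f = line.split()
        # Columns 1-3 unchanged, columns 5 and 6 joined with "_" as the
        # new column 4, then column 7, then the original column 4.
        out.append("\t".join([f[0], f[1], f[2], f[4] + "_" + f[5], f[6], f[3]]))
    return out


if __name__ == "__main__":
    sample = [
        "c1 c2 c3 c4 c5 c6 c7",
        "chr1 100 200 d e f g",
    ]
    for row in reorder_columns(sample):
        print(row)
```

The joined value in the new column 4 is the key that the Python script groups on, so all lines sharing it need to be adjacent in the input.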