Last active
January 3, 2016 06:39
-
-
Save daler/8424229 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Illustrates how to use pybedtools to count motifs along sliding windows | |
of UTRs from a GFF file. | |
This uses example data shipped with pybedtools. | |
""" | |
import pybedtools | |
def renamer(f): | |
# BEDTools sees the featuretype of a GFF file as the "name" when using | |
# makewindows, so we need to set the featuretype to be the ID. Or whatever | |
# GFF attribute is of interest. | |
if f[2] == 'UTR': | |
f[2] = f['ID'] | |
return f | |
# Prepare the UTRs file | |
utrs = pybedtools.example_bedtool('gdc.gff')\ | |
.each(renamer) | |
# 20-bp windows, labeled by their name and an integer | |
windows = pybedtools.BedTool().window_maker(b=utrs, w=20, i='srcwinnum' ) | |
motifs = pybedtools.example_bedtool('gdc.bed') | |
# Count up the number of features in each window. | |
# Last column of this file will have the number of motifs hitting that window | |
results = windows.intersect(motifs, c=True) |
My pybedtools is:
pybedtools-0.6.2-py2.7-linux-x86_64.egg
(upgrading to 0.6.4 and BEDTools 2.18 solved this)
BEDTools makewindows
doesn't keep strand information, so if you need strand then you need a different strategy:
def stranded_windows(f, n=5):
name = f['ID']
for i, start in enumerate(range(0, len(f) - n, n)):
f.start = start
f.stop = start + n
f.attrs['ID'] = '%s_%s' % (name, i)
yield f
windows = pybedtools.example_bedtool('gdc.gff')\
.filter(lambda x: x[2] == 'UTR')\
.split(stranded_windows)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
If I try this I get: