Skip to content

Instantly share code, notes, and snippets.

@daler
Last active January 3, 2016 06:39
Show Gist options
  • Select an option

  • Save daler/8424229 to your computer and use it in GitHub Desktop.

Select an option

Save daler/8424229 to your computer and use it in GitHub Desktop.
"""
Illustrates how to use pybedtools to count motifs along sliding windows
of UTRs from a GFF file.
This uses example data shipped with pybedtools.
"""
import pybedtools
def renamer(f):
# BEDTools sees the featuretype of a GFF file as the "name" when using
# makewindows, so we need to set the featuretype to be the ID. Or whatever
# GFF attribute is of interest.
if f[2] == 'UTR':
f[2] = f['ID']
return f
# Prepare the UTRs file
utrs = pybedtools.example_bedtool('gdc.gff')\
.each(renamer)
# 20-bp windows, labeled by their name and an integer
windows = pybedtools.BedTool().window_maker(b=utrs, w=20, i='srcwinnum' )
motifs = pybedtools.example_bedtool('gdc.bed')
# Count up the number of features in each window.
# Last column of this file will have the number of motifs hitting that window
results = windows.intersect(motifs, c=True)
@daler

daler commented Jan 15, 2014

Copy link
Copy Markdown
Author

BEDTools makewindows doesn't keep strand information, so if you need strand then you need a different strategy:

def stranded_windows(f, n=5):
    name = f['ID']
    for i, start in enumerate(range(0, len(f) - n, n)):
        f.start = start
        f.stop = start + n
        f.attrs['ID'] = '%s_%s' % (name, i)
        yield f

windows = pybedtools.example_bedtool('gdc.gff')\
    .filter(lambda x: x[2] == 'UTR')\
    .split(stranded_windows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment