Last active
December 20, 2015 18:59
Genomic intervals: 0-based, 1-based, overlaps, and distance
"""
genomic_intervals.py

Kamil Slowikowski
February 27, 2014

Genomic intervals: 0-based, 1-based, overlaps, and distance
===========================================================

This document describes genomic intervals and includes source code for testing
overlap and calculating distance between intervals.

You will find files specifying genomic coordinates in two formats:

    0-based : 0 1 2 3 4   (UCSC, BED, bedGraph, narrowPeak, BAM)
    1-based :  1 2 3 4    (NCBI, Ensembl, GFF, GTF, VCF, SAM, wiggle)
    sequence:  A T G C

0-based starts with 0 and numbers the *spaces* in between nucleotides.
1-based starts with 1 and numbers the *nucleotides*.

The subsequence "TG" of the full string "ATGC" is:

    0-based : [1, 3)
    1-based : [2, 3]

The 0-based style does not include the last position: ")"
The 1-based style includes the last position: "]"

This results in different length calculations for subsequence "TG":

    0-based : 3 - 1 = 2
    1-based : 3 - 2 + 1 = 2

Reference
---------

https://genome.ucsc.edu/FAQ/FAQformat.html

Example
-------

>>> a, b = (1, 3), (3, 7)

>>> print_intervals0(a, b)
01234567890
 ==
   ====

>>> print_intervals1(a, b)
1234567890
===
  =====

>>> overlap0(a, b)
False

>>> overlap1(a, b)
True

>>> distance0(a, b)
0

>>> distance1(a, b)
-1
"""


# 0-based intervals

def overlap0(a, b):
    """Check if two 0-based intervals overlap."""
    # a.start < b.end and a.end > b.start
    return a[0] < b[1] and a[1] > b[0]


def distance0(a, b):
    """Get the number of bases between two 0-based intervals, 0 if the
    intervals are book-ended against each other, or a negative number if they
    overlap. For partially overlapping intervals the magnitude is the number
    of bases in the overlap; when one interval contains the other, the
    magnitude can exceed the overlap.
    """
    return max(a[0] - b[1], b[0] - a[1])


def print_intervals0(*intervals):
    """Print a ruler, then one row of '=' marks for each 0-based interval."""
    start = min(i[0] for i in intervals)
    stop = max(i[1] for i in intervals)
    length = stop - start
    # Ruler: a leading '0', then one '1234567890' block per started ten bases.
    print('0' + '1234567890' * ((length + 10) // 10))
    for i in intervals:
        # The end position is excluded, so the row has end - start marks.
        spaces = ' ' * i[0]
        marks = '=' * (i[1] - i[0])
        print(spaces + marks)
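

# A minimal sketch of converting between the two conventions described in the
# module docstring. The helper names (to_1based, to_0based, length0, length1)
# are hypothetical and not part of the original module; intervals are assumed
# to be (start, end) tuples, as in the example.

```python
def to_1based(interval):
    """Convert a 0-based half-open interval [start, end) to 1-based closed."""
    start, end = interval
    # Only the start shifts: the excluded 0-based end is the included 1-based end.
    return (start + 1, end)


def to_0based(interval):
    """Convert a 1-based closed interval [start, end] to 0-based half-open."""
    start, end = interval
    return (start - 1, end)


def length0(interval):
    """Length of a 0-based half-open interval."""
    return interval[1] - interval[0]


def length1(interval):
    """Length of a 1-based closed interval."""
    return interval[1] - interval[0] + 1


sequence = "ATGC"
tg0 = (1, 3)            # "TG" as a 0-based interval
tg1 = to_1based(tg0)    # the same bases as a 1-based interval

# Python slices are themselves 0-based and half-open, so tg0 can be used directly.
subsequence = sequence[tg0[0]:tg0[1]]
```

# Both conventions describe the same two bases, so length0(tg0) and
# length1(tg1) agree even though the formulas differ by one.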


# 1-based intervals

def overlap1(a, b):
    """Check if two 1-based intervals overlap."""
    # a.start <= b.end and a.end >= b.start
    return a[0] <= b[1] and a[1] >= b[0]


def distance1(a, b):
    """Get the number of bases between two 1-based intervals, 0 if the
    intervals are book-ended against each other, or a negative number if they
    overlap. For partially overlapping intervals the magnitude is the number
    of bases in the overlap; when one interval contains the other, the
    magnitude can exceed the overlap.
    """
    return max(a[0] - b[1], b[0] - a[1]) - 1


def print_intervals1(*intervals):
    """Print a ruler, then one row of '=' marks for each 1-based interval."""
    start = min(i[0] for i in intervals)
    stop = max(i[1] for i in intervals)
    length = stop - start + 1
    # Ruler: one '1234567890' block per started ten bases.
    print('1234567890' * ((length + 10) // 10))
    for i in intervals:
        # The end position is included, so the row has end - start + 1 marks.
        spaces = ' ' * (i[0] - 1)
        marks = '=' * (i[1] - i[0] + 1)
        print(spaces + marks)
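

# The distance functions above report a negative number for overlapping
# intervals, but that number is not always the overlap size: for the nested
# 0-based intervals (1, 10) and (3, 5), max(a[0] - b[1], b[0] - a[1]) is -4
# although only 2 bases are shared. A minimal sketch of computing the overlap
# size directly follows. The names overlap_size0 and overlap_size1 are
# hypothetical and not part of the original module.

```python
def overlap_size0(a, b):
    """Number of bases shared by two 0-based half-open intervals, 0 if none."""
    # The shared region runs from the later start to the earlier end.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_size1(a, b):
    """Number of bases shared by two 1-based closed intervals, 0 if none."""
    # Same idea, plus 1 because the end position is included.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
```

# With the intervals from the module example, a, b = (1, 3), (3, 7), this gives
# 0 shared bases under the 0-based reading and 1 under the 1-based reading,
# matching the results of overlap0 and overlap1.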