Identify runs of consecutive integers in a list
(GitHub gist fedarko/2ddbc533890185a166508d9a8242dc89, created November 12, 2024 04:52)
def get_runs(e):
    """Identifies runs of consecutive ints in a list.

    (The main reason I created this: identifying runs of 0-coverage positions in
    the output of "samtools depth -a".)

    Parameters
    ----------
    e: list of int
        Must not contain any duplicate elements.

    Returns
    -------
    list of (int, int)
        Each of these (int, int) tuples (a, b) will satisfy the following conditions:

        1. Every integer i in the range a <= i <= b is present in the input list.
        2. a - 1 is not present in the input list.
        3. b + 1 is not present in the input list.

        Note that one of these tuples could contain the same "start" and "end" value,
        i.e. (a, a). This will happen if "a" is present in the input list, but "a - 1"
        and "a + 1" are not.
    """
    if len(e) == 0:
        return []
    elif len(e) == 1:
        return [(e[0], e[0])]

    if len(e) > len(set(e)):
        raise ValueError("The input cannot contain duplicate elements.")

    # Sort the list in ascending order
    e = sorted(e)

    prev = e[0]
    start = e[0]
    runs = []
    for i in e[1:]:
        # Using != instead of > here prevents floats from completely breaking this
        # (not that you should be passing lists containing floats to this function but whatevs)
        if i != prev + 1:
            runs.append((start, prev))
            start = i
        prev = i

    # Account for the very last position in the sorted list
    runs.append((start, prev))
    return runs
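The docstring promises three conditions for every returned (a, b) tuple, and they are easy to check mechanically. Below is a minimal sketch of such a check. `runs_satisfy_docstring` is a hypothetical helper, not part of the gist. It also performs one extra check that the docstring only implies: every input element must fall inside some run.

```python
def runs_satisfy_docstring(e, runs):
    """Hypothetical helper (not part of the gist): return True if every (a, b)
    in runs meets the three documented conditions and, as an extra check the
    docstring only implies, the runs together cover every element of e."""
    present = set(e)
    covered = set()
    for a, b in runs:
        if a > b:
            return False
        span = range(a, b + 1)
        # Condition 1: every integer from a to b (inclusive) is in the input
        if any(i not in present for i in span):
            return False
        # Conditions 2 and 3: the run cannot be extended on either side
        if a - 1 in present or b + 1 in present:
            return False
        covered.update(span)
    # Extra check: no input element is left outside every run
    return covered == present
```

For example, `get_runs([5, 1, 2, 3, 9, 10])` returns `[(1, 3), (5, 5), (9, 10)]`, and `runs_satisfy_docstring([5, 1, 2, 3, 9, 10], get_runs([5, 1, 2, 3, 9, 10]))` should be True. Running this kind of check over many randomly generated duplicate-free lists is a cheap way to gain confidence in the loop above.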
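The docstring mentions finding runs of 0-coverage positions in `samtools depth -a` output. Below is a rough sketch of that use case. It assumes a single BAM file, so each line has exactly three tab-separated fields (contig, position, depth). It groups positions per contig, because positions restart for each contig.

`zero_coverage_runs` is a hypothetical name. Instead of calling `get_runs`, this sketch swaps in a different technique so that it stays self-contained: `itertools.groupby` keyed on (value minus index). On a sorted, duplicate-free list, that key is constant within each run of consecutive integers.

```python
import itertools
from collections import defaultdict


def zero_coverage_runs(depth_lines):
    """Hypothetical helper: map each contig to its runs of 0-depth positions,
    as (start, end) tuples.

    Assumption: each line has exactly three tab-separated fields (contig,
    position, depth), as produced by "samtools depth -a" on a single BAM file.
    """
    zero_positions = defaultdict(list)
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        contig, pos, depth = line.split("\t")
        if int(depth) == 0:
            zero_positions[contig].append(int(pos))

    runs = {}
    for contig, positions in zero_positions.items():
        positions.sort()
        contig_runs = []
        # In a sorted list without duplicates, consecutive integers share the
        # same (value - index) key, so each groupby group is exactly one run.
        for _, group in itertools.groupby(
            enumerate(positions), key=lambda pair: pair[1] - pair[0]
        ):
            members = [p for _, p in group]
            contig_runs.append((members[0], members[-1]))
        runs[contig] = contig_runs
    return runs
```

Usage might look like `with open("depth.tsv") as f: print(zero_coverage_runs(f))`. Here `depth.tsv` is a hypothetical file holding the redirected `samtools depth -a` output. Contigs with no 0-depth positions are simply absent from the result.

Alternatively, the per-contig position lists could each be passed to `get_runs`, which would give the same runs.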