@fedarko
Created November 12, 2024 04:52
Identify runs of consecutive integers in a list
def get_runs(e):
    """Identifies runs of consecutive ints in a list.

    (The main reason I created this: identifying runs of 0-coverage positions in
    the output of "samtools depth -a".)

    Parameters
    ----------
    e: list of int
        Must not contain any duplicate elements.

    Returns
    -------
    list of (int, int)
        Each of these (int, int) tuples (a, b) will satisfy the following conditions:

        1. Every integer i in the range a <= i <= b is present in the input list.
        2. a - 1 is not present in the input list.
        3. b + 1 is not present in the input list.

        Note that one of these tuples could contain the same "start" and "end" value,
        i.e. (a, a). This will happen if "a" is present in the input list, but "a - 1"
        and "a + 1" are not.
    """
    if len(e) == 0:
        return []
    elif len(e) == 1:
        return [(e[0], e[0])]
    if len(e) > len(set(e)):
        raise ValueError("The input cannot contain duplicate elements.")
    # Sort the list in ascending order
    e = sorted(e)
    prev = e[0]
    start = e[0]
    runs = []
    for i in e[1:]:
        # Using != instead of > here prevents floats from completely breaking this
        # (not that you should be passing lists containing floats to this function but whatevs)
        if i != prev + 1:
            runs.append((start, prev))
            start = i
        prev = i
    # Account for the very last position in the sorted list
    runs.append((start, prev))
    return runs
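
The docstring mentions finding runs of 0-coverage positions in "samtools depth -a" output, so here is a rough, self-contained sketch of that use case. It is not part of the original gist. Both names are made up for illustration: get_runs_groupby() is an alternative way to find the same runs with itertools.groupby (swapped in so the block runs on its own), and zero_coverage_runs() is a hypothetical helper. The sketch assumes each input line has three tab-separated columns (reference name, position, depth), as when samtools depth is run on a single alignment file.

```python
from itertools import groupby


def get_runs_groupby(e):
    """Alternative sketch of get_runs() built on itertools.groupby."""
    if len(e) > len(set(e)):
        raise ValueError("The input cannot contain duplicate elements.")
    runs = []
    # In a sorted, duplicate-free list of ints, (value - index) stays constant
    # within a run of consecutive values and changes between runs.
    for _, group in groupby(enumerate(sorted(e)), key=lambda p: p[1] - p[0]):
        values = [v for _, v in group]
        runs.append((values[0], values[-1]))
    return runs


def zero_coverage_runs(depth_lines):
    """Hypothetical helper: maps reference name -> runs of 0-depth positions.

    Assumes three tab-separated columns per line: reference, position, depth.
    References with no 0-depth positions are left out of the result.
    """
    zero_positions = {}
    for line in depth_lines:
        ref, pos, depth = line.rstrip("\n").split("\t")
        if int(depth) == 0:
            zero_positions.setdefault(ref, []).append(int(pos))
    return {ref: get_runs_groupby(p) for ref, p in zero_positions.items()}


# Made-up example data, not real samtools output
example = [
    "chr1\t1\t0",
    "chr1\t2\t0",
    "chr1\t3\t5",
    "chr1\t4\t0",
    "chr2\t1\t7",
    "chr2\t2\t0",
    "chr2\t3\t0",
]
print(get_runs_groupby([5, 1, 2, 3, 9, 10]))  # [(1, 3), (5, 5), (9, 10)]
print(zero_coverage_runs(example))
# {'chr1': [(1, 2), (4, 4)], 'chr2': [(2, 3)]}
```

In practice you could pass an open file handle instead of a list of strings, since the helper only iterates over lines. Positions are grouped per reference because position numbers restart on each sequence, so mixing them would trip the duplicate check.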