@arq5x
Created June 18, 2013 13:20
"Strategy" for generating shuffled BED files while preventing overlapping records in the shuffled output.
tries=0
while true;
do
tries=$((tries+1))
echo "attempt number $tries"
# try a shuffle
bedtools shuffle -i foo.bed -g human.hg19.genome > foo.shuffled.bed
# a. sort the shuffle by chrom and start position so that we can use
# bedtools merge to test for overlaps
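# b. the -n option in bedtools merge reports, in column 4, how many
# intervals from the input (here, the shuffled file) were combined into
# each merged block. As such, if all of these counts are == 1, then no
# intervals were merged. Note that merge also joins book-ended intervals
# by default, and that newer bedtools releases replace -n with
# "-c 1 -o count".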
sort -k1,1 -k2,2n foo.shuffled.bed | bedtools merge -i - -n > foo.shuffled.merged.bed
# test to see if there were any overlapping intervals
has_overlaps=$(awk '$4 > 1' foo.shuffled.merged.bed | wc -l)
# if there were none, we can quit and use foo.shuffled.bed
# (foo.shuffled.merged.bed keeps only chrom, start, end, and the count)
if [ "$has_overlaps" -eq 0 ]
then break;
fi
done
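
The loop above retries forever if an overlap-free shuffle is unlikely (for example, when the intervals cover a large share of the genome), and its overlap test depends on the count column that bedtools merge writes. A minimal sketch of a variant that caps the number of attempts and checks for overlaps with sort and awk alone might look like the following. The names count_overlaps and shuffle_without_overlaps, and the default of 100 attempts, are assumptions made for illustration; they are not part of bedtools or of the original gist.

```shell
# count_overlaps FILE: hypothetical helper (not part of bedtools).
# Sorts a BED file by chrom and start, then prints how many records start
# before the furthest end already seen on the same chromosome.
count_overlaps() {
    sort -k1,1 -k2,2n "$1" | awk '
        $1 != chrom   { chrom = $1; max_end = -1 }  # new chromosome: reset
        $2 < max_end  { n++ }                       # starts inside an earlier interval
        $3 > max_end  { max_end = $3 + 0 }          # track the furthest end so far
        END           { print n + 0 }
    '
}

# shuffle_without_overlaps INPUT GENOME OUTPUT [MAX_TRIES]: hypothetical wrapper
# around the same "bedtools shuffle" call used above. Gives up after MAX_TRIES
# attempts (default 100, an arbitrary choice).
shuffle_without_overlaps() {
    input=$1
    genome=$2
    output=$3
    max_tries=${4:-100}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        echo "attempt number $tries" >&2
        bedtools shuffle -i "$input" -g "$genome" > "$output" || return 1
        if [ "$(count_overlaps "$output")" -eq 0 ]; then
            return 0
        fi
    done
    echo "no overlap-free shuffle after $max_tries attempts" >&2
    return 1
}
```

It could be called as: shuffle_without_overlaps foo.bed human.hg19.genome foo.shuffled.bed 50. Unlike the merge-based test, which by default also merges book-ended intervals, this check flags only intervals that share at least one base. Recent bedtools documentation also lists a -noOverlapping option for shuffle; where your version supports it, the retry loop may be unnecessary.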