@Tabea-K
Created November 8, 2016 10:29
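Get the 5'UTR regions from a UCSC RefSeq table (refGene.txt), or from any other UCSC gene table in the same format. Non-coding transcripts (no CDS, i.e. cdsStart equals cdsEnd) are skipped, as are transcripts whose 5'UTR is empty. On the + strand the 5'UTR runs from txStart to cdsStart; on the - strand it runs from cdsEnd to txEnd. The result is printed to stdout as a six-column BED file (chrom, start, end, name, score 0, strand).
awk 'BEGIN{OFS="\t"} $7!=$8 {if ($4=="+" && $5<$7){print $3,$5,$7,$2,0,$4} else if ($4=="-" && $8<$6){print $3,$8,$6,$2,0,$4}}' data/hg38_refGene.txt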
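A quick way to check the strand logic is to run it on a few made-up rows. This is a sketch assuming bash and a POSIX awk. The transcript names (NM_TEST*, NR_TEST*), gene names, and coordinates are invented, and `sample_refgene` and `extract_5utr` are hypothetical helper names. The filter keeps coding rows with a non-empty 5'UTR and takes txStart..cdsStart on the + strand and cdsEnd..txEnd on the - strand. The reported interval is therefore a genomic span that also covers any introns inside the 5'UTR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Four made-up refGene rows (bin, name, chrom, strand, txStart, txEnd,
# cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2,
# cdsStartStat, cdsEndStat, exonFrames). All names are hypothetical.
sample_refgene() {
  printf '0\tNM_TEST1\tchr1\t+\t1000\t5000\t1200\t4800\t1\t1000,\t5000,\t0\tGENE1\tcmpl\tcmpl\t0,\n'
  printf '0\tNM_TEST2\tchr2\t-\t2000\t9000\t2500\t8000\t1\t2000,\t9000,\t0\tGENE2\tcmpl\tcmpl\t0,\n'
  # Non-coding: cdsStart == cdsEnd, so it should be skipped.
  printf '0\tNR_TEST3\tchr3\t+\t3000\t4000\t4000\t4000\t1\t3000,\t4000,\t0\tGENE3\tunk\tunk\t-1,\n'
  # Coding with no 3'UTR (cdsEnd == txEnd); its 5'UTR should still be reported.
  printf '0\tNM_TEST4\tchr4\t+\t100\t900\t300\t900\t1\t100,\t900,\t0\tGENE4\tcmpl\tcmpl\t0,\n'
}

# Read refGene rows on stdin, keep coding rows with a non-empty 5'UTR,
# and write BED6 (chrom, start, end, name, score, strand) to stdout.
extract_5utr() {
  awk 'BEGIN { OFS = "\t" }
       $7 != $8 {
         if ($4 == "+" && $5 < $7)      print $3, $5, $7, $2, 0, $4
         else if ($4 == "-" && $8 < $6) print $3, $8, $6, $2, 0, $4
       }'
}

sample_refgene | extract_5utr
```

It should print three BED lines, for NM_TEST1 (chr1:1000-1200), NM_TEST2 (chr2:8000-9000) and NM_TEST4 (chr4:100-300). The non-coding NR_TEST3 row is dropped. On the real table, `extract_5utr < data/hg38_refGene.txt` applies the same filter.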