Skip to content

Instantly share code, notes, and snippets.

@tomsing1
Created April 27, 2021 18:37
Show Gist options
  • Select an option

  • Save tomsing1/6230f2f5fb3c4bf0477276c35f97fd2d to your computer and use it in GitHub Desktop.

Select an option

Save tomsing1/6230f2f5fb3c4bf0477276c35f97fd2d to your computer and use it in GitHub Desktop.
How to create BED files for use with RSEQC
  1. Download the following two BED files from the RSEQC website, hosted on SourceForge:
  • hg38_RefSeq.bed.gz: All human RefSeq transcripts
  • hg38.HouseKeepingGenes.bed.gz: Subset of human RefSeq transcripts considered housekeeping genes
  1. Extract the RefSeq identifiers from the HouseKeepingGenes file.
  2. Map the RefSeq identifiers to orthologous genes in your species of interest, e.g. using ensembl's BioMart web interface. Make sure the return the gene stable ID for the target species.
  3. Download the GTF file with gene annotations for your species of interest, e.g. from ensembls FTP server
  4. Next, we need to subset the GTF file to the housekeeping gene identifiers we obtained by mapping the human RefSeq identifiers, e.g. using rtracklayer BioC package.
library(purrr)
library(readr)
library(rtracklayer)

# read biomart output files
housekeeping <- purrr::map_df(
    dir("~/Downloads/", pattern = "mart_export*", full.names = TRUE), 
    readr::read_tsv)
housekeeping <- unique(housekeeping$`Crab-eating macaque gene stable ID`)

# read ensembl GTF file
gtf <- import("~/tmp/Macaca_fascicularis.Macaca_fascicularis_6.0.103.gtf.gz")

# subset the `GRanges` object; also exclude gene rows
hk <- gtf[gtf$gene_id %in% housekeeping & gtf$type != "gene"]

# export the filtered GTF file
export(hk, con = "~/tmp/housekeeping.gtf", format = "gtf")
  1. Convert the GTF files to BED format

Source

Convert the original GTF file:

conda create -n gtf
conda activate gtf
mamba install -c bioconda ucsc-gtfToGenePred ucsc-genePredToBed
gzip -cd ~/tmp/Macaca_fascicularis.Macaca_fascicularis_6.0.103.gtf.gz | \
gtfToGenePred /dev/stdin ~/tmp/all_transcripts.genepred
genePredToBed ~/tmp/all_transcripts.genepred ~/tmp/all_transcripts.bed

Convert the housekeeping GTF file:

gtfToGenePred ~/tmp/housekeeping.gtf ~/tmp/housekeeping.genepred
genePredToBed ~/tmp/housekeeping.genepred ~/tmp/housekeeping.bed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment