How to create BED files for use with RSEQC

Download the following two BED files from the RSEQC website, hosted on SourceForge:

hg38_RefSeq.bed.gz: All human RefSeq transcripts
hg38.HouseKeepingGenes.bed.gz: Subset of human RefSeq transcripts considered housekeeping genes

Extract the RefSeq identifiers from the HouseKeepingGenes file.
Map the RefSeq identifiers to orthologous genes in your species of interest, e.g. using ensembl's BioMart web interface. Make sure the return the gene stable ID for the target species.
Download the GTF file with gene annotations for your species of interest, e.g. from ensembls FTP server
Next, we need to subset the GTF file to the housekeeping gene identifiers we obtained by mapping the human RefSeq identifiers, e.g. using rtracklayer BioC package.

library(purrr)
library(readr)
library(rtracklayer)

# read biomart output files
housekeeping <- purrr::map_df(
    dir("~/Downloads/", pattern = "mart_export*", full.names = TRUE), 
    readr::read_tsv)
housekeeping <- unique(housekeeping$`Crab-eating macaque gene stable ID`)

# read ensembl GTF file
gtf <- import("~/tmp/Macaca_fascicularis.Macaca_fascicularis_6.0.103.gtf.gz")

# subset the `GRanges` object; also exclude gene rows
hk <- gtf[gtf$gene_id %in% housekeeping & gtf$type != "gene"]

# export the filtered GTF file
export(hk, con = "~/tmp/housekeeping.gtf", format = "gtf")

Convert the GTF files to BED format

Source

Convert the original GTF file:

conda create -n gtf
conda activate gtf
mamba install -c bioconda ucsc-gtfToGenePred ucsc-genePredToBed

gzip -cd ~/tmp/Macaca_fascicularis.Macaca_fascicularis_6.0.103.gtf.gz | \
gtfToGenePred /dev/stdin ~/tmp/all_transcripts.genepred
genePredToBed ~/tmp/all_transcripts.genepred ~/tmp/all_transcripts.bed

Convert the housekeeping GTF file:

gtfToGenePred ~/tmp/housekeeping.gtf ~/tmp/housekeeping.genepred
genePredToBed ~/tmp/housekeeping.genepred ~/tmp/housekeeping.bed

tomsing1/seqrc_bed_file_generation.md

Select an option

No results found

Select an option

No results found