Skip to content

Instantly share code, notes, and snippets.

@mbk0asis
Last active June 29, 2023 09:19
Show Gist options
  • Save mbk0asis/d5bed7ea3b033be4edd5b48c572f98e2 to your computer and use it in GitHub Desktop.
Save mbk0asis/d5bed7ea3b033be4edd5b48c572f98e2 to your computer and use it in GitHub Desktop.
Find restriction enzyme sites in a genome
## Find restriction enzyme sites in a genome
library(Biostrings)
library(GenomicRanges)
# Read the genome sequence from a FASTA file
genome <- readDNAStringSet("/home/sc/00--NGS/ANNOTATION/Homo_sapiens_UCSC_GRCh38/Genome/genome.fa")
head(genome)
# enzyme sequence
enzyme_site <- DNAString("GGGCCC")
# Find the positions of the enzyme site in the genome
enzyme_positions <- vmatchPattern(enzyme_site, genome)
print(enzyme_positions)
# Create a GRanges object
starts <- unlist(start(enzyme_positions))
ends <- unlist(end(enzyme_positions))
enzyme_granges <- GRanges(seqnames=names(starts), ranges=IRanges(start=starts, end=ends))
head(enzyme_granges,30)
# Convert the GRanges object to BED format and save
library(rtracklayer)
enzyme_bed <- export(enzyme_granges, format = "BED")
write.table(enzyme_bed, "enzymes.bed", row.names = FALSE, quote = FALSE)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment