Mikhail Dozmorov mdozmorov
mdozmorov / gist_mm39_excluderanges.R
Created September 19, 2022 23:39
mm39 excluderanges download
# Download a list of problematic regions (aka blacklist) for the GRCm39/mm39
# mouse genome assembly. The regions were defined by the Boyle-Lab/Blacklist
# software as High Signal or Low Mappability regions.
# See for more information.
suppressMessages(library(httr)) #
suppressMessages(library(GenomicRanges)) #
# bedbase_id
bedbase_id <- "edc716833d4b5ee75c34a0692fc353d5"
# Construct output file name
mdozmorov / gist_T2T_excluderanges.R
Last active September 19, 2022 19:59
T2T excluderanges download
# Download a list of problematic regions (aka blacklist) for the T2T-CHM13
# telomere-to-telomere human genome assembly. The regions were defined by the
# Boyle-Lab/Blacklist software as High Signal or Low Mappability regions.
# See for more information.
suppressMessages(library(httr)) #
suppressMessages(library(GenomicRanges)) #
# bedbase_id
bedbase_id <- "6548a002754cc1e882035293541b59a8"
# Construct output file name
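Both excluderanges previews above stop before the file is read in. As a minimal sketch of the step that usually follows a BED download, the block below parses BED text into a data frame and converts the 0-based, end-exclusive BED starts to 1-based closed coordinates. The toy records and the column names are assumptions for illustration; they are not taken from the gists or from the real excluderanges files.

```r
# Toy BED text standing in for a downloaded excluderanges file (values are made up)
bed_text <- c(
  "chr1\t0\t100\tHigh Signal Region",
  "chr2\t500\t800\tLow Mappability"
)

# BED has no header line; the first three columns are chromosome, start, end.
# The column names below are hypothetical labels chosen for this sketch.
bed <- read.table(text = bed_text, sep = "\t", header = FALSE,
                  col.names = c("chr", "start", "end", "name"),
                  stringsAsFactors = FALSE)

# BED starts are 0-based and ends are exclusive, so 1-based closed
# coordinates need start + 1 and an unchanged end
bed$start <- bed$start + 1L

# Region widths in base pairs under the 1-based closed convention
bed$width <- bed$end - bed$start + 1L

print(bed)
```

When the table is turned into a GRanges object instead, `GenomicRanges::makeGRangesFromDataFrame()` has a `starts.in.df.are.0based` argument that applies the same shift to an unshifted table.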
mdozmorov / liftOver.R
Created November 27, 2020 01:33
How to liftOver Paired BED data
# How to liftOver Paired BED data
# Landscape of Cohesin-Mediated Chromatin Loops in the Human Genome
# Supplementary Table 4 | Pan-cell type cohesin-mediated chromatin loops, hg19 coordinates, paired BED data
url1 <- ""
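The preview ends before any lifting happens, so the following is only a sketch of one common way to handle paired BED data: give each pair an id, lift the two anchors separately, and keep only pairs where both anchors survive. The `toy_lift()` function is a hypothetical stand-in for a real liftOver call, and all coordinates are made up.

```r
# Toy paired BED table; coordinates are made up for illustration
loops <- data.frame(
  chr1 = c("chr1", "chr1", "chr2"),
  start1 = c(100, 5000, 300), end1 = c(200, 5100, 400),
  chr2 = c("chr1", "chr1", "chr2"),
  start2 = c(900, 9000, 700), end2 = c(1000, 9100, 800)
)
# Pair id that lets the two anchors be rejoined after lifting
loops$id <- seq_len(nrow(loops))

# Hypothetical stand-in for a real liftOver step: shifts coordinates by 10 bp
# and drops intervals starting at or beyond 9000 to mimic unmapped regions
toy_lift <- function(df) {
  df <- df[df$start < 9000, ]
  df$start <- df$start + 10
  df$end <- df$end + 10
  df
}

# Split each pair into its left and right anchor, carrying the id along
left <- data.frame(chr = loops$chr1, start = loops$start1,
                   end = loops$end1, id = loops$id)
right <- data.frame(chr = loops$chr2, start = loops$start2,
                    end = loops$end2, id = loops$id)

# Lift each side independently, then keep only pairs with both anchors mapped
lifted <- merge(toy_lift(left), toy_lift(right), by = "id",
                suffixes = c("1", "2"))
print(lifted)
```

The inner join on `id` is the design choice that matters here: a loop with only one lifted anchor is no longer a loop, so it is dropped rather than kept half-mapped.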
# Tesseract Intro:
# Image source:
eng <- tesseract("eng")
text <- tesseract::ocr("", engine = eng)
# Exploratory data analysis of SCSig collection: Signatures of Single Cell Identities
# Read in gene sets
mtx <- read.gmt("")
# Number of unique gene signatures
mtx$ont %>% unique() %>% length()
# Summary statistics on size of gene signatures
mtx %>% group_by(ont) %>% summarise(size = n()) %>% select(size) %>% summary()
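To make the two summaries above concrete, here is a base R sketch of the same calculations on a toy table. It assumes, as the snippet's `mtx$ont` implies, a long table with one row per gene-set membership; the set names and gene assignments are made up.

```r
# Toy long-format gene set table: one row per (gene set, gene) membership.
# Set names and memberships are invented for illustration.
mtx <- data.frame(
  ont = c("SetA", "SetA", "SetA", "SetB", "SetB", "SetC"),
  gene = c("TP53", "MYC", "EGFR", "MYC", "KRAS", "TP53"),
  stringsAsFactors = FALSE
)

# Number of unique gene signatures
n_sets <- length(unique(mtx$ont))

# Size of each signature = number of rows sharing the same set name
set_sizes <- as.vector(table(mtx$ont))

# Summary statistics on signature sizes (min, quartiles, mean, max)
print(n_sets)
print(summary(set_sizes))
```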
mdozmorov /
Created May 18, 2019 01:55
Check Phred offset
zcat $FILE | head -n 40 | awk '{if(NR%4==0) printf("%s",$0);}' | od -A n -t u1 | awk 'BEGIN{min=100;max=0;}{for(i=1;i<=NF;i++) {if($i>max) max=$i; if($i<min) min=$i;}}END{if(max<=74 && min<59) print "Phred+33"; else if(max>73 && min>=64) print "Phred+64"; else if(min>=59 && min<64 && max>73) print "Solexa+64"; else print "Unknown score encoding";}'
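The same decision rule can be sketched in R, which may be easier to read than the one-liner. The function name `guess_phred_encoding()` and the example quality strings are hypothetical; the thresholds are copied from the awk command above.

```r
# Guess the quality encoding from a character vector of FASTQ quality strings.
# Thresholds mirror the awk one-liner: they compare the smallest and largest
# ASCII codes seen across all quality characters.
guess_phred_encoding <- function(quals) {
  codes <- utf8ToInt(paste(quals, collapse = ""))
  lo <- min(codes)
  hi <- max(codes)
  if (hi <= 74 && lo < 59) {
    "Phred+33"
  } else if (hi > 73 && lo >= 64) {
    "Phred+64"
  } else if (lo >= 59 && lo < 64 && hi > 73) {
    "Solexa+64"
  } else {
    "Unknown score encoding"
  }
}

# Made-up quality strings covering each branch
print(guess_phred_encoding(c("IIII", "!#5?")))
print(guess_phred_encoding("hhhh@@"))
print(guess_phred_encoding(";;hh"))
```

As with the shell version, the guess depends on the reads sampled: a small sample of uniformly high-quality reads may not span enough of the range to be classified.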
# PCA: Check for batch effects. Select one batch, to color points by its assignment
pca <- mtx %>% varFilter(., var.cutoff = 0.75) %>% scale %>% t %>% prcomp
colorby <- "Group" # covariates[2]
pt <- ggplot(data = data.frame(pca$x, annot),
             aes(x = as.numeric(PC1), y = as.numeric(PC2), label = Sample)) +
  theme(plot.title = element_text(lineheight = 0.8, face = "bold")) +
  ggtitle(paste("PCA with batch, coloring by", colorby)) +
  geom_point(aes(color = eval(parse(text = colorby))), size = 3) +
  geom_text_repel(colour = "black", size = 3) +
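The data preparation behind this plot can be sketched in base R on random data. This is a rough stand-in under stated assumptions, not the gist's pipeline: the per-gene IQR filter only approximates what `varFilter(var.cutoff = 0.75)` is used for above, and the matrix, sample names, and `Group` labels are made up.

```r
set.seed(1)

# Toy expression matrix: 100 genes (rows) x 6 samples (columns), random values
mtx <- matrix(rnorm(600), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("S", 1:6)))

# Toy sample annotation with one batch-like covariate
annot <- data.frame(Sample = colnames(mtx),
                    Group = rep(c("A", "B"), each = 3))

# Keep roughly the 25% most variable genes, measured by per-gene IQR
gene_iqr <- apply(mtx, 1, IQR)
keep <- gene_iqr > quantile(gene_iqr, 0.75)

# Scale each sample column, transpose so samples are rows, then run PCA
pca <- prcomp(t(scale(mtx[keep, ])))

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot-ready table: PC scores next to the sample annotation
pca_df <- data.frame(pca$x, annot)
print(round(var_explained, 3))
print(head(pca_df[, c("PC1", "PC2", "Sample", "Group")]))
```

The combined `pca_df` has one row per sample, which is what lets a covariate such as `Group` be mapped to point color.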
rank  name                 country         category                sales   profits  assets   marketvalue
1     Citigroup            United States   Banking                 94.71   17.85    1264.03  255.3
2     General Electric     United States   Conglomerates           134.19  15.59    626.93   328.54
3     American Intl Group  United States   Insurance               76.66   6.46     647.66   194.87
4     ExxonMobil           United States   Oil & gas operations    222.88  20.96    166.99   277.02
5     BP                   United Kingdom  Oil & gas operations    232.57  10.27    177.57   173.54
6     Bank of America      United States   Banking                 49.01   10.81    736.45   117.55
7     HSBC Group           United Kingdom  Banking                 44.33   6.66     757.6    177.96
8     Toyota Motor         Japan           Consumer durables       135.82  7.99     171.71   115.4
9     Fannie Mae           United States   Diversified financials  53.13   6.48     1019.17  76.84
mdozmorov / DEG_RNA-seq.R
Created February 2, 2017 02:01
Differential expression analysis in RNA-seq
# Source: Additional file 1 from Łabaj, Paweł P., and David P. Kreil. “Sensitivity, Specificity, and Reproducibility of RNA-Seq Differential Expression Calls.” Biology Direct 11, no. 1 (December 20, 2016): 66. doi:10.1186/s13062-016-0169-7.
DEfun <- function(counts, design) {
  DE <- list()
  ## limma
  gene.dge <- DGEList(counts = counts, group = factor(rep(1:2, each = 4)))
  gene.dge.norm <- calcNormFactors(gene.dge)
  gene.dge.norm2 <- gene.dge.norm
  gene.dge.norm2$counts <- gene.dge.norm2$counts - 0.5