@tomsing1
Created October 2, 2025 15:35
Example of how random data can yield nominally significant results & a useless classifier
library(genefilter)
library(gplots)
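set.seed(123L)
# Simulate pure noise: 20 subjects x 10,000 features, with random group labels
n_subjects <- 20
n_features <- 1e4
group <- factor(sample(c("A", "B"), size = n_subjects, replace = TRUE))
data <- rnorm(n_subjects * n_features)
m <- matrix(data, ncol = n_subjects)  # rows = features, columns = subjects
row.names(m) <- paste0("Gene", seq.int(nrow(m)))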
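# Row-wise two-sample t-tests; `dm` is the difference in group means
stats <- genefilter::rowttests(x = m, fac = group)
stats <- stats[order(stats$p.value), ]
# Features passing the unadjusted p-value and effect-size cutoffs
nominally_signif <- row.names(stats)[with(stats, p.value < 0.05 & abs(dm) > 1)]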
length(nominally_signif)
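# A hedged aside, not part of the original gist: with 1e4 features and no true
# signal, roughly 0.05 * 1e4 = 500 p-values are expected to fall below 0.05 by
# chance alone. One common remedy (an assumption chosen for illustration) is to
# control the false discovery rate with Benjamini-Hochberg via p.adjust().
# The names fdr, n_nominal and n_fdr are introduced here for the sketch only.
fdr <- p.adjust(stats$p.value, method = "BH")
n_nominal <- sum(stats$p.value < 0.05)  # hits at the unadjusted threshold
n_fdr <- sum(fdr < 0.05)                # hits that survive FDR control
c(nominal = n_nominal, fdr = n_fdr)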
# Keep up to 100 features with the smallest p-values, then z-score each feature across subjects
top_features <- row.names(head(stats[nominally_signif, ], 100))
z_scores <- t(scale(t(m)))
heatmap.2(z_scores[top_features, ], trace = "none", scale = "none",
          labCol = group, labRow = NA,
          col = colorRampPalette(c("navy", "white", "firebrick"))(100),
          breaks = seq(-2, 2, length.out = 101), dendrogram = "both",
          hclustfun = \(x) hclust(x, method = "ward.D2"),
          density.info = "none", colsep = sum(group == "B"))
# PCA with features as observations, so `rotation` holds one loading per subject
pcs <- prcomp(z_scores[top_features, ])
plot(pcs$rotation, col = c("orange", "darkgray")[group], pch = 19, cex = 2,
     main = "Principal Component Analysis")
mtext("Top 100 features only", side = 3, line = 0.5)
legend("topright", bty = "o", legend = levels(group), pch = 19,
       col = c("orange", "darkgray"), cex = 1)
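# A hedged sketch, not part of the original gist: one way to see why a
# classifier built on these features is useless. A simple nearest-centroid rule
# (an assumption chosen for illustration) is fit on the selected features and
# then applied to a second, independent batch of random data. The names m_new,
# group_new, centroids, predict_nc, acc_train and acc_new are hypothetical
# helpers; the sketch assumes at least two features were selected.
m_new <- matrix(rnorm(n_subjects * n_features), ncol = n_subjects,
                dimnames = list(row.names(m), NULL))
group_new <- factor(sample(levels(group), size = n_subjects, replace = TRUE),
                    levels = levels(group))
# Per-group mean profile of the selected features in the original data
centroids <- sapply(levels(group),
                    \(g) rowMeans(m[top_features, group == g, drop = FALSE]))
predict_nc <- function(x) {
  # Squared Euclidean distance of every subject (column) to each centroid
  d <- sapply(levels(group),
              \(g) colSums((x[top_features, , drop = FALSE] - centroids[, g])^2))
  # Assign each subject to the group with the closest centroid
  factor(levels(group)[max.col(-d)], levels = levels(group))
}
acc_train <- mean(predict_nc(m) == group)        # same data used for selection
acc_new <- mean(predict_nc(m_new) == group_new)  # independent random data
c(train = acc_train, new = acc_new)
# Because the features were selected on `m`, the first number is optimistically
# biased; on fresh noise the expected accuracy is about chance level.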