@pmagwene
Created March 29, 2017 14:08
library(tidyr)
library(dplyr)
library(magrittr)
library(ggplot2)
# load data in "wide" format (genes in columns)
spellman <- read.csv("spellman-reformated.csv")
# restructure in "long" format
spellman.long <- gather(spellman, gene, expression, -expt, -time)
# group by gene, and calculate variance of expression for each gene
# removing any NA values
spellman.var <-
  spellman.long %>%
  group_by(gene) %>%
  summarize(var = var(expression, na.rm = TRUE))
## make a quick histogram of the variances
ggplot(spellman.var, aes(var)) + geom_histogram()
# Let's find the cutoff to remove the bottom 1/3 least variable genes:
# the cutoff point is the 33rd percentile of the variances
cutoff <- quantile(spellman.var$var, 0.33)
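# A minimal sketch of how a quantile cutoff behaves, using made-up numbers.
# toy.var, toy.cutoff and toy.kept are hypothetical names for illustration only;
# they are not part of the Spellman analysis.
toy.var <- c(0.1, 0.2, 0.4, 0.8, 1.6, 3.2)
# value below which roughly a third of the toy variances fall
toy.cutoff <- quantile(toy.var, 0.33)
# keep the entries at or above the cutoff (here the two smallest values are dropped)
toy.kept <- toy.var[toy.var >= toy.cutoff]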
# Get genes whose variance is greater than or equal to the cutoff
genes.of.interest <- spellman.var %>% filter(var >= cutoff) %$% gene
# the %$% operator comes from the magrittr package
# see https://github.com/tidyverse/magrittr
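# A small illustration of %$% on a toy data frame (toy.df is a hypothetical
# example, not part of the analysis). The operator exposes the columns of the
# left-hand side by name, so the expression below should be equivalent to
# (toy.df %>% filter(var >= 0.5))$gene
toy.df <- data.frame(gene = c("a", "b", "c"),
                     var = c(0.1, 0.5, 0.9),
                     stringsAsFactors = FALSE)
toy.genes <- toy.df %>% filter(var >= 0.5) %$% gene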
# create reduced "wide" data frame
# (note: this keeps only the selected gene columns; expt and time are dropped)
spellman.reduced <-
  spellman %>%
  select(one_of(genes.of.interest))
# equivalent for our "long" data frame
spellman.reduced.long <-
  spellman.long %>%
  filter(gene %in% genes.of.interest)
# dimensions of the original spellman data
dim(spellman)
# dimensions of the reduced data
dim(spellman.reduced)