Created April 7, 2014 13:21
extract gene counts from transcript ID/count
# convert transcript ID to gene ID
df <- data.frame(a = c("AT1G2.1", "AT1G2.3", "AT1G2.3", "AT1G3.1", "AT1G3.3", "AT1G3.3"), b=1:6)
df$t <- gsub(df$a, pattern="\\.[0-9]+", replacement="")
# sum by gene ID
aggregate(df[,2], by=list(as.factor(df$t)), sum)
NOTE: Please don't ever just sum transcript counts to get gene counts in a real analysis. This code is just to demonstrate `gsub` and `aggregate` to a colleague.
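For anyone reading the snippet to learn the two functions, here is a slightly more explicit variant on the same toy data. It is only a sketch: the `$` anchor in the pattern, the formula interface to `aggregate`, and the name `gene_counts` are my own choices rather than part of the original gist, and the same caveat about not summing transcript counts in a real analysis still applies.

```r
# Toy data copied from the gist: transcript IDs (a) and made-up counts (b).
df <- data.frame(
  a = c("AT1G2.1", "AT1G2.3", "AT1G2.3", "AT1G3.1", "AT1G3.3", "AT1G3.3"),
  b = 1:6
)

# Strip the transcript suffix. The `$` anchor (an addition, not in the gist)
# limits the match to a trailing ".<digits>", so sub() is enough here.
df$t <- sub("\\.[0-9]+$", "", df$a)

# Formula interface: sum column b within each gene ID in column t.
# `gene_counts` is a hypothetical name used only for this illustration.
gene_counts <- aggregate(b ~ t, data = df, FUN = sum)
print(gene_counts)
```

With this input the result has one row per gene ID: `AT1G2` with a total of 6 and `AT1G3` with a total of 15. Unlike the positional call in the gist, the formula form keeps the column names `t` and `b` in the output.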