-
-
Save ag1805x/89cbc34ca9b20b877a3f070455289906 to your computer and use it in GitHub Desktop.
library(dendextend) | |
countsPC <- read.table("CountsPC_Gene_MultimapOverlap.tsv", header=T, row.names=c(1)) | |
factors <- as.data.frame(read.table("factors.txt", sep="\t", header = TRUE, row.names=c(1))) | |
dist <- dist(t(countsPC), method="euclidean") | |
cluster <- hclust(dist) | |
# main, sub, xlab, ylab | |
dend <- as.dendrogram(cluster) | |
#Add colors to label | |
label_colors_to_use <- as.numeric(factors$SampleType) #Input groups of sample type for label color | |
label_colors_to_use <- label_colors_to_use[order.dendrogram(dend)] #Order group annotation as per in dendrogram | |
#Use to add custom label color | |
for(i in 1:length(label_colors_to_use)){ | |
if(label_colors_to_use[i] == 1){ | |
label_colors_to_use[i] = "blue" | |
}else if(label_colors_to_use[i] == 2){ | |
label_colors_to_use[i] = "red" | |
} | |
} | |
#Add colored bars showing study accession | |
StudyAccession_colors_to_use <- as.numeric(factors$StudyAccession) #Input groups of sample type for label color | |
StudyAccession_colors_to_use <- StudyAccession_colors_to_use[order.dendrogram(dend)] #Order group annotation as per in dendrogram | |
dend <- set(dend, "labels_colors", label_colors_to_use) #Apply colors to labels | |
dend <- hang.dendrogram(dend, hang=-1) #Leaves at same height | |
dend <- set(dend, "branches_lwd", 2) #Thick branches | |
dend <- set(dend, "labels_cex", 1) #label font size | |
tiff(filename="Sample_hclust.tiff", height=10, width=20, units='in', res=300) | |
par(mar = c(10,5,3,1)) | |
plot(dend, main = "Sample Clustering\ndist.method=Eucleadian", ylab = "Height") #Plot the dendrogram | |
colored_bars(colors = StudyAccession_colors_to_use, dend = dend, rowLabels = "Project", sort_by_labels_order = FALSE) | |
dev.off() | |
-
The
factors.txt
file contains sample information. Each row for each sample. Columns for covariate/condition (eg. test or control)etc. As you mentioned, the factors is used for colours here. The same factors file can be used in other steps of analysis including DE testing. -
This is an issue even I have observed often. May be by reading the original paper more carefully about how they created the plots will help. You can also try the process you mentioned to consider top expressed genes or most variable genes. Instead of using all genes it may be better to filter genes with low/zero counts. eg.
rowSums(cpm(counts_pc)>1) >1
Thanks for a quick reply, I will try that. for 2) not sure yet how to use that "rowSums", but will search and come back if I'm really stuck.
Just use this on your raw count matrix:
keep <- rowSums(cpm(counts)>1) >1
counts_filt <- counts[keep, ]
After this continue as is with the filtered counts.
Hi,
thanks for sharing your code.
I just tried it (just the main lines) and obtained a dendogram at the end as expected. I have 15 RNA-seq from repository 3 duplicates and 3 triplicates and I made a file of the featureCounts output with first colum gene names, and all other columns the 15 RNA-seq counts.
I'd like to ask a few questions if I may:
thanks a lot or any help,
best regards