###Download STAR### Obtain STAR source from https://github.com/alexdobin/STAR
Add the following to your .bashrc file and source it:
export PATH=/path/to/STAR/bin/:$PATH
###Generate Reference Genome
| import sys | |
| import os | |
| import re | |
| import glob | |
| #sys.argv[1] = action | |
| #sys.argv[2] = top level folder | |
| #sys.argv[3] = read | |
| def subdirectories(args): |
###Download STAR### Obtain STAR source from https://github.com/alexdobin/STAR
Add the following to your .bashrc file and source it:
export PATH=/path/to/STAR/bin/:$PATH
###Generate Reference Genome
In order to perform a pairwise splicing analysis, sorted bam files per sample are needed as well as reference annotation in the form of a GTF file
###Download rMATS Download rMATS at http://rnaseq-mats.sourceforge.net/download.html
###Running rMATS Via python, you can run rMATs using the following example command:
python /path/to/rMATS/directory/RNASeq-MATS.py -b1 conditon1_replicate_1.bam,condition_1_replicate_2.bam -b2 condition_2_replicate_1.bam,condition_2_replicate_2.bam -gtf /path/to/gtf/file -o /path/to/results/directory -t single -len {read length}
| top.var<-function (x, ntop = 500) | |
| { | |
| require("genefilter") | |
| rv = rowVars(x) | |
| select = order(rv, decreasing = TRUE)[seq_len(ntop)] | |
| topvar = x[select, ] | |
| return(topvar) | |
| } |
###Download R scripts
Download the following R script which contains necessary functions: https://gist.github.com/ipurusho/64d6dbf1a32aa7c83f665785155e494b
Then, load the covariate_contribution.R script into your R environment.
###Performing Covariate Analysis
Once you have obtained a variance stabilized count matrix and organized your meta data (meta data rows should correspond to VST columns), you can use the top.var function (now in your R environment) to filter for the top 500 (or more) variable genes for input into PCA.
###Download modified RRHO scripts Obtain the modified R scripts from https://gist.github.com/ipurusho/8e4beef4fb51c2485c9f and source it in your R environment.
###Prepare the data Create two separate matrices, one for P value and one for log fold change from your differential analysis. The rows should be genes and the columns should be the comparisons (column order should be consistent between the two matrices). For example:
P.vals = as.data.frame(cbind(Comparison1 = Comparison1[intersect.genes,"P.Value"],
Comparison2 = Comparison2[intersect.genes,"P.Value"],
Comparison3 = Comparison3[intersect.genes,"P.Value"]))
| ### "transcript" parameter is the original matrix of normalized counts | |
| ### "signifcant" parameter is the filtered matrix of significant genes (using desired cutoff) | |
| ### "email" is an email address needed to run the web service | |
| go.david<-function(transcript,significant,email){ | |
| require("RDAVIDWebService") | |
| if(dim(significant)[1] !=0){ | |
| david<-DAVIDWebService$new(email=email) | |
| background<-addList(david, rownames(transcript), idType="ENSEMBL_GENE_ID",listName="full_transcript", listType="Background") | |
| sig.genes<-addList(david, rownames(significant), idType="ENSEMBL_GENE_ID",listName="significant", listType="Gene") |
| import sys | |
| import numpy as np | |
| from sklearn.manifold import TSNE | |
| my_data = np.genfromtxt(str(sys.argv[1]), delimiter=',') | |
| model = TSNE(n_components=2, random_state=0,perplexity=int(sys.argv[2])) | |
| tsne = model.fit_transform(np.transpose(my_data)) | |
| np.savetxt(str(sys.argv[3]), tsne, delimiter=",") |
Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c
In R, write the variance stabilized values per sample, subsetting for the top 500 variable genes, to a file #without# row and column labels. You can then use the tsne.py script as follows:
python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv
Where 30 is the perplexity value, which is dependent on sample size for optimum output (see documentation). The output file will contain two columns, dimension 1, dimension 2. The rows correspond to the Samples which will be in the same order as the input vsd columns. You can load this data back into R and visualize it using your preferred method.