A simple tutorial on how to use the Python implementations of t-SNE and Stochastic Outlier Selection (SOS)

tSNE

Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c

In R, write the variance-stabilized values per sample, subset to the 500 most variable genes, to a file without row or column labels. You can then run the tSNE.py script as follows:

python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv

Here 30 is the perplexity value; the best setting depends on the number of samples (see the documentation). The output file contains two columns, dimension 1 and dimension 2. Each row corresponds to one sample, in the same order as the columns of the input vsd file. You can load this data back into R and visualize it with your preferred method.
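Because the input is written without labels, the sample names have to be re-attached to the output by position. A minimal Python sketch of that step follows; it assumes the output file is comma-delimited with no header, and the function name `label_embedding`, the demo values, and the sample names are all made up for illustration.

```python
import csv
import io


def label_embedding(embedding_rows, sample_names):
    """Pair each (dim1, dim2) row with a sample name, relying on row order."""
    rows = [r for r in embedding_rows if r]  # skip blank lines
    if len(rows) != len(sample_names):
        raise ValueError(
            f"{len(rows)} embedding rows but {len(sample_names)} sample names"
        )
    return [(name, float(r[0]), float(r[1])) for name, r in zip(sample_names, rows)]


# Made-up values standing in for the contents of /path/to/output_file.csv
demo_output = io.StringIO("1.5,-2.0\n0.25,3.0\n-4.0,0.5\n")
# Hypothetical names, listed in the same order as the input vsd columns
demo_names = ["sample_A", "sample_B", "sample_C"]

labelled = label_embedding(csv.reader(demo_output), demo_names)
for name, dim1, dim2 in labelled:
    print(name, dim1, dim2)
```

For a real run, replace `demo_output` with an open file handle on the output file and `demo_names` with the column names of the original vsd matrix.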

SOS

See https://github.com/jeroenjanssens/scikit-sos for details, and install the package with pip install scikit-sos

For SOS, write the transposed matrix of filtered variance-stabilized values to a text file (again without row or column labels). For example, if the matrix used for tSNE above has 40 samples filtered for the 500 most variable genes, that file has 500 rows and 40 columns. For SOS, transpose it so that it has 40 rows (samples) and 500 columns (genes).
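If the genes-by-samples file from the tSNE step already exists, the transposition can also be done outside R. The following is a minimal Python sketch, assuming a comma-delimited file with no labels; `transpose_rows` is a hypothetical helper and the small matrix is made-up demo data.

```python
import csv
import io


def transpose_rows(rows):
    """Turn a genes x samples table into a samples x genes table."""
    rows = [r for r in rows if r]  # skip blank lines
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")
    return [list(col) for col in zip(*rows)]


# Made-up demo input: 2 genes (rows) x 3 samples (columns)
genes_by_samples = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
samples_by_genes = transpose_rows(csv.reader(genes_by_samples))

# Write the transposed table, again without row or column labels
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(samples_by_genes)
print(out.getvalue())
```

With real data, the two `io.StringIO` objects would be replaced by file handles opened on the tSNE input file and on the new SOS input file.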

Execute the SOS algorithm:

cat /path/to/vsd.csv | sos -p 30 > /path/to/output.txt

The -p flag sets the perplexity; as with tSNE, adjust it to the number of samples. Each row of the output corresponds to one sample, in the same order as the rows of the transposed input (that is, the columns of the original vsd matrix). Each value is that sample's outlier probability.
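Since the output carries no labels, the probabilities have to be matched back to sample names by position before they can be interpreted. A minimal Python sketch follows, assuming one probability per output line; the function name `flag_outliers`, the 0.5 cutoff, the sample names, and the values are all illustrative assumptions, not part of scikit-sos.

```python
def flag_outliers(probabilities, sample_names, threshold=0.5):
    """Return (name, probability) pairs whose probability exceeds the threshold."""
    if len(probabilities) != len(sample_names):
        raise ValueError("number of probabilities and sample names differ")
    return [(n, p) for n, p in zip(sample_names, probabilities) if p > threshold]


# Made-up text standing in for the contents of /path/to/output.txt
demo_text = "0.12\n0.91\n0.33\n"
demo_probs = [float(line) for line in demo_text.splitlines() if line.strip()]
# Hypothetical names, in the same order as the rows of the transposed input
demo_names = ["sample_A", "sample_B", "sample_C"]

flagged = flag_outliers(demo_probs, demo_names)
print(flagged)
```

The cutoff is a judgment call; choose a value that suits your data rather than treating 0.5 as a recommendation.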
