Download the following script: https://gist.github.com/ipurusho/44e06d43aab0a7dd2641589a4fd3351c
In R, write the variance-stabilized values per sample, subset to the 500 most variable genes, to a file without row or column labels. You can then run the tSNE.py script as follows:
python /path/to/tSNE.py /path/to/tsne_input_vsd.csv 30 /path/to/output_file.csv
Here 30 is the perplexity value; its optimal setting depends on sample size (see the documentation). The output file contains two columns, dimension 1 and dimension 2. Rows correspond to samples, in the same order as the columns of the input vsd matrix. You can load this data back into R and visualize it using your preferred method.
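Because the input is written without labels, the output rows can only be matched to samples by position. The following is a minimal Python sketch of that pairing step, for anyone who prefers to stay in Python rather than go back to R. It assumes the output is comma-separated with no header line; the function name label_embedding and the sample names are hypothetical.

```python
import csv
import io


def label_embedding(embedding_csv_text, sample_names):
    """Pair each two-column embedding row with a sample name, by position.

    Assumes comma-separated text without a header line (an assumption
    about the output format, not something confirmed by the script).
    """
    rows = [r for r in csv.reader(io.StringIO(embedding_csv_text)) if r]
    if len(rows) != len(sample_names):
        raise ValueError(
            f"{len(rows)} embedding rows but {len(sample_names)} sample names"
        )
    return [(name, float(r[0]), float(r[1])) for name, r in zip(sample_names, rows)]


# Hypothetical two-sample output, in the same order as the vsd columns.
example_output = "0.5,-1.2\n3.1,0.4\n"
labeled = label_embedding(example_output, ["sampleA", "sampleB"])
for name, dim1, dim2 in labeled:
    print(name, dim1, dim2)
```

The length check is there because a mismatch between row count and sample count usually means a header or label column slipped into one of the files.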
Stochastic Outlier Selection (SOS) is available through scikit-sos: see https://github.com/jeroenjanssens/scikit-sos and install it with pip install scikit-sos
For SOS, please write the transposed matrix of filtered variance-stabilized values to a text file (again without row and column labels). For example, in the tSNE example above, if you have a matrix with 40 samples filtered for the top 500 varying genes, the resulting text file will have 500 rows and 40 columns. For SOS, transpose it so that there are 40 rows (one per sample) and 500 columns (one per gene).
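If the tSNE input file already exists, the transposition can also be done outside R. Here is a minimal Python sketch using only the standard library, assuming a comma-separated file without labels; the function name transpose_csv is hypothetical.

```python
import csv
import io


def transpose_csv(text):
    """Transpose label-free, comma-separated text: genes x samples -> samples x genes."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")
    # zip(*rows) turns each original column into a new row.
    return [list(col) for col in zip(*rows)]


# Tiny hypothetical matrix: 3 genes (rows) x 2 samples (columns).
genes_by_samples = "1,2\n3,4\n5,6\n"
samples_by_genes = transpose_csv(genes_by_samples)

# Write the result back out as CSV text (2 rows x 3 columns).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(samples_by_genes)
print(out.getvalue())
```

For the 500 x 40 example, the same call would yield 40 rows of 500 values each.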
Execute the SOS algorithm:
cat /path/to/vsd.csv | sos -p 30 > /path/to/output.txt
The -p flag sets the perplexity; again, adjust it based on sample size. Each row of the output corresponds to a row of the transposed input, that is, to a column of the original vsd matrix (a sample), in the same order. Each value is that sample's outlier probability.
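To see which samples stand out, the probabilities can be paired with sample names and sorted. This is a minimal Python sketch, assuming the output file holds one probability per line; the function name rank_outliers, the sample names, and the values are hypothetical.

```python
def rank_outliers(sos_output_text, sample_names):
    """Pair outlier probabilities with sample names and sort, highest first.

    Assumes one probability per non-empty line, in the same order as the
    samples (an assumption about the output layout).
    """
    probs = [float(line) for line in sos_output_text.splitlines() if line.strip()]
    if len(probs) != len(sample_names):
        raise ValueError(
            f"{len(probs)} probabilities but {len(sample_names)} sample names"
        )
    return sorted(zip(sample_names, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical output for three samples.
example_output = "0.12\n0.87\n0.40\n"
ranked = rank_outliers(example_output, ["s1", "s2", "s3"])
for name, prob in ranked:
    print(f"{name}\t{prob:.2f}")
```

No cutoff is applied here on purpose; what counts as an outlier depends on the data set and is left to the reader.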