Created February 1, 2019 05:14
from sklearn.decomposition import TruncatedSVD


def reduce_to_k_dim(M, k=2):
    """ Reduce a co-occurrence count matrix of dimensionality (num_corpus_words, num_corpus_words)
        to a matrix of dimensionality (num_corpus_words, k) using the following SVD function from Scikit-Learn:
            - http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

        Params:
            M (numpy matrix of shape (number of corpus words, number of corpus words)): co-occurrence matrix of word counts
            k (int): embedding size of each word after dimension reduction
        Return:
            M_reduced (numpy matrix of shape (number of corpus words, k)): matrix of k-dimensional word embeddings.
                    In terms of the SVD from math class, this actually returns U * S
    """
    n_iters = 10  # Number of iterations passed to `TruncatedSVD` as `n_iter`
    print("Running Truncated SVD over %i words..." % (M.shape[0]))

    # Fixed random_state keeps the randomized solver reproducible across runs.
    svd = TruncatedSVD(n_components=k, n_iter=n_iters, random_state=42)
    # fit_transform fits the decomposition and returns U * S in one step.
    M_reduced = svd.fit_transform(M)

    print("Done.")
    return M_reduced
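
The docstring's remark that the result is "U * S" can be checked on a small example. The sketch below is illustrative only: the 4x4 matrix `M` is made up for the demonstration, and it calls `TruncatedSVD` directly with the same arguments as the function above rather than importing it. It compares against NumPy's exact SVD through column norms, since the sign of each singular vector is arbitrary.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical symmetric co-occurrence counts for a 4-word vocabulary.
M = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 3.0, 1.0],
    [1.0, 3.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
])
k = 2

# Same settings as in reduce_to_k_dim above.
svd = TruncatedSVD(n_components=k, n_iter=10, random_state=42)
M_reduced = svd.fit_transform(M)

# Exact singular values from NumPy, sorted in descending order.
S = np.linalg.svd(M, compute_uv=False)

# Each column of U * S has norm equal to its singular value, because the
# columns of U are unit vectors. This check is independent of sign flips.
col_norms = np.linalg.norm(M_reduced, axis=0)

print("embedding shape:", M_reduced.shape)
print("column norms:   ", col_norms)
print("top-k singular: ", S[:k])
```

Each row of `M_reduced` is the k-dimensional embedding of one word, so plotting the two columns against each other gives the usual 2-D view of the vocabulary.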