def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """
    Original script: https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf
    Procrustes-align two gensim word2vec models so that the same word can be compared across models.
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
    First, intersect the vocabularies (see the `intersection_align_gensim` documentation).
    Then align the other_embed model to the base_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
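
    The alignment step is the standard orthogonal Procrustes problem: find the
    orthogonal matrix R minimizing ||other @ R - base||_F over the shared words.
    A minimal NumPy sketch of the intersect-then-rotate idea, assuming each model
    is given as a plain dict mapping word -> vector (not a gensim model) and
    skipping the length normalization behind syn0norm. `procrustes_align` is a
    hypothetical helper, not a function from this gist or from HistWords:

```python
import numpy as np


def procrustes_align(base_vecs, other_vecs, words=None):
    # Hypothetical helper (assumption): base_vecs and other_vecs are dicts
    # mapping word -> 1-D numpy vector of the same dimensionality.

    # 1. Intersect the vocabularies, optionally restricted to `words`.
    #    A fixed (sorted) order guarantees row i is the same word in both matrices.
    shared = sorted(set(base_vecs) & set(other_vecs))
    if words is not None:
        wanted = set(words)
        shared = [w for w in shared if w in wanted]
    if not shared:
        raise ValueError("no shared vocabulary to align on")
    base = np.vstack([base_vecs[w] for w in shared])
    other = np.vstack([other_vecs[w] for w in shared])

    # 2. Orthogonal Procrustes: with U S Vt = svd(other.T @ base),
    #    the rotation minimizing ||other @ R - base||_F is R = U @ Vt.
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt

    # 3. Rotate the other model's (intersected) matrix into the base space.
    aligned = other @ rotation
    return shared, base, aligned
```

    Because R is orthogonal, the rotation preserves cosine similarities within
    the other model; only its coordinate frame changes, which is what makes the
    same word comparable across the two models after alignment.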