Skip to content

Instantly share code, notes, and snippets.

@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.