Prerequisite: `gensim`.
Two scripts under `src/` that you need to look at:
- `generate_embeddings.py`: creates dataloaders for embedded sentences, using a fastText model trained on the CUB dictionary. The `train_loader` and `test_loader` will return `dataB` of length 2:
  - `dataB[0]`: `[batch_size, sentence_length, embedding_vector_size]`, the embedded sentences.
  - `dataB[1]`: `[batch_size]`, the original sentence length before truncation or padding. You can probably ignore this one, but I kept it in case you need the original length to truncate the sentence when calculating correlations.
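If you do end up using `dataB[1]`, here is a minimal sketch of stripping padding from each sentence in a batch. `trim_to_length` is a hypothetical helper and is not defined anywhere in `src/`. The sketch assumes `dataB[0]` can be indexed as `[batch][token]`, which holds for nested lists, NumPy arrays and PyTorch tensors. Because the stored length is taken *before* truncation, it can exceed `sentence_length`, so the sketch clamps it.

```python
def trim_to_length(emb_batch, lengths):
    """Return one sequence of token embeddings per sentence, padding removed.

    emb_batch: indexable as [batch_size][sentence_length][embedding_vector_size],
               e.g. dataB[0] (nested lists, NumPy arrays or tensors all work).
    lengths:   original lengths before truncation/padding, e.g. dataB[1].
    """
    trimmed = []
    for sent, n in zip(emb_batch, lengths):
        # The original length can be larger than sentence_length when the
        # sentence was truncated, so clamp to the positions actually present.
        n = min(int(n), len(sent))
        trimmed.append(sent[:n])
    return trimmed


if __name__ == "__main__":
    # Toy batch standing in for dataB: 2 sentences, sentence_length=3,
    # embedding_vector_size=2.
    toy_emb = [
        [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]],  # 2 real tokens + 1 padding row
        [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]],  # originally 5 tokens, truncated to 3
    ]
    toy_len = [2, 5]
    for s in trim_to_length(toy_emb, toy_len):
        print(len(s))  # prints 2, then 3

    # Hypothetical usage with the real loaders:
    # for dataB in train_loader:
    #     sentences = trim_to_length(dataB[0], dataB[1])
```

Slicing per sentence, rather than masking the whole batch, keeps the helper independent of whether the loader yields lists, arrays or tensors.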
- `coherence.py`: this one is pretty much ready to go. By default it loads the trained CUB model under `experiments/ft_obj`. You just need to import the `CCA` module (see usage in lines 65, 66 and 77), which can be called with: