Prerequisite: gensim.
Two scripts under src/ that you need to look at:
- `generate_embeddings.py`: creates dataloaders for embedded sentences using a fastText model trained on the CUB dictionary. The `train_loader` and `test_loader` return `dataB` of length 2:
  - `dataB[0]`: `[batch_size, sentence_length, embedding_vector_size]`
  - `dataB[1]`: `[batch_size]`, the original sentence length before truncation or padding. You can probably ignore this one, but I kept it in case you need the original length to truncate the sentence when calculating correlations.
- `coherence.py`: this one is pretty much ready to go. By default it loads the trained CUB model under `expeirments/ft_obj`. You just need to import the `CCA` module (see usage in lines 65, 66 and 77), which can be called with: