@jonathanoheix
Created December 18, 2018 09:50
# create doc2vec vector columns
# assumes reviews_df is a pandas DataFrame with a space-separated "review_clean" text column
import pandas as pd
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# tag each tokenized review with its row position so Doc2Vec learns one vector per document
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(reviews_df["review_clean"].apply(lambda x: x.split(" ")))]
# train a Doc2Vec model on our text data (5-dimensional document vectors)
model = Doc2Vec(documents, vector_size=5, window=2, min_count=1, workers=4)
# infer a vector for each review, then spread its components into separate columns
doc2vec_df = reviews_df["review_clean"].apply(lambda x: model.infer_vector(x.split(" "))).apply(pd.Series)
doc2vec_df.columns = ["doc2vec_vector_" + str(x) for x in doc2vec_df.columns]
# append the vector columns; both frames share reviews_df's index, so the rows line up
reviews_df = pd.concat([reviews_df, doc2vec_df], axis=1)
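The `.apply(pd.Series)` step expands each inferred vector into its own column, and `pd.concat(..., axis=1)` then joins those columns back to the original rows by index. Below is a minimal sketch of that expansion step that needs only pandas. The function `toy_infer_vector` is hypothetical. It stands in for `model.infer_vector` so the example runs without training a model, and it is not a real embedding. `add_vector_columns` is also an illustrative helper, not part of gensim or pandas.

```python
import pandas as pd


def toy_infer_vector(tokens, size=5):
    """Hypothetical stand-in for Doc2Vec.infer_vector: returns a fixed-length list of floats."""
    # Deterministic toy features: token count, mean token length, then zero padding.
    n = len(tokens)
    mean_len = sum(len(t) for t in tokens) / n if n else 0.0
    return [float(n), mean_len] + [0.0] * (size - 2)


def add_vector_columns(df, text_col, infer, prefix="doc2vec_vector_"):
    """Infer one vector per row of df[text_col] and append its components as prefixed columns."""
    vectors = df[text_col].apply(lambda x: infer(x.split(" ")))
    # One DataFrame constructor call instead of .apply(pd.Series); index= keeps rows aligned.
    vec_df = pd.DataFrame(vectors.tolist(), index=df.index).add_prefix(prefix)
    return pd.concat([df, vec_df], axis=1)


reviews = pd.DataFrame(
    {"review_clean": ["great room", "noisy street outside", "clean"]},
    index=[10, 11, 12],  # non-default index to show that alignment is preserved
)
out = add_vector_columns(reviews, "review_clean", toy_infer_vector)
print(out)
```

Building the frame from a list of vectors in one call avoids creating a separate `pd.Series` for every row. Passing `index=df.index` matters because `pd.concat(..., axis=1)` aligns on the index, not on row position. With a real model, you would pass something like `model.infer_vector` as `infer`.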