An example of how to create X and Y for word embedding training
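The code below assumes two objects produced in earlier preprocessing steps that are not shown in this gist: word_lists, a list of (focus word, context word) pairs collected with a sliding window over the text, and unique_word_dict, a mapping from each unique word to an integer index. A minimal sketch of how they might be built (the toy corpus, tokenization, and window size here are illustrative assumptions, not part of the gist):

# Toy corpus standing in for the cleaned input text (assumption)
texts = ["the cat sat on the mat", "the dog sat on the rug"]

# Assumed context window: 2 words on each side of the focus word
window = 2

word_lists = []
all_words = []

for text in texts:
    tokens = text.lower().split()
    all_words.extend(tokens)
    for i, focus_word in enumerate(tokens):
        # Pair the focus word with every other word inside the window
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                word_lists.append([focus_word, tokens[j]])

# Map every unique word to a stable integer index
unique_word_dict = {word: idx for idx, word in enumerate(sorted(set(all_words)))}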
from scipy import sparse
import numpy as np
from tqdm import tqdm

# Defining the number of features (unique words)
n_words = len(unique_word_dict)

# Getting all the unique words
words = list(unique_word_dict.keys())

# Creating the X and Y matrices using one hot encoding
X = []
Y = []

for i, word_list in tqdm(enumerate(word_lists)):
    # Getting the indices of the main (focus) word and the context word
    main_word_index = unique_word_dict.get(word_list[0])
    context_word_index = unique_word_dict.get(word_list[1])

    # Creating the one hot encoded placeholder rows
    X_row = np.zeros(n_words)
    Y_row = np.zeros(n_words)

    # One hot encoding the main word
    X_row[main_word_index] = 1

    # One hot encoding the context word
    Y_row[context_word_index] = 1

    # Appending the rows to the main matrices
    X.append(X_row)
    Y.append(Y_row)

# Converting the lists of rows into arrays
X = np.asarray(X)
Y = np.asarray(Y)
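With X and Y built, embeddings can be learned by training a shallow network whose hidden-layer weights become the word vectors. The snippet below is a minimal sketch of that step, not necessarily this gist's exact downstream setup; Keras, the 2-dimensional embedding size, and the training settings are assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

embed_size = 2  # assumed embedding dimension

model = Sequential([
    # Linear hidden layer whose weight matrix holds the embeddings
    Dense(embed_size, input_shape=(n_words,), activation="linear"),
    # Softmax output predicting the one hot encoded context word
    Dense(n_words, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, Y, epochs=100, verbose=0)

# Row k of the first layer's weight matrix is the embedding of the word with index k
embedding_matrix = model.layers[0].get_weights()[0]
word_embeddings = {word: embedding_matrix[idx] for word, idx in unique_word_dict.items()}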