# Print the per-word embeddings of each document in the first result set.
for doc_embeddings in docs_embeddings[0]:
    print()
    print(doc_embeddings)
# Convert each set of search results into sequences of word indices.
docs_sequences = []
for docs_list in [bing_search_results, google_search_results]:
    docs_sequences.append(tokeniser.texts_to_sequences(docs_list))

# Look up the embedding vector of every word in every document.
docs_embeddings = []
for docs_set in docs_sequences:
    this_docs_set = []
    for doc in docs_set:
        this_doc_embeddings = np.array([embeddings[idx] for idx in doc])
        this_docs_set.append(this_doc_embeddings)
    docs_embeddings.append(this_docs_set)
# Check the shape of the stacked query embeddings.
print(query_embeddings.shape)
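# Stack the two query embeddings along the first axis.
# np.vstack is used because its alias np.row_stack is deprecated in NumPy 2.0.
query_embeddings = np.vstack([query_1_embeddings, query_2_embeddings_avg])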
# Average the word embeddings of the second query into a single vector.
query_2_embeddings_avg = tf.reduce_mean(query_2_embeddings, axis=1, keepdims=True).numpy()
print(query_2_embeddings_avg)
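The averaging step collapses the per-word vectors of a multi-word query into one vector. Below is a minimal NumPy-only sketch of the same reduction, assuming a made-up array of shape (1, 3, 2) standing in for query_2_embeddings (one query, three words, two dimensions). The name `toy_query_embeddings` and its values are illustrative and not taken from the original data.

```python
import numpy as np

# Hypothetical stand-in for query_2_embeddings: 1 query, 3 words, 2 dims.
toy_query_embeddings = np.array(
    [[[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]],
    dtype=np.float32,
)

# Average over the word axis (axis=1). keepdims=True keeps that axis with
# length 1, so the result has the same rank as a single-word query.
toy_query_avg = toy_query_embeddings.mean(axis=1, keepdims=True)

print(toy_query_avg.shape)
print(toy_query_avg)
```

Keeping the reduced axis is what lets the averaged query be stacked with a single-word query of the same rank.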
# Look up the embedding of every word in the second query.
query_2_embedding_indices = tokeniser.texts_to_sequences([query_2])
query_2_embeddings = np.array([embeddings[x] for x in query_2_embedding_indices])
print(query_2_embeddings)
# Look up the embedding of the first query.
query_1_embedding_index = tokeniser.texts_to_sequences([query_1])
query_1_embeddings = np.array([embeddings[x] for x in query_1_embedding_index])
print(query_1_embeddings)
# Create a random embedding matrix with one row per vocabulary index.
EMBEDDING_DIMS = 2
embeddings = np.random.randn(vocab_size, EMBEDDING_DIMS).astype(np.float32)
print(embeddings)
# List every word in the vocabulary together with its index.
for idx, word in tokeniser.index_word.items():
    print(f"index {idx} - {word}")
# Build the vocabulary from both queries and both sets of search results.
combined_texts = [query_1, *bing_search_results, query_2, *google_search_results]
tokeniser = tf.keras.preprocessing.text.Tokenizer()
tokeniser.fit_on_texts(combined_texts)
# Word indices start at 1 because index 0 is reserved for padding,
# so we add one to the largest index to get the vocabulary size.
vocab_size = max(tokeniser.index_word) + 1
print(vocab_size)
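To make the "+ 1" for the padding word concrete, here is a simplified pure-Python sketch of the indexing behaviour: words are numbered from 1 by descending frequency, and index 0 is left unused. It is an approximation for illustration, not the Keras implementation (it only lowercases and splits on whitespace), and `toy_texts` is a made-up corpus.

```python
from collections import Counter

# Hypothetical toy corpus; the real queries and search results are not shown here.
toy_texts = ["cat", "the cat sat", "the dog sat down"]

# Count words across all texts (simplified: lowercase and whitespace split only).
counts = Counter(word for text in toy_texts for word in text.lower().split())

# Indices start at 1, most frequent word first; 0 is left free for padding.
index_word = {i: w for i, (w, _) in enumerate(counts.most_common(), start=1)}
word_index = {w: i for i, w in index_word.items()}

# One extra slot so that index 0 (padding) is also a valid embedding row.
vocab_size = max(index_word) + 1

# Convert each text to its sequence of word indices.
sequences = [[word_index[w] for w in text.lower().split()] for text in toy_texts]

print(index_word)
print(vocab_size)
print(sequences)
```

With five distinct words the largest index is 5, so an embedding matrix needs 6 rows for every index, including the padding index 0, to be a valid row.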