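    # Print the per-token embeddings of each document in the first result set.
    # The loop variable must not shadow the global `embeddings` matrix.
    for doc_embeddings in docs_embeddings[0]:
        print()
        print(doc_embeddings)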
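    # Convert each result set (Bing, then Google) into sequences of word indices.
    docs_sequences = []
    for docs_list in [bing_search_results, google_search_results]:
        docs_sequences.append(tokeniser.texts_to_sequences(docs_list))

    # Look up an embedding for every token of every document.
    docs_embeddings = []
    for docs_set in docs_sequences:
        this_docs_set = []
        for doc in docs_set:
            this_doc_embeddings = np.array([embeddings[idx] for idx in doc])
            this_docs_set.append(this_doc_embeddings)
        # Assumed final step (missing from the snippet as shown): keep this result set.
        docs_embeddings.append(this_docs_set)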
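As a small aside, the per-token lookup in the inner loop can also be written with NumPy integer-array indexing. The sketch below uses a made-up five-row embedding matrix and a made-up document (both are assumptions for illustration, not values from the article) to show that the two forms give the same result for a non-empty document.

```python
import numpy as np

# Hypothetical embedding matrix: 5 vocabulary indices, 2 dimensions each
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5, 2)).astype(np.float32)

# Hypothetical document, already converted to word indices
doc = [2, 4, 2]

# Lookup as written in the loop above: one row per token
looped = np.array([embeddings[idx] for idx in doc])

# Equivalent lookup using integer-array indexing
indexed = embeddings[doc]

print(looped.shape)
print(np.array_equal(looped, indexed))
```

Each document ends up as an array of shape `(number_of_tokens, EMBEDDING_DIMS)`, so documents of different lengths produce arrays of different heights.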
    print(query_embeddings.shape)
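    # Stack both query embeddings along the first axis.
    # np.vstack replaces np.row_stack, which is deprecated in NumPy 2.0.
    query_embeddings = np.vstack([query_1_embeddings, query_2_embeddings_avg])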
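To make the resulting shape concrete, here is a minimal sketch of the stacking step. The two input arrays are hypothetical stand-ins, and their `(1, 1, 2)` shape assumes that the first query is a single word and that the second query has already been averaged with `keepdims=True`.

```python
import numpy as np

# Hypothetical stand-ins, each shaped (1, 1, EMBEDDING_DIMS)
query_1_embeddings = np.array([[[0.5, -1.0]]], dtype=np.float32)
query_2_embeddings_avg = np.array([[[3.0, 4.0]]], dtype=np.float32)

# Stacking along the first axis yields one entry per query
query_embeddings = np.vstack([query_1_embeddings, query_2_embeddings_avg])
print(query_embeddings.shape)  # (2, 1, 2)

# Dropping the singleton middle axis leaves one row per query
flat = query_embeddings.squeeze(axis=1)
print(flat.shape)  # (2, 2)
```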
    # Average the word embeddings of the multi-word query into a single vector.
    query_2_embeddings_avg = tf.reduce_mean(query_2_embeddings, axis=1, keepdims=True).numpy()
    print(query_2_embeddings_avg)
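The averaging step is easier to follow with small numbers. The sketch below reproduces it in plain NumPy (`ndarray.mean` in place of `tf.reduce_mean`) on a hypothetical three-word query; the values are invented for illustration.

```python
import numpy as np

# Hypothetical embeddings for a three-word query, shaped (1, num_words, dims)
query_2_embeddings = np.array(
    [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]], dtype=np.float32
)

# Mean over the word axis; keepdims=True preserves the (1, 1, dims) layout
query_2_embeddings_avg = query_2_embeddings.mean(axis=1, keepdims=True)

print(query_2_embeddings_avg)        # [[[3. 4.]]]
print(query_2_embeddings_avg.shape)  # (1, 1, 2)
```

Keeping the reduced axis means the averaged query has the same rank as a single-word query, which is what allows the two to be stacked afterwards.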
    # Look up one embedding per word of the second query.
    query_2_embedding_indices = tokeniser.texts_to_sequences([query_2])
    query_2_embeddings = np.array([embeddings[x] for x in query_2_embedding_indices])
    print(query_2_embeddings)
    # Look up the embedding of the first query.
    query_1_embedding_index = tokeniser.texts_to_sequences([query_1])
    query_1_embeddings = np.array([embeddings[x] for x in query_1_embedding_index])
    print(query_1_embeddings)
    # Randomly initialised embedding matrix: one row per vocabulary index.
    EMBEDDING_DIMS = 2
    embeddings = np.random.randn(vocab_size, EMBEDDING_DIMS).astype(np.float32)
    print(embeddings)
    # List every word in the vocabulary alongside its index.
    for idx, word in tokeniser.index_word.items():
        print(f"index {idx} - {word}")
    # Fit the tokeniser on both queries and both sets of search results.
    combined_texts = [query_1, *bing_search_results, query_2, *google_search_results]
    tokeniser = tf.keras.preprocessing.text.Tokenizer()
    tokeniser.fit_on_texts(combined_texts)
    # Word indices start at 1; add one because index 0 is reserved for padding.
    vocab_size = max(tokeniser.index_word) + 1
    print(vocab_size)
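For readers who want to see what the tokeniser is doing, the following is a simplified, pure-Python sketch of the same idea. It is not the Keras implementation: it only lowercases and splits on whitespace, and the sample texts and the `to_sequence` helper are hypothetical. Words are numbered from 1 in order of decreasing frequency, leaving index 0 free for padding.

```python
from collections import Counter

# Hypothetical stand-ins for the queries and search results
texts = ["cat", "the cat sat", "the dog"]

# Count words across all texts (lowercased, whitespace-split only)
counts = Counter(word for text in texts for word in text.lower().split())

# Most frequent word gets index 1; index 0 is left unused for padding
word_index = {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}
index_word = {i: word for word, i in word_index.items()}

# One extra slot for the padding index, as in the snippet above
vocab_size = max(index_word) + 1


def to_sequence(text):
    """Map a text to word indices, skipping words not in the vocabulary."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


print(word_index)
print(vocab_size)
print(to_sequence("the dog"))
```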