sum(true_prob_dist * np.log(true_prob_dist / predicted_prob_dist))
sum(true_prob_dist * np.log(true_prob_dist / true_prob_dist))
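The two expressions above can be checked end to end with a small self-contained sketch (the distributions here are made-up illustrative values, not data from this post). The KL divergence of a predicted distribution from the true one is positive whenever they differ, and the divergence of a distribution from itself is exactly zero, since log(p / p) = log(1) = 0 for every element:

```python
import numpy as np

# Hypothetical example distributions (illustrative values only)
true_prob_dist = np.array([0.5, 0.3, 0.2])
predicted_prob_dist = np.array([0.4, 0.4, 0.2])

# KL divergence of the predicted distribution from the true one
kl = np.sum(true_prob_dist * np.log(true_prob_dist / predicted_prob_dist))

# KL divergence of the true distribution from itself
kl_self = np.sum(true_prob_dist * np.log(true_prob_dist / true_prob_dist))

print(kl)       # a small positive number
print(kl_self)  # 0.0
```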
query_1 = "dog"
bing_search_results = [
    "Dog - Wikipedia",
    "Adopting a dog or puppy | RSPCA Australia",
    "dog | History, Domestication, Physical Traits, & Breeds",
    "New South Wales | Dogs & Puppies | Gumtree Australia Free",
    "dog - Wiktionary"
]
relevance_grades = tf.constant([
    [3.0, 2.0, 2.0, 2.0, 1.0],
    [3.0, 3.0, 1.0, 1.0, 0.0]
])
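A common way to turn graded relevance labels like these into the "true" probability distribution that a KL-divergence loss compares against is a softmax over each row of grades. The softmax step is an assumption about the surrounding pipeline, not something shown in the snippet; the grades themselves are the values above:

```python
import numpy as np

# Relevance grades for two queries, five results each (values from the snippet)
relevance_grades = np.array([
    [3.0, 2.0, 2.0, 2.0, 1.0],
    [3.0, 3.0, 1.0, 1.0, 0.0],
])

def softmax(x, axis=-1):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# Each row becomes a probability distribution over that query's five results
true_prob_dists = softmax(relevance_grades)
print(true_prob_dists)
print(true_prob_dists.sum(axis=1))  # each row sums to 1
```

Higher grades get higher probability mass, so the most relevant result in each row dominates the distribution.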
combined_texts = [query_1, *bing_search_results, query_2, *google_search_results]
tokeniser = tf.keras.preprocessing.text.Tokenizer()
tokeniser.fit_on_texts(combined_texts)
# we add one here to account for the padding word
vocab_size = max(tokeniser.index_word) + 1
print(vocab_size)
for idx, word in tokeniser.index_word.items():
    print(f"index {idx} - {word}")
EMBEDDING_DIMS = 2
embeddings = np.random.randn(vocab_size, EMBEDDING_DIMS).astype(np.float32)
print(embeddings)
query_1_embedding_indices = tokeniser.texts_to_sequences([query_1])
query_1_embeddings = np.array([embeddings[x] for x in query_1_embedding_indices])
print(query_1_embeddings)
query_2_embedding_indices = tokeniser.texts_to_sequences([query_2])
query_2_embeddings = np.array([embeddings[x] for x in query_2_embedding_indices])
print(query_2_embeddings)
query_2_embeddings_avg = tf.reduce_mean(query_2_embeddings, axis=1, keepdims=True).numpy()
print(query_2_embeddings_avg)
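The averaging step above collapses the per-word embeddings into a single vector per query. A NumPy equivalent, with a hypothetical `(1, 3, 2)` embedding array in place of the real query embeddings, makes the shapes explicit:

```python
import numpy as np

# Hypothetical embeddings for one query of three words, each 2-dimensional
query_embeddings = np.array([[[1.0, 2.0],
                              [3.0, 4.0],
                              [5.0, 6.0]]])  # shape (1, 3, 2)

# Mean over the word axis, keeping that axis — mirrors
# tf.reduce_mean(query_embeddings, axis=1, keepdims=True)
query_embeddings_avg = query_embeddings.mean(axis=1, keepdims=True)

print(query_embeddings_avg.shape)  # (1, 1, 2)
print(query_embeddings_avg)        # [[[3. 4.]]]
```

With `keepdims=True` the word axis survives as size 1, so the averaged query vector still broadcasts cleanly against per-document embedding arrays of shape `(1, n_docs, 2)`.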