Last active
April 25, 2024 04:58
MVP For Semantic Search using Sentence Transformers + FAISS
# install packages
# pip install faiss-cpu sentence-transformers
import numpy as np
import faiss
import time
from sentence_transformers import SentenceTransformer

# https://www.sbert.net/docs/pretrained_models.html#multi-qa-models
embedder = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

# Corpus with example sentences
corpus = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'The girl is carrying a baby.',
          'A man is riding a horse.',
          'A woman is playing violin.',
          'Two men pushed carts through the woods.',
          'A man is riding a white horse on an enclosed ground.',
          'A monkey is playing drums.',
          'A cheetah is running behind its prey.'
          ]

# encode() returns a NumPy array by default, which is what FAISS expects.
# Passing a torch tensor (convert_to_tensor=True) can raise
# "ValueError: input not a numpy array".
corpus_embeddings = embedder.encode(corpus)

# get embedding dimension
embed_dim = embedder.get_sentence_embedding_dimension()

# index on faiss (inner product, with explicit int64 ids)
index = faiss.IndexIDMap(faiss.IndexFlatIP(embed_dim))
index.add_with_ids(corpus_embeddings, np.arange(len(corpus), dtype=np.int64))

# save index and read it for future!
faiss.write_index(index, 'my_index')
index = faiss.read_index('my_index')

def search(query, k=5):
    t = time.time()
    query_vector = embedder.encode([query])
    # index.search returns (scores, ids), each of shape (1, k)
    scores, ids = index.search(query_vector, k)
    print('totaltime: {}'.format(time.time() - t))
    # FAISS pads the ids with -1 when fewer than k results exist
    return [corpus[_id] for _id in ids[0].tolist() if _id != -1]

# example query
query = 'music instrument'
results = search(query)
print('results :')
for result in results:
    print('\t', result)
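For intuition about what the flat inner-product index in the script computes, here is a minimal NumPy-only sketch of an exhaustive inner-product search. The function name `inner_product_top_k` and the random toy vectors are assumptions made for illustration; they are not part of the gist or of FAISS.

```python
import numpy as np

def inner_product_top_k(corpus_vecs, query_vec, k):
    """Brute-force search: score every corpus vector, keep the k best."""
    scores = corpus_vecs @ query_vec            # one inner product per corpus row
    k = min(k, len(scores))                     # never ask for more than we have
    order = np.argsort(-scores, kind="stable")[:k]  # highest score first
    return scores[order], order

# Toy data standing in for sentence embeddings (hypothetical values).
rng = np.random.default_rng(0)
toy = rng.normal(size=(9, 4)).astype(np.float32)
toy /= np.linalg.norm(toy, axis=1, keepdims=True)  # unit length: inner product == cosine

# Querying with a corpus vector should return that vector first.
scores, ids = inner_product_top_k(toy, toy[3], k=5)
print(ids, scores)
```

Because the rows are normalized to unit length, the inner product here equals cosine similarity, which is why an inner-product index pairs naturally with a model trained for cosine scoring.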
Works for me. 🤷♂️
I tried it on Google Colab and it shows that error. I installed faiss-gpu; maybe it's a different package?
OK, I saw that you mention faiss-cpu in your gist, and it works with that package. Is there any way to make it work using the GPU?
I'm sure there are. You can google around for faiss-gpu examples.
I’m pretty new to ML. I tried this gist on Colab and got this error on line 31:
ValueError: input not a numpy array
Any hints?
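One likely cause, assuming the error comes from handing FAISS a PyTorch tensor (which is what `encode(..., convert_to_tensor=True)` returns) instead of a NumPy array: FAISS expects a contiguous float32 NumPy array. A small conversion helper along these lines may help; the name `to_faiss_array` is hypothetical, and the tensor check relies on duck typing so the sketch runs without PyTorch installed.

```python
import numpy as np

def to_faiss_array(vectors):
    """Convert embeddings to the contiguous float32 2-D array FAISS expects."""
    # Torch tensors expose detach(); move them to the CPU before converting.
    if hasattr(vectors, "detach"):
        vectors = vectors.detach().cpu().numpy()
    arr = np.ascontiguousarray(vectors, dtype=np.float32)
    # A single vector becomes a batch of one row.
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)
    return arr

example = to_faiss_array([[0.1, 0.2], [0.3, 0.4]])
print(example.dtype, example.shape)
```

With the gist's variable names, this would be used as `index.add_with_ids(to_faiss_array(corpus_embeddings), ids)`. Leaving `convert_to_tensor` out of the `encode` call should avoid the conversion entirely, since `encode` returns a NumPy array by default.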