I use the LangChain Python library to create a vector store and retrieve relevant documents given a user query. How can I get the embedding of a document stored in the vector store?
E.g., in this code:
import pprint

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document

model = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embeddings = HuggingFaceEmbeddings(model_name=model)

def main():
    doc1 = Document(page_content="The sky is blue.", metadata={"document_id": "10"})
    doc2 = Document(page_content="The forest is green", metadata={"document_id": "62"})
    docs = [doc1, doc2]
    for doc in docs:
        doc.metadata['summary'] = 'hello'
    pprint.pprint(docs)

    # Build the FAISS vector store, save it to disk, and reload it
    db = FAISS.from_documents(docs, embeddings)
    db.save_local("faiss_index")
    new_db = FAISS.load_local("faiss_index", embeddings)

    # Retrieve documents relevant to the query, with similarity scores
    query = "Which color is the sky?"
    docs = new_db.similarity_search_with_score(query)
    print('Retrieved docs:', docs)
    print('Metadata of the most relevant document:', docs[0][0].metadata)

if __name__ == '__main__':
    main()
How can I get the embedding of documents doc1 and doc2?
The code was tested with Python 3.11 with:
pip install langchain==0.1.1 langchain_openai==0.0.2.post1 sentence-transformers==2.2.2 langchain_community==0.0.13 faiss-cpu==1.7.4
One could recompute the embeddings:
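For instance, here is a minimal sketch that re-embeds the page contents with the same HuggingFaceEmbeddings instance used to build the index (the variable names below mirror the question's code):

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document

model = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embeddings = HuggingFaceEmbeddings(model_name=model)

doc1 = Document(page_content="The sky is blue.", metadata={"document_id": "10"})
doc2 = Document(page_content="The forest is green", metadata={"document_id": "62"})
docs = [doc1, doc2]

# embed_documents() returns one embedding (a list of floats) per input text,
# in the same order as the inputs.
doc_vectors = embeddings.embed_documents([doc.page_content for doc in docs])

for doc, vec in zip(docs, doc_vectors):
    # multi-qa-MiniLM-L6-cos-v1 produces 384-dimensional vectors
    print(doc.metadata["document_id"], len(vec), vec[:5])

Since FAISS.from_documents embeds each document's page_content with the same embeddings object, re-running embed_documents on those texts should reproduce the vectors stored in the index, as the sentence-transformers model is deterministic.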