How can I get the embedding of a document in langchain?


I use the langchain Python lib to create a vector store and retrieve relevant documents given a user query. How can I get the embedding of a document in the vector store?

E.g., in this code:

import pprint
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.docstore.document import Document

model = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embeddings = HuggingFaceEmbeddings(model_name=model)

def main():
    doc1 = Document(page_content="The sky is blue.",    metadata={"document_id": "10"})
    doc2 = Document(page_content="The forest is green", metadata={"document_id": "62"})
    docs = [doc1, doc2]

    for doc in docs:
        doc.metadata['summary'] = 'hello'

    pprint.pprint(docs)
    db = FAISS.from_documents(docs, embeddings)
    db.save_local("faiss_index")
    new_db = FAISS.load_local("faiss_index", embeddings)

    query = "Which color is the sky?"
    docs = new_db.similarity_search_with_score(query)
    print('Retrieved docs:', docs)
    print('Metadata of the most relevant document:', docs[0][0].metadata)

if __name__ == '__main__':
    main()

How can I get the embedding of documents doc1 and doc2?

The code was tested with Python 3.11 with:

pip install langchain==0.1.1 langchain_openai==0.0.2.post1 sentence-transformers==2.2.2 langchain_community==0.0.13 faiss-cpu==1.7.4
1 Answer

Answer by the beetle:

One option is to recompute the embeddings from the documents' page content:

emb1 = embeddings.embed_query(doc1.page_content)
emb2 = embeddings.embed_query(doc2.page_content)

For many documents, embed_documents batches the model calls:

emb1, emb2 = embeddings.embed_documents([doc1.page_content, doc2.page_content])