Is there any way to retrieve the embeddings stored in a LangChain VectorStore?


I'm using LangChain to load a document, split it into chunks, embed those chunks, and then store the embedding vectors in a LangChain VectorStore. My use case requires me to run an algorithm on the embedding vectors, but I have not been able to find a way to fetch them.

My idea is to be able to do something like this:

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import SomeVectorStore
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../document.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = SomeVectorStore.from_documents(docs, embeddings)

# get all the embeddings and their corresponding chunks from the db
embeddings_and_their_chunks = db.some_way_to_get_all_embeddings()

There is 1 solution below

Answer by msalam

The exact method to retrieve embeddings from a VectorStore depends on the specific implementation you're using; LangChain's base VectorStore interface does not define a common method for listing every stored vector. Many implementations do expose the stored vectors in some store-specific way. As an illustration, suppose SomeVectorStore had a method items() (hypothetical, not part of LangChain) that returns an iterator over (key, value) pairs, where key is the chunk and value is the corresponding embedding. Then you could do something like this:

# get all the embeddings and their corresponding chunks from the db
embeddings_and_their_chunks = list(db.items())

If SomeVectorStore does not provide such a method, you would need to check the documentation or the source code of the VectorStore to find out how to retrieve the stored vectors.
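
As one concrete case, the sketch below assumes the store is LangChain's FAISS wrapper built on a flat index (for example via FAISS.from_documents). It relies on the wrapper's index, index_to_docstore_id, and docstore attributes and on the faiss reconstruct_n call. Other stores expose different accessors, and the helper name faiss_embeddings_and_chunks is made up for illustration.

```python
from typing import Any, List, Tuple


def faiss_embeddings_and_chunks(db: Any) -> List[Tuple[str, List[float]]]:
    """Return (chunk_text, embedding) pairs from a LangChain FAISS store.

    Assumes `db` exposes `index`, `index_to_docstore_id` and `docstore`, as the
    LangChain FAISS wrapper does, and that the underlying index supports
    reconstruction (flat indexes do).
    """
    total = db.index.ntotal
    if total == 0:
        return []
    # reconstruct_n(start, count) returns the stored vectors, one row per vector.
    vectors = db.index.reconstruct_n(0, total)
    pairs = []
    for position in range(total):
        # Map the row position in the index to the id used by the docstore.
        doc_id = db.index_to_docstore_id[position]
        doc = db.docstore.search(doc_id)  # a Document when the id is known
        pairs.append((doc.page_content, [float(x) for x in vectors[position]]))
    return pairs
```

With a FAISS store created as in the question, faiss_embeddings_and_chunks(db) should give a list of (chunk, vector) pairs that the algorithm can consume directly.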

If there's no built-in way to retrieve all vectors, you might need to keep track of the keys (i.e., the identifiers returned when each chunk is stored) yourself, and then use those keys to retrieve the vectors later. For example:

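# when storing the vectors
# (store() and get() are placeholders; use whatever your VectorStore actually provides)
keys = []
for doc in docs:
    # a Document carries no embedding of its own, so compute it from the text
    vector = embeddings.embed_query(doc.page_content)
    key = db.store(vector)
    keys.append(key)

# later, to retrieve the vectors
embeddings_and_their_chunks = [(key, db.get(key)) for key in keys]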
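
One way to keep track without relying on any store-specific retrieval is to compute the embeddings yourself before handing the chunks to the store. The sketch below assumes only that the embeddings object has an embed_documents(texts) method returning one vector per text, which LangChain embedding classes such as OpenAIEmbeddings provide. The helper name embed_and_keep is illustrative.

```python
from typing import Any, List, Sequence, Tuple


def embed_and_keep(texts: Sequence[str], embedder: Any) -> List[Tuple[str, List[float]]]:
    """Embed `texts` once and return (text, vector) pairs you can keep around."""
    texts = list(texts)
    vectors = embedder.embed_documents(texts)
    if len(vectors) != len(texts):
        raise ValueError("embedder returned a different number of vectors than texts")
    return list(zip(texts, vectors))


# Usage with the objects from the question (not executed here):
# pairs = embed_and_keep([d.page_content for d in docs], embeddings)
```

The pairs can be fed straight to the algorithm. Some stores can also be built from precomputed pairs (the FAISS wrapper has a from_embeddings classmethod, for instance), which avoids embedding the chunks a second time.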

Again, the exact details would depend on the specific VectorStore you're using.