I have this LangChain code for my own dataset:
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
    docs, embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
)
retriever = vectorstore.as_retriever()
and I want to add semantic chunking to the dataset (docs) before (or, if possible, after) saving it to the vector store. Specifically, I have been trying to add the following snippet before the code above:
from langchain_experimental.text_splitter import SemanticChunker
text_splitter = SemanticChunker(OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY))
docs = text_splitter.create_documents(docs)
to convert docs into chunked documents, but it doesn't work, possibly because the resulting structure is different.
Has anyone tried this and succeeded?
Try text_splitter.create_documents([docs]), which expects a list of strings (wrap the input in a list if docs is a single string). Reference: https://python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker
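
For reference, here is one way the whole pipeline can fit together. This is a minimal sketch, assuming docs is a list of raw strings and OPENAI_API_KEY is already defined (both as in the question). The key point is that create_documents returns Document objects, so the vector store should then be built with FAISS.from_documents rather than FAISS.from_texts:

from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Semantic chunking: create_documents takes a list of strings
# and returns a list of Document objects.
text_splitter = SemanticChunker(embeddings)
chunked_docs = text_splitter.create_documents(docs)

# from_texts expects a list of strings, so it would fail on the
# Document objects above; from_documents accepts them directly.
vectorstore = FAISS.from_documents(chunked_docs, embedding=embeddings)
retriever = vectorstore.as_retriever()

If you instead want to keep FAISS.from_texts, you would have to unwrap the chunks first, e.g. pass [d.page_content for d in chunked_docs], but from_documents also preserves any metadata the splitter attaches, so it is the simpler route.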