Here is my code:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
openai_api_key = 'xxx'
loader = PyPDFLoader('xxx.pdf')
text = loader.load()
chunk_size = 200
chunk_overlap = 50
# Split the pdf
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    separators=['.']
)
doc = splitter.split_documents(text)
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key)
persist_directory = "embedding/chroma"
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding,
)
vectordb.persist()
That is the first part of my code. I then checked whether the data is in vectordb.get()['documents'], and it is.
Then I load the db again from the persist directory:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
openai_api_key = 'xxx'
vectordb2 = Chroma(persist_directory=persist_directory, embedding_function=embedding)
vectordb2.get()['documents']
Yes, the data is in vectordb2, so everything seems all right.
Then I try to use RetrievalQA with an OpenAI model:
retriever = vectordb2.as_retriever() # search_kwargs={"k": 4}
qa = RetrievalQA.from_chain_type(
    ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=openai_api_key),
    chain_type="stuff",
    retriever=retriever,
)
It throws this error:
File ~/miniforge3/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for RetrievalQA
retriever
instance of BaseRetriever expected (type=type_error.arbitrary_type; expected_arbitrary_type=BaseRetriever)
I searched online and found other kinds of "1 validation error" for RetrievalQA, but they are different from mine, e.g. ValidationError: 1 validation error for RetrievalQA retriever value is not a valid dict (type=type_error.dict)
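For context on what an "instance of BaseRetriever expected" message can mean in general, here is a minimal sketch in plain Python. It assumes, without confirming it for this setup, that the class the chain validates against and the class the retriever inherits from are two different objects that only share a name. The names old_pkg, new_pkg, VectorStoreRetriever and validate_retriever are hypothetical stand-ins, not real LangChain APIs.

```python
import types


def make_module(name):
    """Build a throwaway module that defines its own BaseRetriever class."""
    module = types.ModuleType(name)
    # Each call creates a brand-new class object, even though the name matches.
    module.BaseRetriever = type("BaseRetriever", (), {})
    return module


# Hypothetical stand-ins for two packages that each export a BaseRetriever.
old_pkg = make_module("old_pkg")  # where the chain looks up the expected type
new_pkg = make_module("new_pkg")  # where the retriever's base class is defined


class VectorStoreRetriever(new_pkg.BaseRetriever):
    """Hypothetical retriever built on the second package's base class."""


retriever = VectorStoreRetriever()


def validate_retriever(value, expected=old_pkg.BaseRetriever):
    """Mimic an arbitrary-type check: the value must be an instance of `expected`."""
    if not isinstance(value, expected):
        raise TypeError(f"instance of {expected.__name__} expected")
    return value


# Same class name, but only one of the two checks succeeds.
matches_new = isinstance(retriever, new_pkg.BaseRetriever)
matches_old = isinstance(retriever, old_pkg.BaseRetriever)
print("matches new_pkg.BaseRetriever:", matches_new)
print("matches old_pkg.BaseRetriever:", matches_old)
```

If something like this is what happens here, printing type(retriever).__mro__ for the real retriever would show which module its base classes come from.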
Would anyone please help? Any help is appreciated.
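Because type errors like this can depend on which package versions are installed together, a small sketch such as the following could be used to report them alongside the question. It uses only the standard library. The distribution names are assumptions derived from the imports and traceback above (langchain-core in particular does not appear in the code and is only a guess), and installed_versions is a hypothetical helper.

```python
from importlib import metadata

# Assumed distribution names, based on the imports and the traceback above.
PACKAGES = [
    "langchain",
    "langchain-community",
    "langchain-openai",
    "langchain-core",  # assumption: not imported directly in the code above
    "chromadb",
    "pydantic",
]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```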