I am trying to build a PDF chatbot where you upload a PDF and ask questions about it. For this, I was planning a RAG-based application, so I wanted to create vector embeddings of my input PDF. But when I do this:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Cassandra,
    embedding=embed_model,
    text_splitter=RecursiveCharacterTextSplitter(
        chunk_size=400,
        chunk_overlap=30,
    ),
    vectorstore_kwargs={
        'session': session,
        'keyspace': keyspace,
        'table_name': table_name,
    },
)
I get a validation error:
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-17-b83dc7fd1587> in <cell line: 4>()
2 keyspace = "pdf_qa_name"
3
----> 4 index_creator = VectorstoreIndexCreator(
5 vectorstore_cls = Cassandra,
6 embedding = embed_model,
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
340 if validation_error:
--> 341 raise validation_error
342 try:
343 object_setattr(__pydantic_self__, '__dict__', values)
ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
instance of Embeddings expected (type=type_error.arbitrary_type; expected_arbitrary_type=Embeddings)
Any idea?
I tried two different embedding models (Jina and BAAI/bge), but the error persists. I am using the OpenAI GPT-3.5 API.
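If I read the message correctly, "instance of Embeddings expected (type=type_error.arbitrary_type)" means the validator does a plain isinstance check against a base class named Embeddings, and the object I pass comes from a different library, so it does not inherit from that class. Below is a minimal, stdlib-only sketch of that kind of mismatch and of an adapter that would satisfy such a check. The classes Embeddings, LlamaStyleEmbedding, and EmbeddingsAdapter are hypothetical stand-ins written for illustration, not the real library classes, and the toy embedding is made up so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import List


class Embeddings(ABC):
    """Hypothetical stand-in for the base class the validator expects."""

    @abstractmethod
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        ...

    @abstractmethod
    def embed_query(self, text: str) -> List[float]:
        ...


class LlamaStyleEmbedding:
    """Hypothetical stand-in for an embedding object from another library.

    It can embed text, but it does not subclass Embeddings.
    """

    def get_text_embedding(self, text: str) -> List[float]:
        # Toy "embedding": text length and vowel count, only for illustration.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels)]


class EmbeddingsAdapter(Embeddings):
    """Wraps the foreign object so that it passes the isinstance check."""

    def __init__(self, inner: LlamaStyleEmbedding) -> None:
        self.inner = inner

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.inner.get_text_embedding(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self.inner.get_text_embedding(text)


raw = LlamaStyleEmbedding()
wrapped = EmbeddingsAdapter(raw)

# The raw object fails the type check; the wrapped one passes it.
print(isinstance(raw, Embeddings))
print(isinstance(wrapped, Embeddings))
```

If this reading is right, the model name (Jina vs. BAAI/bge) would not matter, since the check is on the class of the embedding object and not on the model it loads. Whether a ready-made wrapper exists for my setup is something I have not confirmed.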