I'm storing simple dataset in a Postgresql database using langchain and openai embedding.
This seems to be working fine.
But when querying the database I get the below error message.
connection_string = "postgresql://[email protected]/aitest"
collection_name = "items"
def datastore():
text = 'My long text is blue'
text_splitter = CharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
length_function=len
)
texts = text_splitter.split_text(text)
embeddings = OpenAIEmbeddings()
db = PGEmbedding.from_texts(
embedding=embeddings,
texts=texts,
collection_name=collection_name,
connection_string=connection_string,
)
def retrieve():
embeddings = OpenAIEmbeddings()
db1 = PGEmbedding.from_existing_index(
embedding=embeddings,
collection_name=collection_name,
pre_delete_collection=False,
connection_string=connection_string,
)
query = "What did the president say about Ketanji Brown Jackson"
db1.similarity_search_with_score(query)
Running datastore(), creates a langchain_pg_collection and lagchain_pg_embedding with data. Note that lagchain_pg_embedding.embedding column is of type "_float4" (not sure if this is expected).
Running retrieve() crashes and returns below error:
ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: real[] <-> real[] LINE 1: ...edding_uuid, abs(langchain_pg_embedding.embedding <-> ARRAY[... ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. [SQL: SELECT langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.custom_id AS langchain_pg_embedding_custom_id, langchain_pg_embedding.uuid AS langchain_pg_embedding_uuid, abs(langchain_pg_embedding.embedding <-> %(embedding_1)s::REAL[]) AS distance FROM langchain_pg_embedding WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID ORDER BY abs(langchain_pg_embedding.embedding <-> %(embedding_2)s::REAL[]) ASC LIMIT %(param_1)s] [parameters: {'embedding_1': [-0.052560476281678786, -0.008834546998953006, -0.008705921112153828, 0.007737844863369729, 0.0037538363447478127, 0.0042243351349390045, -0.002059701 ... (34103 characters truncated) ... 785052453730274, -0.029570345478911762, 0.01835961005386705, 0.025603261651909844, -0.020891503500820918, -0.02010621047563943, -0.026970754196800433], 'collection_id_1': UUID('9defca44-19a2-453d-a3ee-5c1293542873'), 'embedding_2': [-0.052560476281678786, -0.008834546998953006, -0.008705921112153828, 0.007737844863369729, 0.0037538363447478127, 0.0042243351349390045, -0.002059701 ... (34103 characters truncated) ... 785052453730274, -0.029570345478911762, 0.01835961005386705, 0.025603261651909844, -0.020891503500820918, -0.02010621047563943, -0.026970754196800433], 'param_1': 4}] (Background on this error at: https://sqlalche.me/e/20/f405)
I'm new to langchain and followed this tutorial https://python.langchain.com/docs/integrations/vectorstores/pgembedding but haven't been able to figure out what is causing the problem.
Thanks