How to apply filters in LangChain's MongoDBAtlasVectorSearch similarity_search_with_score?

I am using MongoDBAtlasVectorSearch and I want to search for the most similar documents, so I use the similarity_search_with_score function.

However, I can't get filters to work with similarity_search_with_score.

This is my code:

vector_search = MongoDBAtlasVectorSearch(
    collection=client[os.getenv("MONGODB_DB")]["files"],
    embedding=embeddings,
    index_name=os.getenv("ATLAS_VECTOR_SEARCH_INDEX_NAME"),
)

results = vector_search.similarity_search_with_score(
    query="What are the engagements of the company",
    k=5,
    pre_filter={
        "compound": {
            "filter": [
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                {"in": {"path": "file_name", "values": file_names}},
            ]
        }
    },
)

This is my index:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      },
      "file_name": {
        "normalizer": "lowercase",
        "type": "token"
      },
      "uploaded_by": {
        "normalizer": "lowercase",
        "type": "token"
      }
    }
  }
}

However, this gives me the following error:

pymongo.errors.OperationFailure: "knnBeta.filter.compound.filter[1].in.value" is required, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter.compound.filter[1].in.value" is required', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704804627, 1), 'signature': {'hash': b'\xfa\x15s+Q\x1d\xa86]R\xb2!\x9d\xc5b-G\xce\xa6S', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704804627, 1)}

I also tried it like this:

        pre_filter={
            "$and": [
                {"uploaded_by": {"$eq": chat_owner}},
                {"file_name": {"$in": file_names}},
            ]
        },

But I got this error:

pymongo.errors.OperationFailure: "knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704802325, 9), 'signature': {'hash': b'`\xd27-\x81+\x16\xd0a\x14\xc7\x99\xa8\x05|Sx?\x0e:', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704802325, 9)}

How can I use filters with similarity_search_with_score properly?

1 Answer

Answered by Diego Freniche

Looking at your error message:

'"knnBeta.filter.compound.filter[1].in.value" is required'

and based on this answer in the MongoDB Forums, it looks like your in clause is using values instead of value. The in operator expects the key value, even when you pass a list. As an example:

"in": {
      "path": "fileName",
      "value": model_documents,
}
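
Applied to the filter from the question, the fix might look like the sketch below. The helper name build_pre_filter and the sample values are hypothetical, used only for illustration; the field paths uploaded_by and file_name are the ones from your index. It only builds the dictionary, so it assumes you still pass the result as pre_filter to similarity_search_with_score on the knnBeta-based version of MongoDBAtlasVectorSearch that produced your error.

```python
from typing import Any, Dict, List


def build_pre_filter(chat_owner: str, file_names: List[str]) -> Dict[str, Any]:
    """Build an Atlas Search compound filter for use as pre_filter.

    Hypothetical helper for illustration; not part of langchain or pymongo.
    """
    return {
        "compound": {
            "filter": [
                # Match documents uploaded by this user.
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                # "in" takes the singular key "value", even for a list.
                {"in": {"path": "file_name", "value": list(file_names)}},
            ]
        }
    }


# Sample values, made up for this example.
pre_filter = build_pre_filter("alice", ["report.pdf", "notes.txt"])
print(pre_filter["compound"]["filter"][1])
```

You would then call similarity_search_with_score with pre_filter=build_pre_filter(chat_owner, file_names) and keep the rest of your call unchanged.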