Azure Cognitive Search - Filter is not working

I have a RAG (retrieval-augmented generation) pipeline and am trying to filter search results by keywords/phrases, as shown below:

    public SearchOptions? CreateSearchOptions(
        int searchTypeInt,
        int k,
        ReadOnlyMemory<float> embeddings,
        ReadOnlyMemory<float> namedEntitiesEmbeddings,
        string filter,
        FilterAction filterAction)
    {
        _logger.LogInformation("CreateSearchOptions entered");

        SearchOptions? searchOptions = null;
        try
        {
            SearchType searchType = (SearchType)searchTypeInt;

            System.FormattableString formattableStr = $"SegmentText ct '{filter}'";
            if (!String.IsNullOrWhiteSpace(filter))
            {
                if (filterAction == FilterAction.Include)
                {
                    formattableStr = $"search.ismatch({filter}, 'SegmentText')";
                }
                else if (filterAction == FilterAction.Exclude)
                {
                    formattableStr = $"NOT(search.ismatch({filter}, 'SegmentText'))";
                }
            }

            searchOptions = new SearchOptions
            {
                //Filter = filter, will be set later
                Size = k,

                // fields to retrieve, if not specified then all are retrieved if retrievable
                Select = { "SegmentText", "NamedEntities", "docId", "segmentId", "Source", "TimeSrcModified", "TimeSrcCreated", "TimeIngested" },

                //SearchMode = SearchMode.Any, TBD!!!

                Filter = SearchFilter.Create(formattableStr) 
            };

            if ((searchType & SearchType.Vector) == SearchType.Vector)
            {
                searchOptions.VectorSearch = new VectorSearchOptions();
                VectorizedQuery vq = new VectorizedQuery(embeddings) { KNearestNeighborsCount = k, Fields = { "SegmentTextVector" } };
                searchOptions.VectorSearch.Queries.Add(vq);
                if (namedEntitiesEmbeddings.Length > 0)
                {
                    vq = new VectorizedQuery(namedEntitiesEmbeddings) { KNearestNeighborsCount = k, Fields = { "SegmentNamedEntitiesVector" } };
                    searchOptions.VectorSearch.Queries.Add(vq);
                }
            }
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, ex.Message);
            return null;
        }

        return searchOptions;
    }

The problem is that my "documents" are actually chunks of a document, each 500-700 tokens long. The vector search returns 5 relevant chunks out of the 11 chunks that make up the entire file (in my test case, my resume). That works fine, but adding an "Include" filter does not do much. If the user prompt is "What projects has the developer worked on in his career?" and I set the filter to "Outlook" to indicate that I want only the projects related to MS Outlook, I still get a variety of projects, not only the Outlook-related ones. This happens because I pass the 5 vector search results to the OpenAI Completion API, and those chunks also mention projects other than Outlook. What would the solution be? (I mean a solution based on the filter, as opposed to explicitly asking "List only the Outlook projects that the developer worked on".)

Answer by Pablo Castro:

I'm not sure what the "SearchFilter.Create()" function does, but assuming it doesn't rewrite the input string, the "ct" operator used in

$"SegmentText ct '{filter}'"

doesn't exist in the OData filter syntax that Azure Cognitive Search supports. The filter language is documented here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter
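
For illustration, here is a minimal sketch of building the filter string with search.ismatch only, and leaving the filter unset when no keyword is given. It makes several assumptions: the FilterAction enum is redeclared here with just Include and Exclude because its definition is not shown in the question, SegmentFilter and Build are hypothetical names, and SegmentText is assumed to be a searchable field in the index. The sketch builds the string by hand so that it runs without the SDK. As far as I know, SearchFilter.Create already quotes and escapes interpolated string values, so wrapping {filter} in extra quotes there would likely double-quote it.

```csharp
using System;

// Assumed shape of the enum from the question; the real definition is not shown.
public enum FilterAction { Include, Exclude }

// Hypothetical helper, not part of Azure.Search.Documents.
public static class SegmentFilter
{
    // Returns an OData filter string, or null when no keyword was supplied.
    public static string? Build(string? keyword, FilterAction action)
    {
        if (string.IsNullOrWhiteSpace(keyword))
        {
            return null; // caller leaves SearchOptions.Filter unset
        }

        // OData string literals escape a single quote by doubling it.
        string literal = "'" + keyword.Trim().Replace("'", "''") + "'";
        string match = $"search.ismatch({literal}, 'SegmentText')";

        // The OData negation keyword is lowercase "not".
        return action == FilterAction.Exclude ? $"not {match}" : match;
    }
}
```

The result could then be assigned with something like searchOptions.Filter = SegmentFilter.Build(filter, filterAction). Note that a filter of this kind only narrows which chunks are returned. A chunk that mentions Outlook alongside other projects still passes the Include filter in full, so the completion model will still see those other projects.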