Aggregate Issue with Redis Search

590 Views Asked by At

I have recently started learning Redis Search and encountered a few issues while working on a specific example. I am storing multiple entries, each containing a tag called accountID and a size. My goal is to calculate the sum of sizes for each account.

Here's my hash template:

HSET my-key:<<unique_ID>> accountID <<acc>> size <<the size>>

Example:

HSET my-key:1 accountID acc1 size 100
HSET my-key:2 accountID acc1 size 100
HSET my-key:3 accountID acc2 size 100

To calculate the total size per accountID, I created the following index:

FT.CREATE storageUsageIdx ON hash PREFIX 1 "my-key:" SCHEMA accountID TAG SORTABLE size NUMERIC

And here's my aggregation query:

FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage

This query works well when the number of entries is below 500k.

FT.AGGREGATE storageUsage "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "acc1"
   3) "total_usage"
   4) "880800"

However, when the number of entries exceeds a certain threshold, the result is always null:

FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "null"
   3) "total_usage"
   4) "0"

Interestingly, when I filter by accountID, I get the expected result:

FT.AGGREGATE storageUsageIdx "*" FILTER "@accountID == 'acc1'" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
   2) "acc1"
   3) "total_usage"
   4) "100000000"

There are 1M entries for accountID =acc1 with a size of 100, so the result is correct. This query takes around 3 seconds to execute.

I also added 100 more entries for accountID =acc2, and the query to calculate the total size also takes the same amount of time. This leads me to believe that my index configuration might be incorrect?

I would greatly appreciate any guidance on resolving this aggregate issue and optimizing my index configuration. Thank you for your assistance.

1

There are 1 best solutions below

0
iman safari On

In your last query, you have a problem; when you use FILTER, your filter will be applied after aggregation, so your query is faced with all of your data and filter that after aggregation. According to documentation,ft.aggregate

FILTER {expr}: filters the results using predicate expressions relating to values in each result. They are applied post-query and relate to the current state of the pipeline.

So, you can change your last query as follows:

FT.AGGREGATE storageUsageIdx "@accountID:{acc1}" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage

This query will be significantly faster.