I have recently started learning Redis Search and encountered a few issues while working on a specific example. I am storing multiple entries, each containing a tag called accountID and a size. My goal is to calculate the sum of sizes for each account.
Here's my hash template:
HSET my-key:<<unique_ID>> accountID <<acc>> size <<the size>>
Example:
HSET my-key:1 accountID acc1 size 100
HSET my-key:2 accountID acc1 size 100
HSET my-key:3 accountID acc2 size 100
To calculate the total size per accountID, I created the following index:
FT.CREATE storageUsageIdx ON hash PREFIX 1 "my-key:" SCHEMA accountID TAG SORTABLE size NUMERIC
And here's my aggregation query:
FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
This query works well when the number of entries is below 500k.
FT.AGGREGATE storageUsage "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
2) "acc1"
3) "total_usage"
4) "880800"
However, when the number of entries exceeds a certain threshold, the result is always null:
FT.AGGREGATE storageUsageIdx "*" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
2) "null"
3) "total_usage"
4) "0"
Interestingly, when I filter by accountID, I get the expected result:
FT.AGGREGATE storageUsageIdx "*" FILTER "@accountID == 'acc1'" GROUPBY 1 @accountID REDUCE SUM 1 @size AS total_usage
1) "1"
2) 1) "accountID"
2) "acc1"
3) "total_usage"
4) "100000000"
There are 1M entries for accountID =acc1 with a size of 100, so the result is correct. This query takes around 3 seconds to execute.
I also added 100 more entries for accountID =acc2, and the query to calculate the total size also takes the same amount of time. This leads me to believe that my index configuration might be incorrect?
I would greatly appreciate any guidance on resolving this aggregate issue and optimizing my index configuration. Thank you for your assistance.
In your last query, you have a problem; when you use
FILTER, your filter will be applied afteraggregation, so your query is faced with all of your data and filter that afteraggregation. According to documentation,ft.aggregateSo, you can change your last query as follows:
This query will be significantly faster.