I created a REST API whose schema looks like this :
type Todo struct {
ID gocql.UUID `json:"id"`
User_ID string `json:"user_id"`
Title string `json:"title"`
Description string `json:"description"`
Status string `json:"status"`
Created time.Time `json:"created"`
Updated time.Time `json:"updated"`
}
So as I'm using CQL and majorly all the function queries are filtering on ID but in one query I have to filter using where clause on "Status" and Order By "Created" in DESC like below :
SELECT id, user_id, title, description, status, created, updated FROM todos WHERE status = ? ORDER BY created
Currently I'm using below model :
CREATE TABLE todos (
id UUID,
user_id TEXT,
title TEXT,
description TEXT,
status TEXT,
created TIMESTAMP,
updated TIMESTAMP,
PRIMARY KEY((status, created), id)
);
but using this I'm getting filtering errors. What should be the perfect table creation so I can perform the desired operations without any issue
Expecting some good responses to get my issues fixed
The problem requires two tables to solve, because there are two usage patterns which are not compatible - the bulk of the queries are handled by the ID partition key, so the first table would be partitioned by ID - but the status query can not use that table design.
For the table schema you posted, the filtering errors are because the partition key is a composite key of the 2 values, status and created. As such from a partition perspective, they are hashed to a single value. By providing only 1 part of the key, it would be forced to scan all partitions.
Your initial reaction would then be to change the primary key to be:
The double parens are not required, but I left them in to make it obvious what the change is. The likelihood though is that status is a very low cardinality value, which would result in a very high number of records within a single partition - which is unadvisable.
You may be tempted into a secondary index - but this suffers the same problem (as does a materialized view). The cardinality problem needs to be solved.
I would look to use time buckets in some form, such as:
Where time_bucket is a timestamp field which is a truncated version of the created field that your application provides, such as week, day, or even hour. You should know which time bucket you are searching within, or a number of buckets that you would search for in parallel, and construct the final ordered answer within the application code. The truncation level to use should be based on how much data per status you expect within a period of time.
This increases the cardinality of the partition key and distributes it across the nodes within the cluster.
If there were no updates, TWCS would then be a feasible option but the existence of the updated field in the struct, indicates that this is not the case.
Final result: