I have this kind of table and index
create table dlq_data (
id VARCHAR(256) not null
primary key,
data JSONB not null,
publish_time TIMESTAMP not null,
resubmitted BOOLEAN default false,
created_at TIMESTAMP default now() not null,
message_id VARCHAR(256)
);
create index dlq_data__message_id__idx
on dlq_data using hash(message_id);
create index dlq_data_optimized_idx
on dlq_data((data ->> 'root'::TEXT), (data ->> 'payloadType'::TEXT));
but when i tried to run group by query, the index not being used:
EXPLAIN SELECT data->>'root', data->>'payloadType', count(1)
FROM dlq_data
GROUP BY data->>'root', data->>'payloadType';
Finalize GroupAggregate (cost=2271510.76..2271845.74 rows=1272 width=72)
Group Key: ((data ->> 'root'::text)), ((data ->> 'payloadType'::text))
-> Gather Merge (cost=2271510.76..2271807.58 rows=2544 width=72)
Workers Planned: 2
-> Sort (cost=2270510.73..2270513.91 rows=1272 width=72)
Sort Key: ((data ->> 'root'::text)), ((data ->> 'payloadType'::text))
-> Partial HashAggregate (cost=2270426.06..2270445.14 rows=1272 width=72)
Group Key: (data ->> 'root'::text), (data ->> 'payloadType'::text)
-> Parallel Seq Scan on dlq_data (cost=0.00..2231090.71 rows=5244714 width=64)
it only used on where query:
EXPLAIN SELECT COUNT(1)
FROM dlq_data
WHERE data->>'root' = '' AND data->>'playloadType' = ''
Aggregate (cost=22067.44..22067.45 rows=1 width=8)
-> Bitmap Heap Scan on dlq_data (cost=67.30..22067.37 rows=28 width=0)
Recheck Cond: ((data ->> 'root'::text) = ''::text)
Filter: ((data ->> 'playloadType'::text) = ''::text)
-> Bitmap Index Scan on dlq_data_optimized_idx (cost=0.00..67.30 rows=5698 width=0)
Index Cond: ((data ->> 'root'::text) = ''::text)
also not for <>:
EXPLAIN SELECT COUNT(1)
FROM dlq_data
WHERE data->>'root' <> '' AND data->>'playloadType' <> '';
Finalize Aggregate (cost=2271354.81..2271354.82 rows=1 width=8)
-> Gather (cost=2271354.59..2271354.80 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=2270354.59..2270354.60 rows=1 width=8)
-> Parallel Seq Scan on dlq_data (cost=0.00..2257314.27 rows=5216128 width=0)
Filter: (((data ->> 'root'::text) <> ''::text) AND ((data ->> 'playloadType'::text) <> ''::text))
what index should be created for that group by count?
You could get an index scan just by disabling the seq scan,
set enable_seqscan = off. That is unlikely to be an improvement, though.You could get an index-only scan if you make the index on
((data ->> 'root'::TEXT), (data ->> 'payloadType'::TEXT), data)but "data" might be too large to be accommodated in full in the index. There is no fundamentally good reason for this restriction (that "data" itself needs to be in the index), it is just that no one has bothered to code around the issue.