Index not used on group by count

Question

Index not used on group by count

47 Views Asked by Kokizzu At 09 January 2024 at 17:21

I have this kind of table and index

create table dlq_data (
    id           VARCHAR(256)            not null
        primary key,
    data         JSONB                   not null,
    publish_time TIMESTAMP               not null,
    resubmitted  BOOLEAN   default false,
    created_at   TIMESTAMP default now() not null,
    message_id   VARCHAR(256)
);

create index dlq_data__message_id__idx
    on dlq_data using hash(message_id);

create index dlq_data_optimized_idx
    on dlq_data((data ->> 'root'::TEXT), (data ->> 'payloadType'::TEXT));

but when i tried to run group by query, the index not being used:

EXPLAIN SELECT data->>'root', data->>'payloadType', count(1)
         FROM dlq_data
         GROUP BY data->>'root', data->>'payloadType';

Finalize GroupAggregate  (cost=2271510.76..2271845.74 rows=1272 width=72)
  Group Key: ((data ->> 'root'::text)), ((data ->> 'payloadType'::text))
  ->  Gather Merge  (cost=2271510.76..2271807.58 rows=2544 width=72)
        Workers Planned: 2
        ->  Sort  (cost=2270510.73..2270513.91 rows=1272 width=72)
              Sort Key: ((data ->> 'root'::text)), ((data ->> 'payloadType'::text))
              ->  Partial HashAggregate  (cost=2270426.06..2270445.14 rows=1272 width=72)
                    Group Key: (data ->> 'root'::text), (data ->> 'payloadType'::text)
                    ->  Parallel Seq Scan on dlq_data  (cost=0.00..2231090.71 rows=5244714 width=64)

it only used on where query:

EXPLAIN SELECT COUNT(1)
FROM dlq_data
WHERE data->>'root' = '' AND data->>'playloadType' = ''

Aggregate  (cost=22067.44..22067.45 rows=1 width=8)
  ->  Bitmap Heap Scan on dlq_data  (cost=67.30..22067.37 rows=28 width=0)
        Recheck Cond: ((data ->> 'root'::text) = ''::text)
        Filter: ((data ->> 'playloadType'::text) = ''::text)
        ->  Bitmap Index Scan on dlq_data_optimized_idx  (cost=0.00..67.30 rows=5698 width=0)
              Index Cond: ((data ->> 'root'::text) = ''::text)

also not for <>:

EXPLAIN SELECT COUNT(1)
FROM dlq_data
WHERE data->>'root' <> '' AND data->>'playloadType' <> '';

Finalize Aggregate  (cost=2271354.81..2271354.82 rows=1 width=8)
  ->  Gather  (cost=2271354.59..2271354.80 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=2270354.59..2270354.60 rows=1 width=8)
              ->  Parallel Seq Scan on dlq_data  (cost=0.00..2257314.27 rows=5216128 width=0)
                    Filter: (((data ->> 'root'::text) <> ''::text) AND ((data ->> 'playloadType'::text) <> ''::text))

what index should be created for that group by count?

Original Q&A

There are 1 best solutions below

**jjanes** · Answer 1 · 2024-01-09T18:26:17.890000

You could get an index scan just by disabling the seq scan, set enable_seqscan = off. That is unlikely to be an improvement, though.

You could get an index-only scan if you make the index on ((data ->> 'root'::TEXT), (data ->> 'payloadType'::TEXT), data) but "data" might be too large to be accommodated in full in the index. There is no fundamentally good reason for this restriction (that "data" itself needs to be in the index), it is just that no one has bothered to code around the issue.

Index not used on group by count

There are 1 best solutions below

Related Questions in POSTGRESQL

Related Questions in INDEXING

Related Questions in JSONB

Related Questions in POSTGRESQL-14

Trending Questions

Popular # Hahtags

Popular Questions