PostgreSQL Index Not Utilized for JSONB Field Filtering on Non-Equality: What Am I Missing?

95 Views Asked by At

I have a table in my database called "mytable," which contains a "date" field with JSONB data like this:

{
  ...
  "myfield": "value1"
}

The "myfield" field can currently have values "value1," "value2," and "value3." Additionally, "myfield" can be null, or it might not exist at all.

In my queries, I need to find rows where "myfield" is not equal to "value1."

select * from mytable where data->>'myfield' <> 'value1'

I've created a partial index on the "mytable" table to speed up the query:

CREATE INDEX "idx_mytable_myfield" ON "mytable" USING btree (
  ((data->>'myfield'::text) COLLATE "pg_catalog"."default" "pg_catalog"."text_ops" ASC NULLS LAST
) WHERE 
  (data->>'myfield'::text)::text <> 'value1'::text
)

However, the query is not using this index. At the same time, the selectivity of the query is high, and a sequential scan is significantly slower.

If I rewrite the query and index to use "in" instead of "<>", the index is used in the query. But this means I'll have to rewrite both the query and the index when adding new values in the future:

select * from mytable where (data->>'myfield'::text)::text in ('value2'::text,'value3'::text)

Can you please help me understand what I'm doing wrong? Is there a way to rewrite the query and/or the index?


To address possible questions in advance:

  1. I've run ANALYZE mytable after each index change.
  2. The query has high selectivity.
  3. ChatGPT asked about this and suggested exactly what I'm trying to do.
  4. My PostgreSQL version is 12.14.
  5. I don't want to build not partial index on data->>'myfield' or GIN index on all data field because of perfomance and index size

P.S. I've tried adding additional conditions to both the index and the query, but it didn't help:

...
and data is not null
and data->'myfield'::text is not null
CREATE INDEX "mytable_myfield_idx" ON "set10"."mytable" USING btree (
  ("data" ->> 'myfield'::text)  COLLATE "pg_catalog"."default" "pg_catalog"."text_ops" ASC NULLS LAST
)
WHERE ("data" ->> 'myfield'::text <> 'нет');
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS)
SELECT  
    *
FROM
    mytable
WHERE
    "data" ->> 'myfield'::text <> 'нет'
Seq Scan on set10.mytable  (cost=0.00..2041.22 rows=86911 width=35) (actual time=19.098..35.258 rows=74 loops=1)
  Output: "jiraKey", data, created
  Filter: ((mytable.data ->> 'myfield'::text) <> 'нет'::text)
  Rows Removed by Filter: 87274
  Buffers: shared hit=731
Planning Time: 0.051 ms
Execution Time: 35.278 ms

And my experiments show that the problem is in "<>", the optimizer won't use the index in this case.

upd

if I force to disable seqscan - the optimizer uses my index properly. For the some reason, it thinks that it is better to use seqscan, but it wrong

SET enable_seqscan = false;
EXPLAIN (ANALYZE,BUFFERS, COSTS, VERBOSE)
select 
    *
FROM    
    mytable
WHERE
    "data"->>'myfield' <> 'нет' 
Bitmap Heap Scan on set10.mytable  (cost=30.20..2064.86 rows=86911 width=35) (actual time=0.024..0.098 rows=74 loops=1)
  Output: "jiraKey", data, created
  Recheck Cond: ((mytable.data ->> 'myfield'::text) <> 'нет'::text)
  Heap Blocks: exact=52
  Buffers: shared hit=53
  ->  Bitmap Index Scan on mytable_myfield_idx  (cost=0.00..8.47 rows=86911 width=0) (actual time=0.013..0.014 rows=74 loops=1)
        Buffers: shared hit=1
Planning Time: 0.059 ms
Execution Time: 0.128 ms
1

There are 1 best solutions below

0
jjanes On

In order to let it get an accurate estimate of the row counts, you need an expression index like this:

create index on mytable ((data->>'myfield'));

And then do an ANALYZE after it is created. You already have that expression index in a partial form, but partial indexes are not used by the planner to derive rows counts.

If you were using a modern version of the software, you could instead create extended statistics on the expression:

create statistics asldfj on (data->>'myfield') from mytable;

This will generate the same statistics as the index does, but doesn't have the same storage or maintenance needs as the index. But it requires v14 or above.