Postgres JSONB object filtering

77 Views Asked by At

I have a Postgres table that looks like this:

CREATE TABLE products(
   id string,
   attributes jsonb,
   PRIMARY KEY(id)
);

An example of attributes field could be like:

 {
      "endDate": [
        "2024-02-20T21:00:00.000Z"
      ],
      "countries": [
        "US","IT","ES"
      ],
      "type": [
        "RETAIL"
      ],
      "startDate": [
        "2024-02-13T08:00:00.000Z"
      ],
      "categories": [
        "ELECTRONICS"
      ],
      "currency": [
        "USD"
      ],
      "status": [
        "ACTIVE"
      ]
    }

I need to create filter where as an input I might get a list of countries e.g. ["US","MX"] for which each product having one of these countries would match criteria and/or where startDate might be after certain provided date. I'm also interested in performance of such query, since this table might be really large.

I've tried sto filter products by countries using this query:

SELECT *
FROM products
WHERE
  (attributes @> '{ "countries": ["US","MX","JP"] }')

But this query would list only products which have all 3 provided countries. I need to have at least one country match, and in this example, it has "US" so it should be match.

3

There are 3 best solutions below

4
Zegarek On BEST ANSWER

It would be trivial on a normalised structure, but until you find time and budget to change it, you can use jsonb JSONPath expressions with @@ predicate check operator: demo

select * from products
where attributes 
@@ '  exists($.countries[*] 
             ?(  @=="US"
               ||@=="MX" ))
    && exists($.startDate?
              (@>"2024-02-12T07:00:00.000Z"))'
;
id attributes
1 {"type": ["RETAIL"], "status": ["ACTIVE"], "endDate": ["2024-02-20T21:00:00.000Z"], "currency": ["USD"], "countries": ["US", "IT", "ES"], "startDate": ["2024-02-13T08:00:00.000Z"], "categories": ["ELECTRONICS"]}

If you pair it with a GIN index:

create index on products using gin(attributes jsonb_path_ops);

You'll see @@ speed up significantly thanks to index scans:

Bitmap Heap Scan on public.products (cost=116.51..2432.07 rows=13325 width=396) (actual time=4.505..56.491 rows=9513 loops=1)
  Output: id, attributes
  Recheck Cond: (products.attributes @@ '(exists ($."countries"[*]?(@ == "US" || @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))'::jsonpath)
  Rows Removed by Index Recheck: 28518
  Heap Blocks: exact=2149
  -> Bitmap Index Scan on products_attributes_idx (cost=0.00..113.18 rows=13325 width=0) (actual time=4.238..4.239 rows=38031 loops=1)
        Index Cond: (products.attributes @@ '(exists ($."countries"[*]?(@ == "US" || @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))'::jsonpath)
Planning Time: 0.374 ms
Execution Time: 56.930 ms

That's a test on 40k random sample resembling yours. It's also visible that not the entire JSONPath is supported by the index, hence the re-check:

For these operators [@@, @?], a GIN index extracts clauses of the form accessors_chain = constant out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include .key, [*], and [index] accessors. The jsonb_ops operator class also supports .* and .** accessors, but the jsonb_path_ops operator class does not.

The part with date comparison using > doesn’t qualify, which is why a recheck is necessary. Searches where you don’t need the > should be faster.

If you're dealing with non-uniform timestamp formats, you might want to add a .datetime() method in there.

0
Charlieface On

You can use any of the following solutions:

  • Retrieve the JSON array and use array overlap comparison ?|
SELECT *
FROM products
WHERE
  attributes -> 'countries' ?| array['US', 'MX', 'JP']
  • Use a JsonPath (you need to specify lots of conditions)
SELECT *
FROM products
WHERE
  attributes @? '$.countries[*] ? (@ == "US" || @ == "MX" || @ == "JP")'
  • Use a subquery with an INTERSECT against the array.
SELECT *
FROM products
WHERE EXISTS (
  SELECT value
  FROM jsonb_array_elements_text(attributes -> 'countries')
  INTERSECT
  SELECT *
  FROM UNNEST(array['US', 'MX', 'JP'])
)

db<>fiddle

0
jjanes On

As Charlieface says, you could use ?| to check that any of the given values is present. But to get it to use an index you would need an expression index:

create index on products using gin ((attributes -> 'countries'));

But then it wouldn't work on any other keys than 'countries', so this might lead to a lot of indexes.

This index should be smaller and should be slightly more efficient than the more general index Zegarek proposed for use with the @@ operator, but I didn't test it when anded together with conditions on startDate.