Insert result of a parametrized query into a CTE

126 Views Asked by At

I am querying a table containing billions of rows (BIG_TABLE) in Oracle. The table is partitionned on a monthly basis (DTE is the month of flux). The DBA wants us to filter our queries by month of flux. The DBA warned that (i) not doing so may cause slower queries and (ii) it may impact the others users so he may kill the non-filtered queries.

Since I need several months (sometimes up to 100), I use a parametrized query to pass the months I need (more below). I collect the results for all months in an intermediate table (INTERMEDIATE_TABLE, containing about 200 million rows per month) and then I aggregate the data (FINAL_TABLE, used for the analyses). The sum must be done by CHR whatever the month of flux.

-- query1
CREATE TABLE
  INTERMEDIATE_TABLE (
    CHR VARCHAR2(255),
    NBR NUMBER,
    DTE DATE
);

-- query2
INSERT INTO
  INTERMEDIATE_TABLE
SELECT
  CHR,
  NBR,
  DTE
FROM
  BIG_TABLE
WHERE
  DTE = TO_DATE(?, 'YYYY-MM-DD');

-- query3
CREATE TABLE
  FINAL_TABLE AS
SELECT
  CHR,
  SUM(NBR) AS NBR
FROM
  INTERMEDIATE_TABLE
GROUP BY
  CHR;

After saving the queries as strings (query1, query2, query3), I use R's DBI to run the queries. dbExecute() passes the dates one by one to the equal comparison in query2, so the query is executed 3 times (i.e. the list is not passed in one go).

library(DBI)
dbConnect(odbc::odbc(), ...)
dbExecute(con, query1)
dbExecute(con, query2, params = list(c("2020-01-01", "2020-02-01", "2020-03-01")))
dbExecute(con, query3)

I would like INTERMEDIATE_TABLE to be temporary. Can I INSERT INTO a CTE?

I am open to alternative solutions but I am looking for an ANSI solution because a move to PostgreSQL is planned. I would like to avoid temporary table because only the data is temporary in Oracle, not the table. Of course I could just DROP the table afterwards, but it seems more elegant/explicit (and efficient?) to make it temporary in the first place.


A reproducible example:

CREATE TABLE
  BIG_TABLE (
    CHR VARCHAR2(255),
    NBR NUMBER,
    DTE DATE
);

INSERT ALL
  INTO BIG_TABLE VALUES ('A', 2, DATE '2020-01-01')
  INTO BIG_TABLE VALUES ('B', 3, DATE '2020-01-01')
  INTO BIG_TABLE VALUES ('A', 1, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('B', 2, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('A', 3, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('B', 2, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('B', 4, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('C', 1, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('B', 4, DATE '2020-04-01')
  INTO BIG_TABLE VALUES ('D', 2, DATE '2020-05-01')
SELECT 1 FROM DUAL;

Desired output:

CHR NBR
  A   6
  B  11
  C   1
3

There are 3 best solutions below

11
Paul W On BEST ANSWER

You don't need any kind of intermediate stage at all, not even a subquery.

CREATE TABLE FINAL_TABLE AS
SELECT CHR,
       SUM(NBR) NBR
  FROM BIG_TABLE
 WHERE DTE = ?
 GROUP BY CHR;

Of course, with a billion rows, you want parallelism, which you may or may not be getting depending on how the table is configured. It might help to explicitly request it:

CREATE TABLE FINAL_TABLE PARALLEL (DEGREE 16) NOLOGGING  AS
SELECT CHR,
       SUM(NBR) NBR
  FROM BIG_TABLE
 WHERE DTE = ?
 GROUP BY CHR;

If you need multiple months, you have several options. I suggest a single query in a manner that maximizes parallel thread distribution by partition, if this is a partitioned table:

CREATE TABLE final_table PARALLEL (DEGREE 16) NOLOGGING AS
SELECT chr,
       SUM(nbr) nbr
  FROM (SELECT chr,
               SUM(nbr) nbr
          FROM big_table
         WHERE dte = ?
         GROUP BY chr
        UNION ALL
        SELECT chr,
               SUM(nbr) nbr
          FROM big_table
         WHERE dte = ?
         GROUP BY chr
        UNION ALL
        SELECT chr,
               SUM(nbr) nbr
          FROM big_table
         WHERE dte = ?
         GROUP BY chr)
  GROUP BY chr

If you need to pull many months and have them all end up in this table while only pulling one month at a time, then precreate the table and insert each month:

CREATE TABLE final_table(dte date,chr varchar2(18),nbr integer); -- one time

INSERT /*+ append */ into final_table
SELECT /*+ parallel(16) */
       dte,
       chr,
       SUM(nbr) nbr
  FROM big_table
 WHERE dte = ?
 GROUP BY dte, 
          chr;

COMMIT;

-- repeat for every month. 

Then at the end if you need the whole aggregated across time, you can do a group by to exclude the date:

SELECT chr,
       SUM(nbr) nbr
  FROM final_table
 GROUP BY chr
11
nbk On

You can use a subquery to get your intermediate table and use it as such, so that all would run in one query.

CREATE TABLE
  FINAL_TABLE AS
SELECT
  CHR,
  SUM(NBR) NBR
FROM
  (SELECT
  CHR,
  NBR,
  DTE
FROM
  BIG_TABLE
WHERE
  DTE = TO_DATE(?, 'YYYY-MM-DD')) INTERMEDIATE_TABLE
GROUP BY
  CHR;

As CTE

CREATE TABLE FINAL_TABLE 
AS 
WITH INTERMEDIATE_TABLE AS ( 
  SELECT
      CHR,
      NBR,
      DTE
    FROM
      BIG_TABLE
    WHERE
      DTE = TO_DATE(?, 'YYYY-MM-DD')
) 
    SELECT
      CHR,
      SUM(NBR) NBR
FROM INTERMEDIATE_TABLE
GROUP BY CHR;
0
Chris Saxon On

Consider using materialized views (MVs) to do this instead of temporary tables. This gives the same effect - you're precomputing and storing the results in table - but with more in-built controls to help you refresh the data.

For example, create this MV:

create materialized view summaries as
  select chr, trunc ( dte, 'mm' ) mth, sum ( nbr ) tot
  from   big_table
  group  by chr, trunc ( dte, 'mm' );

This will store one row/chr/month. Unless there's millions of values for 'chr' each month, this should be enough to avoid the requirement to "filter by month" as you'll be reading substantially less data each time.

If you make the MV fast refresh on commit, then the database will keep it up-to-date for you automatically.

Even if you're forced to go down the "filter by month" approach, you could still build this into the MV.

Create a filtering table to store the months you want, insert the values before each run, and refresh the MV:

create table filter_table (
  month_value date
);

create materialized view summaries as
  select chr, trunc ( dte, 'mm' ) mth, sum ( nbr ) tot
  from   big_table bt
  join   filter_table ft
  on     trunc ( dte, 'mm' ) = ft.month_value
  group  by chr, trunc ( dte, 'mm' );
  
insert into filter_table values ( date '2020-01-01' );
commit;

exec dbms_mview.refresh ( 'summaries' );

select * from summaries;

CHR    MTH                  TOT    
B      01-JAN-2020 00:00         3 
A      01-JAN-2020 00:00         2