A table EMPLOYEE has the structure below and contains 5 million rows (5 x 10^6).
Name
------
EMPNAME
EMPID
MANAGERID (foreign key to same table)
STATUS
We have a different table EmpAct into which we insert as follows:
INSERT INTO empact (empname, empid, status)
SELECT e.empname, e.empid, e.status
FROM employee e
WHERE e.status IN (1, 2, 3)
   OR EXISTS (SELECT 1
              FROM employee m
              WHERE m.empid = e.managerid
                AND m.status IN (1, 2, 3))
This becomes a costly operation because, for each non-active employee (status not in 1, 2, 3), the EXISTS subquery probes the same 5-million-row table, which looks like an O(N^2) operation.
Is there a way to make it a linear O(N) operation?
Also, is the INSERT INTO ... SELECT query OK to use, or should we use some other PL/SQL construct for the inserts?
This query can be converted to O(N) by rewriting it in a form that supports hash joins. But the algorithm analysis gets tricky very quickly, and I'm not sure the new form will actually be faster.
Sample Schema
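Something like the following is enough to reproduce the behavior (column types are guesses based on the question; the primary key on EMPID is what makes the index scan below possible):

CREATE TABLE employee
(
    empname   VARCHAR2(100),
    empid     NUMBER PRIMARY KEY,
    managerid NUMBER REFERENCES employee(empid),
    status    NUMBER
);

CREATE TABLE empact
(
    empname VARCHAR2(100),
    empid   NUMBER,
    status  NUMBER
);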
Original Query - O(N*LOG(N))
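I don't have your exact plan, but with the primary-key index in place it should look roughly like this (an illustration of the shape, not real output; EMPLOYEE_PK is an assumed index name):

--------------------------------------------------------------
| Id  | Operation                      | Name        |
--------------------------------------------------------------
|   0 | INSERT STATEMENT               |             |
|   1 |  LOAD TABLE CONVENTIONAL       | EMPACT      |
|*  2 |   FILTER                       |             |
|   3 |    TABLE ACCESS FULL           | EMPLOYEE    |
|   4 |    TABLE ACCESS BY INDEX ROWID | EMPLOYEE    |
|*  5 |     INDEX UNIQUE SCAN          | EMPLOYEE_PK |
--------------------------------------------------------------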
The FILTER operation is a bit weird, but in this case I believe it's acting like a loop, so we can multiply together operations 3 and 4/5 to find the total run time. The TABLE ACCESS FULL is O(N) and the INDEX UNIQUE SCAN is O(LOG(N)), so you should be seeing O(N*LOG(N)) instead of O(N^2).
If you're seeing two full table scans, that would be O(N^2), but then you should investigate why Oracle isn't using the index.
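If you're not sure which plan you're actually getting, you can check with EXPLAIN PLAN; a minimal sketch against the query above:

EXPLAIN PLAN FOR
INSERT INTO empact (empname, empid, status)
SELECT e.empname, e.empid, e.status
FROM employee e
WHERE e.status IN (1, 2, 3)
   OR EXISTS (SELECT 1
              FROM employee m
              WHERE m.empid = e.managerid
                AND m.status IN (1, 2, 3));

-- Display the plan the optimizer chose for the statement above.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);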
Striving for O(N)
If you want to compare data in O(N), I believe a hash join is the only option. Hash joins only work with equality conditions, and in this case I don't think Oracle is smart enough to rewrite your query into regular equality conditions on its own. We can do it ourselves by splitting the query into two pieces and UNIONing them together:
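A sketch of that rewrite (UNION, rather than UNION ALL, removes employees that match both branches):

INSERT INTO empact (empname, empid, status)
SELECT empname, empid, status
FROM employee e
WHERE e.status IN (1, 2, 3)
UNION
SELECT e.empname, e.empid, e.status
FROM employee e
JOIN employee m
    ON e.managerid = m.empid
WHERE m.status IN (1, 2, 3);

The join condition e.managerid = m.empid is a plain equality, so the optimizer can hash join the two passes over EMPLOYEE. The plan should have roughly this shape (again an illustration, not real output; the line numbers below refer to it):

--------------------------------------------------
| Id  | Operation                | Name     |
--------------------------------------------------
|   0 | INSERT STATEMENT         |          |
|   1 |  LOAD TABLE CONVENTIONAL | EMPACT   |
|   2 |   SORT UNIQUE            |          |
|   3 |    UNION-ALL             |          |
|*  4 |     TABLE ACCESS FULL    | EMPLOYEE |
|*  5 |     HASH JOIN            |          |
|*  6 |      TABLE ACCESS FULL   | EMPLOYEE |
|   7 |      TABLE ACCESS FULL   | EMPLOYEE |
--------------------------------------------------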
The new plan is more complicated, but it uses a hash join in operation 5, which in theory can be O(2N). However, there's another O(N) TABLE ACCESS FULL on line 4, and an O(N*LOG(N)) SORT UNIQUE on line 2, although hopefully N is much smaller by that step.
What's Best?
You're thinking about this question the right way - more people should consider the algorithmic complexity of their programs. And usually converting a join condition to something that supports hash joins is a good idea. But with this weird join condition I'm not sure you'll see an improvement. This might be one of the many cases where the constants and implementation details outweigh the complexity.
If you want to learn more, here's a chapter I wrote about using algorithm analysis to understand SQL performance.