A table EMPLOYEE has the structure below and contains 5 million rows (5 x 10^6).
Name
------
EMPNAME
EMPID
MANAGERID (foreign key to same table)
STATUS
We have a different table EmpAct into which we insert as follows:
INSERT INTO empact (empname, empid, status)
SELECT e.empname, e.empid, e.status
FROM employee e
WHERE e.status IN (1, 2, 3)
   OR EXISTS (SELECT 1
              FROM employee m
              WHERE m.empid = e.managerid
                AND m.status IN (1, 2, 3))
This becomes a costly operation because, for each non-active employee (status not in 1, 2, 3), the EXISTS subquery probes the same 5-million-row table, which looks like an O(N^2) operation.
Is there a way to make it a linear O(N) operation?
Also, is the INSERT INTO ... SELECT query OK to use, or should we use some other PL/SQL construct for the inserts?
This query can be converted to O(N) by rewriting it in a form that supports hash joins. But the algorithm analysis gets tricky very quickly, and I'm not sure the new form will actually be faster.
Sample Schema
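Something like the following is enough to reproduce the behavior (column types are guesses based on the question; the primary key on EMPID is what makes the index scan below possible):

CREATE TABLE employee
(
    empname   VARCHAR2(100),
    empid     NUMBER PRIMARY KEY,
    managerid NUMBER REFERENCES employee(empid),
    status    NUMBER
);

CREATE TABLE empact
(
    empname VARCHAR2(100),
    empid   NUMBER,
    status  NUMBER
);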
Original Query - O(N*LOG(N))
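I don't have your exact plan, but with the primary-key index in place it should look roughly like this (an illustration of the shape, not real output; EMPLOYEE_PK is an assumed index name):

--------------------------------------------------------------
| Id  | Operation                      | Name        |
--------------------------------------------------------------
|   0 | INSERT STATEMENT               |             |
|   1 |  LOAD TABLE CONVENTIONAL       | EMPACT      |
|*  2 |   FILTER                       |             |
|   3 |    TABLE ACCESS FULL           | EMPLOYEE    |
|   4 |    TABLE ACCESS BY INDEX ROWID | EMPLOYEE    |
|*  5 |     INDEX UNIQUE SCAN          | EMPLOYEE_PK |
--------------------------------------------------------------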
The FILTER operation is a bit weird, but in this case I believe it's acting like a loop, so we can multiply together operations 3 and 4/5 to find the total run time. The TABLE ACCESS FULL is O(N) and the INDEX UNIQUE SCAN is O(LOG(N)), so you should be seeing O(N*LOG(N)) instead of O(N^2).
If you're seeing two full table scans, that would be O(N^2), but then you should investigate why Oracle isn't using the index.
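If you're not sure which plan you're actually getting, you can check with EXPLAIN PLAN; a minimal sketch against the query above:

EXPLAIN PLAN FOR
INSERT INTO empact (empname, empid, status)
SELECT e.empname, e.empid, e.status
FROM employee e
WHERE e.status IN (1, 2, 3)
   OR EXISTS (SELECT 1
              FROM employee m
              WHERE m.empid = e.managerid
                AND m.status IN (1, 2, 3));

-- Display the plan the optimizer chose for the statement above.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);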
Striving for O(N)
If you want to compare data in O(N), I believe a hash join is the only option. Hash joins only work with equality conditions, and in this case I don't think Oracle is smart enough to rewrite your query into regular equality conditions on its own. We can do it ourselves by splitting the query into two pieces and UNIONing them together:
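A sketch of that rewrite (UNION, rather than UNION ALL, removes employees that match both branches):

INSERT INTO empact (empname, empid, status)
SELECT empname, empid, status
FROM employee e
WHERE e.status IN (1, 2, 3)
UNION
SELECT e.empname, e.empid, e.status
FROM employee e
JOIN employee m
    ON e.managerid = m.empid
WHERE m.status IN (1, 2, 3);

The join condition e.managerid = m.empid is a plain equality, so the optimizer can hash join the two passes over EMPLOYEE. The plan should have roughly this shape (again an illustration, not real output; the line numbers below refer to it):

--------------------------------------------------
| Id  | Operation                | Name     |
--------------------------------------------------
|   0 | INSERT STATEMENT         |          |
|   1 |  LOAD TABLE CONVENTIONAL | EMPACT   |
|   2 |   SORT UNIQUE            |          |
|   3 |    UNION-ALL             |          |
|*  4 |     TABLE ACCESS FULL    | EMPLOYEE |
|*  5 |     HASH JOIN            |          |
|*  6 |      TABLE ACCESS FULL   | EMPLOYEE |
|   7 |      TABLE ACCESS FULL   | EMPLOYEE |
--------------------------------------------------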
The new plan is more complicated, but it uses a hash join in operation 5, which in theory can be O(2N). However, there's another O(N) TABLE ACCESS FULL on line 4, and an O(N*LOG(N)) SORT UNIQUE on line 2, although hopefully N is much smaller by that step.
What's Best?
You're thinking about this question the right way - more people should consider the algorithmic complexity of their programs. And usually converting a join condition to something that supports hash joins is a good idea. But with this weird join condition I'm not sure you'll see an improvement. This might be one of the many cases where the constants and implementation details outweigh the complexity.
If you want to learn more, here's a chapter I wrote about using algorithm analysis to understand SQL performance.