Deleting 100 million rows from an Oracle database: what's the best approach?


We have a huge DB cleanup coming up, with around 100 million rows to be deleted in total across 40 tables. Here are my ideas; I'm open to suggestions.

Approach #1

Bulk collect, then delete while logging which rows are deleted, committing every 100 rows.

Example:

DECLARE
    -- counters and record/collection types
    commit_counter NUMBER := 0;
    COMMIT_LIMIT CONSTANT NUMBER := 100;
    v_total_deleted_services NUMBER := 0;

    TYPE t_record_entity_test IS RECORD (
        ENTITY_ID  NUMBER,
        SOURCE     VARCHAR2(100),
        SOURCE_ID  VARCHAR2(100),
        MESSAGE_ID VARCHAR2(100),
        STATUS     VARCHAR2(200)
    );

    TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
    v_records_test t_record_entity_tests;

    -- cursor over the rows to delete
    CURSOR c_services IS
        SELECT --all the data needed--
BEGIN
    OPEN c_services;
    LOOP
        FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
        EXIT WHEN v_records_test.COUNT = 0;

        -- log every row before deleting it
        FORALL i IN 1..v_records_test.COUNT
            INSERT INTO DELETE_LOG_TEST (SOURCE, SOURCE_ID, status, log_date)
            VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID,
                    'Service DELETED, Status: ' || v_records_test(i).STATUS, SYSDATE);

        FORALL i IN 1..v_records_test.COUNT
            DELETE FROM SERVICE WHERE ENTITY_ID = v_records_test(i).ENTITY_ID;

        -- after a FORALL, SQL%ROWCOUNT holds the total rows affected across all iterations
        v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;

        commit_counter := commit_counter + v_records_test.COUNT;

        IF commit_counter >= COMMIT_LIMIT THEN
            COMMIT;
            commit_counter := 0;
        END IF;
    END LOOP;
    CLOSE c_services;
    COMMIT;
    --log number of deleted rows
END;

Approach #2

Bulk collect and log which rows are being deleted, then delete everything at once and commit at the end. I don't know whether this is OK, since one of these deletes could remove 10 million rows in a single transaction.

DECLARE
    -- counter and record/collection types
    v_total_deleted_services NUMBER := 0;

    TYPE t_record_entity_test IS RECORD (
        ENTITY_ID  NUMBER,
        SOURCE     VARCHAR2(100),
        SOURCE_ID  VARCHAR2(100),
        MESSAGE_ID VARCHAR2(100),
        STATUS     VARCHAR2(200)
    );

    TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
    v_records_test t_record_entity_tests;

    -- cursor over the rows to delete
    CURSOR c_services IS
        SELECT --all the data needed--
BEGIN
    OPEN c_services;
    LOOP
        FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
        EXIT WHEN v_records_test.COUNT = 0;

        -- log every row that is about to be deleted
        FORALL i IN 1..v_records_test.COUNT
            INSERT INTO DELETE_LOG_TEST (SOURCE, SOURCE_ID, status, log_date)
            VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID,
                    'Service DELETED, Status: ' || v_records_test(i).STATUS, SYSDATE);
    END LOOP;
    CLOSE c_services;

    -- one big delete of the same rows the cursor selected
    DELETE FROM SERVICE WHERE ENTITY_ID = --select entity_id of data needed to be deleted that is the same data that's in the cursor;

    v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;

    COMMIT;
    --log number of deleted rows
END;

Which approach is better, and is there a third that would beat both?

2 Answers

Paul W (accepted answer)

If you can perform this operation during application downtime and you are deleting a significant portion of a large table, it is far more efficient to create a new segment containing the rows you want to keep than to delete the ones you don't. The most efficient option is a CTAS-and-rename:

CREATE TABLE abc$new PARALLEL (DEGREE 16) AS SELECT * FROM abc WHERE [rows-I-want-to-keep];
ALTER TABLE abc RENAME TO abc$old;
ALTER TABLE abc$new RENAME TO abc;

The downside is that you also have to script out and reapply any subordinate objects such as indexes, constraints, triggers, and grants. But this is the most efficient way (least processing time) to accomplish the task.
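For scripting out those subordinate objects, one option (a minimal sketch; it assumes the table is named ABC and is owned by the current schema) is DBMS_METADATA:

SET LONG 100000
-- each call returns the DDL for all dependent objects of that type;
-- note it raises an error (ORA-31608) if ABC has no objects of that type
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('INDEX', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('TRIGGER', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('REF_CONSTRAINT', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('OBJECT_GRANT', 'ABC') FROM dual;

Run the captured DDL against abc$new before the rename (or against abc after it).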

A slightly less efficient method, but one that requires less care with subordinate objects, uses the same concept while moving the data twice, so the original object remains the permanent object:

CREATE TABLE abc$old NOLOGGING PARALLEL (DEGREE 16) AS SELECT * FROM abc;

TRUNCATE TABLE abc;

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(abc,16) */ INTO abc SELECT * FROM abc$old WHERE [rows-I-want-to-keep];

COMMIT;

The downside here is that for a time the table is empty, so your app had better be down. Both techniques result in a table that no longer has the rows you wished to delete, plus another table with the original contents in case you need to recover. You can then plan to drop the abc$old table later, once you are sure the data won't be needed, to free up the space.
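When that time comes, a single statement does it (PURGE bypasses the recycle bin, so the space is released immediately):

DROP TABLE abc$old PURGE;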

Of course, if you must perform these maintenance operations online while the application is using the tables, that requirement will push you back to a gradual batched delete process of some kind, such as the one you're thinking of. That would be much, much slower, but it is less invasive.
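As an illustration only (a minimal sketch; rows_to_delete is a hypothetical driving table holding the keys to remove), such a batched delete can be as simple as a chunked loop:

BEGIN
    LOOP
        -- delete at most 10000 rows per pass so undo stays bounded
        DELETE FROM service
         WHERE entity_id IN (SELECT entity_id FROM rows_to_delete)
           AND ROWNUM <= 10000;

        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;   -- release locks and undo between chunks
    END LOOP;
    COMMIT;
END;
/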

Littlefoot

You aren't handling any exceptions. Are you sure everything will go OK with the delete operation for all rows? What about foreign key constraints, for example?
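To check that up front, a quick dictionary query (a sketch; it assumes everything lives in the current schema and uses SERVICE as the example parent table) lists the child tables whose foreign keys reference it:

select c.table_name as child_table, c.constraint_name
  from user_constraints c
  join user_constraints p on p.constraint_name = c.r_constraint_name
 where c.constraint_type = 'R'
   and p.table_name = 'SERVICE';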

Anyway: if you're doing it row-by-row, it is slow-by-slow, and it'll surely take time to delete 10 million rows (as you said). It could go faster if you switch to set processing and, instead of FORALL, use a table function. Something like this:

Log table:

SQL> create table delete_log (empno number, log_date date);

Table created.

Sample table; rows whose deptno <> 30 are supposed to be deleted:

SQL> select * From test order by deptno, ename;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7782 CLARK      MANAGER         7839 09.06.1981 00:00:00       2450                    10
      7839 KING       PRESIDENT            17.11.1981 00:00:00       5000                    10
      7934 MILLER     CLERK           7782 23.01.1982 00:00:00       1300                    10
      7876 ADAMS      CLERK           7788 12.01.1983 00:00:00       1100                    20
      7902 FORD       ANALYST         7566 03.12.1981 00:00:00       3000                    20
      7566 JONES      MANAGER         7839 02.04.1981 00:00:00       2975                    20
      7788 SCOTT      ANALYST         7566 09.12.1982 00:00:00       3000                    20
      7369 SMITH      CLERK           7902 17.12.1980 00:00:00        800                    20
      7499 ALLEN      SALESMAN        7698 20.02.1981 00:00:00       1600        300         30
      7698 BLAKE      MANAGER         7839 01.05.1981 00:00:00       2850                    30
      7900 JAMES      CLERK           7698 03.12.1981 00:00:00        950                    30
      7654 MARTIN     SALESMAN        7698 28.09.1981 00:00:00       1250       1400         30
      7844 TURNER     SALESMAN        7698 08.09.1981 00:00:00       1500          0         30
      7521 WARD       SALESMAN        7698 22.02.1981 00:00:00       1250        500         30

14 rows selected.

Procedure:

SQL> declare
  2    l_tab sys.odcinumberlist;
  3    l_tot number := 0;
  4    cursor c1 is select empno from test where deptno <> 30;
  5  begin
  6    open c1;
  7    loop
  8      fetch c1 bulk collect into l_tab limit 3;
  9      exit when l_tab.count = 0;
 10
 11      insert into delete_log (empno, log_date)
 12        select column_value, sysdate
 13        from table(l_tab);
 14
 15      delete from test t
 16        where exists (select null from table(l_tab)
 17                      where column_value = t.empno);
 18
 19      l_tot := l_tot + sql%rowcount;
 20    end loop;
 21    dbms_output.put_line('Deleted ' || l_tot || ' rows');
 22  end;
 23  /
Deleted 8 rows

PL/SQL procedure successfully completed.

Result:

SQL> select * From test order by deptno, ename;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7499 ALLEN      SALESMAN        7698 20.02.1981 00:00:00       1600        300         30
      7698 BLAKE      MANAGER         7839 01.05.1981 00:00:00       2850                    30
      7900 JAMES      CLERK           7698 03.12.1981 00:00:00        950                    30
      7654 MARTIN     SALESMAN        7698 28.09.1981 00:00:00       1250       1400         30
      7844 TURNER     SALESMAN        7698 08.09.1981 00:00:00       1500          0         30
      7521 WARD       SALESMAN        7698 22.02.1981 00:00:00       1250        500         30

6 rows selected.

Log:

SQL> select * From delete_log;

     EMPNO LOG_DATE
---------- -------------------
      7782 16.11.2023 11:40:45
      7788 16.11.2023 11:40:45
      7839 16.11.2023 11:40:45
      7876 16.11.2023 11:40:45
      7902 16.11.2023 11:40:45
      7934 16.11.2023 11:40:45
      7369 16.11.2023 11:40:45
      7566 16.11.2023 11:40:45

8 rows selected.

SQL>

As for committing (which I didn't implement here; you know how to do it): 100 is, in my opinion, too low; set it to e.g. 10000 (equal to the limit you used in the fetch).
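For illustration, a minimal sketch of the same loop body with that commit added (batch size raised to 10000, one commit per fetched batch):

loop
  fetch c1 bulk collect into l_tab limit 10000;
  exit when l_tab.count = 0;

  insert into delete_log (empno, log_date)
    select column_value, sysdate
    from table(l_tab);

  delete from test t
    where exists (select null from table(l_tab)
                  where column_value = t.empno);

  l_tot := l_tot + sql%rowcount;
  commit;  -- one commit per 10000-row batch
end loop;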

To remind you: consider exception handling (if necessary).