How to improve the performance of these insert statements for a huge data ingest?


I have a requirement to insert a huge amount of data (50 GB of random data) into my database, so that I can use a backup application to check the de-duplication ratio. I have written a small procedure, shown below.

This is taking more than 1 hour. I don't know how to improve the performance so that I get good throughput for the insert statements. I have set the SGA to 16 GB.

I am a newbie to Oracle. I do not know how to set parallelism to optimize my procedure and get good throughput. Please help.

alter session force parallel query parallel 4;
create table table_1(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));

create table table_2(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));

create table table_3(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));

create table table_4(
col1 varchar2(400),
-- 50 columns like this
col50 varchar2(400));

My insert script:

Declare
    rows_inserted number := 0;
Begin
    Loop
        Begin
            INSERT INTO table_1(COL1, ..COL50)
            VALUES(dbms_random.string('L', 400),..for all 50 values);
            INSERT INTO table_2(COL1, ..COL50)
            VALUES(dbms_random.string('L', 400),..for all 50 values);
            INSERT INTO table_3(COL1, ..COL50)
            VALUES(dbms_random.string('L', 400),..for all 50 values);
            INSERT INTO table_4(COL1, ..COL50)
            VALUES(dbms_random.string('L', 400),..for all 50 values);
            --Only increment counter when no duplicate exception
            rows_inserted := rows_inserted + 1;
        --Exception When DUP_VAL_ON_INDEX Then Null;
        End;
        exit when rows_inserted = 10000;
    End loop;
    commit;
End;
/

I have tried this procedure on Oracle 12c, installed on a RHEL 7 VM. The VM has 32 GB of memory, 20 GB of swap, and 16 vCPUs.

It's taking more than 1 hour and it's still running. How do I implement parallelism and optimize the above procedure to get a good throughput rate?

There is 1 answer below.

Answered by APC:

You're doing single-row inserts inside a loop: that's a very slow way of doing things. SQL is a set-based language, and set operations are the most performant way of doing bulk operations. Also, you're relying on random data to provide duplicates. Be in control of it and guarantee the ratios. Besides, how can you get DUP_VAL_ON_INDEX when your tables have no unique keys? (And if they did, you wouldn't be able to insert the duplicates you want for your experiment.)

A better approach would be to use bulk SQL:

INSERT INTO table_1(COL1, COL50)
select dbms_random.string('L', 400), dbms_random.string('L', 400)
from dual
connect by level <= 10000
/

INSERT INTO table_1
select *
from table_1
where rownum <= 1000
/

This will give you 11000 rows in table_1, 1000 of which are duplicates. Repeat the second insertion to increase the number of duplicates.
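
To reach the 50 GB target, the main knob is the CONNECT BY count. As a rough sizing sketch (the figures here are my assumptions, not from the post): 50 VARCHAR2(400) columns each filled with a 400-character string is roughly 20 KB per row, so about 650,000 rows per table works out to roughly 12.5 GB per table, or 50 GB across the four tables. The scaled-up load would look like this, with the middle columns elided as in the original:

-- Sketch only: the row-size and row-count figures above are back-of-envelope assumptions.
INSERT /*+ APPEND */ INTO table_1(COL1, ..COL50)
select dbms_random.string('L', 400), ..for all 50 columns
from dual
connect by level <= 650000
/
commit
/

The APPEND hint requests a direct-path insert, which avoids much of the conventional-path overhead for a load of this size (note that the table can't be queried again in the same session until you commit).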

There should be no need for parallelism.

The asker commented: "All I want now is good throughput, which can insert 50 GB of data within 30 minutes, with or without parallelism."

However, this new piece of information changes my assessment. The simplest way to run this in parallel is to build separate routines for each table and run each in a separate session.
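
For instance (a minimal sketch, not part of the original answer), suppose each table's load is wrapped in its own procedure; load_table_1 through load_table_4 are hypothetical names for wrappers around the bulk INSERT above. DBMS_SCHEDULER can then launch the four loads as independent jobs, each running in its own session:

begin
    -- Each job runs in its own session, so the four loads run concurrently.
    -- load_table_1 .. load_table_4 are assumed wrapper procedures, not from the post.
    dbms_scheduler.create_job(
        job_name   => 'LOAD_T1',
        job_type   => 'PLSQL_BLOCK',
        job_action => 'begin load_table_1; end;',
        enabled    => true);
    dbms_scheduler.create_job(
        job_name   => 'LOAD_T2',
        job_type   => 'PLSQL_BLOCK',
        job_action => 'begin load_table_2; end;',
        enabled    => true);
    dbms_scheduler.create_job(
        job_name   => 'LOAD_T3',
        job_type   => 'PLSQL_BLOCK',
        job_action => 'begin load_table_3; end;',
        enabled    => true);
    dbms_scheduler.create_job(
        job_name   => 'LOAD_T4',
        job_type   => 'PLSQL_BLOCK',
        job_action => 'begin load_table_4; end;',
        enabled    => true);
end;
/

Opening four SQL*Plus sessions and running one table's load in each achieves the same effect; the scheduler version just automates it.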