Insert thousands of records from Java to Oracle, how can I get the best performance?


I need to build a batch process in Java that will read a file with a variable number of records (though I can safely assume 5k+), process them, and then insert the records into an Oracle 11g database. I need to do it via call(s) to a stored procedure.

I made a similar process some months ago, which I am not too proud of (for no particular reason, other than being sure it is not optimal at all). In that solution I kept everything in memory and, to insert the rows, built one giant INSERT ALL statement by appending all the records with a StringBuilder, then executed that single statement to insert 15k+ records in one go. This time I need to use a stored procedure for the insertion. I have been reading and now know there are ways to send arrays of data to stored procedures, so I would be able to send multiple records at a time.

Should I make a Stored procedure that receives arrays and just send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain amount of records at a time and call that SP (records/limit) amount of times?

Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?

If I were to make multiple calls, I was thinking of using PreparedStatement with the .addBatch() and .executeBatch() methods; would this be the way to go?

I need to be able to insert all the records and roll back in case of any error, for which I am going to use transactions. I am technically not required to meet any performance threshold, but I am interested in the topic and this could be a good time to start worrying more about it, so I would like some pointers and tips from someone with experience.



Answer by Stew Ashton (accepted):

"Should I make a Stored procedure that receives arrays and just send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain amount of records at a time and call that SP (records/limit) amount of times?"

Limit to a certain amount of records. I generally start with between 100 and 1000, depending on the total size of a record.
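For illustration, splitting the full record list into fixed-size batches might be sketched as below; the `chunk` helper and the sizes are my own, not from the answer:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {

    // Split a list into consecutive sublists of at most batchSize elements,
    // so each sublist can be sent to the database in a single call.
    public static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    records.subList(i, Math.min(i + batchSize, records.size()))));
        }
        return batches;
    }
}
```

Each batch then becomes one call to the stored procedure (or one executeBatch), keeping round trips proportional to the number of batches rather than the number of records.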

"Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?"

No. You will waste CPU and, above all, time: every time Java calls the database, time is spent just sending the message and getting the reply back (network latency).

"If I were to do multiple calls, I was thinking of utilizing PreparedStatements and the .addBatch() and .executeBatch() methods, would this be the way to go?"

Yes, but those methods are at their best with SQL statements (such as INSERT), not calls to stored procedures.
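A hedged sketch of that pattern, combining batched inserts, bind variables, and a single all-or-nothing transaction; the table name, columns, and the batch size of 500 are illustrative assumptions, not from the post:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {

    // Bind variables (?) instead of concatenated literals; table and column
    // names here are placeholders.
    static final String INSERT_SQL =
            "INSERT INTO my_records (id, payload) VALUES (?, ?)";

    static final int BATCH_SIZE = 500; // an assumed starting point; tune as needed

    // Insert all records in batches inside one transaction; roll back everything on error.
    public static void insertAll(Connection conn, List<String[]> records) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] rec : records) {
                ps.setString(1, rec[0]);
                ps.setString(2, rec[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();   // one round trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();       // flush the final partial batch
            }
            conn.commit();               // commit only when every record went in
        } catch (SQLException e) {
            conn.rollback();             // any failure undoes all inserts
            throw e;
        }
    }
}
```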

"I need to be able to insert all the records, and rollback in case of any error."

Set autocommit off (which I recommend in general) and commit when all is OK.

If your stored procedures have no added value but simply do the inserts, then it would be simpler and very efficient to do batched inserts directly. There are very good arguments for using stored procedures despite the extra complication, but then you would have to populate the arrays.
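If you do go the stored-procedure-with-arrays route, the call might be sketched as below. The PL/SQL types, procedure name, and FORALL body are illustrative assumptions; note also that creating the `java.sql.Array` values is driver-specific (with Oracle's driver you would typically unwrap the connection to `OracleConnection` and use `createOracleArray`, since that driver does not support the standard `createArrayOf`):

```java
import java.sql.Array;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class ArrayCall {

    /* Assumed PL/SQL on the database side (all names are illustrative):
       CREATE TYPE t_id_tab  AS TABLE OF NUMBER;
       CREATE TYPE t_pay_tab AS TABLE OF VARCHAR2(4000);
       CREATE PROCEDURE insert_records(p_ids t_id_tab, p_payloads t_pay_tab) AS
       BEGIN
         FORALL i IN 1 .. p_ids.COUNT
           INSERT INTO my_records (id, payload) VALUES (p_ids(i), p_payloads(i));
       END;
    */
    static final String CALL_SQL = "{call insert_records(?, ?)}";

    // One round trip inserts a whole batch of records.
    public static void callWithArrays(Connection conn, Array ids, Array payloads)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL_SQL)) {
            cs.setArray(1, ids);
            cs.setArray(2, payloads);
            cs.execute();
        }
    }
}
```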

In every case, it is vital to use bind variables and not concatenate the values into the SQL statements (or calls to SPs) as literals.

Best regards, Stew Ashton

P.S. For 5k+ records, multi-threading is overkill.

Answer by uneq95:

Should you use a stored procedure (SP)?

I don't think it would bring any significant performance improvement, because you only have an INSERT statement. An SP would be beneficial if you had some complex queries, since it would save the query compilation time; a simple INSERT statement won't take much time to compile.

So, in my opinion, send your query on the fly using Java's PreparedStatement.

The approach I would have followed:

As your requirement is to insert all the data and roll back in case of errors, I would suggest inserting the whole data set in batches. If a batch fails, you can roll back just that batch's changes and retry its insertion.

If you were to send the whole data set in one shot, a single failing insert statement would force you to roll back the entire change.

Another benefit of using batches is that you send the batch data in a single JDBC call. Making, maintaining, and clearing connections creates overhead, so batching saves that overhead to some extent.

Also, you can use multi-threading: have tasks read a batch of data, process it, and insert it (using a batch). The tasks can take JDBC connections from a connection pool such as HikariCP. While some threads are busy inserting data, others can read and process it.
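A minimal sketch of that division of labour, with the actual JDBC insert replaced by a counter so the threading pattern stands on its own; the pool size, batch size, and the `insertBatch` stand-in are assumptions, not from the answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelLoader {

    // Stand-in for a real batched JDBC insert; a real task would borrow a
    // connection from a pool such as HikariCP instead of just counting rows.
    static final AtomicInteger inserted = new AtomicInteger();

    static void insertBatch(List<Integer> batch) {
        inserted.addAndGet(batch.size());
    }

    // Split the records into batches and insert them on a small worker pool,
    // so reading/processing and inserting can overlap.
    public static int load(List<Integer> records, int batchSize, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < records.size(); i += batchSize) {
            List<Integer> batch = new ArrayList<>(
                    records.subList(i, Math.min(i + batchSize, records.size())));
            pool.submit(() -> insertBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return inserted.get();
    }
}
```

Note that per the accepted answer above, this is likely overkill for 5k records; it pays off only at much larger volumes.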

Some material to read on multi-threaded inserts : https://dba.stackexchange.com/questions/16131/how-does-oracle-handle-multiple-concurrent-inserts-against-one-table

PS: Constructive criticism is welcome.