Index the entire DB into a single document using Lucene


I am working on improving the performance of an existing ASP.NET application and reducing the database hits for each search-criteria click on a page. As part of this, I am trying to implement Lucene.Net.

The strange thing is that when I try to index using a "select *" statement on a table that has millions of records, it hangs at the database level itself.

How is it possible to get the entire "select *" result set into a single document in less time, without the application hanging, so that from there I can apply search filters on the document and show the results in the grid?

Thanks in advance


There is 1 best solution below


When indexing millions of records with Lucene.NET you need to break up the process. What you are trying to do is read all of the data up front, hold it in memory, and then have Lucene.NET build one massive index from all of that data. That simply falls apart with large data sets. You need to break the process up into a "buffered" architecture.

What I did in the past, and what you could do, for example:

  • break the select statement into a stored procedure that returns the millions of records in pieces. For example, with 100 million records it would return 25 million rows four times (a paged-read sketch follows this list)
  • I also used four different threads to read the data. Then you start an asynchronous queue: as soon as data is read from the database, it gets fed into the queue buffer. Read up on blocking queues (BlockingCollection<T>) in .NET
  • Then you have another set of threads reading the data from the queue and piping it into the Lucene index-building process (see the pipeline sketch below)
  • the last step is to build the indexes (from the previous step) in parallel and then use the Lucene.NET merge option to merge all of the data into one big index (see the merge sketch below)
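
The batched read from the first bullet could look roughly like the following C#/ADO.NET sketch. The table name (dbo.Products), key column (Id), batch size, and the use of keyset pagination in plain SQL rather than a stored procedure are illustrative assumptions, not the answer's exact setup:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Hypothetical row shape for the table being indexed.
public record ProductRow(int Id, string Name, string Description);

public static class BatchedReader
{
    // Reads one batch of rows with Id greater than lastId, using keyset
    // pagination so the database never has to materialize the whole table.
    public static List<ProductRow> ReadBatch(string connectionString, int lastId, int batchSize)
    {
        var rows = new List<ProductRow>(batchSize);
        using var conn = new SqlConnection(connectionString);
        conn.Open();

        using var cmd = new SqlCommand(
            @"SELECT TOP (@batchSize) Id, Name, Description
              FROM dbo.Products
              WHERE Id > @lastId
              ORDER BY Id", conn);
        cmd.Parameters.AddWithValue("@batchSize", batchSize);
        cmd.Parameters.AddWithValue("@lastId", lastId);

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            rows.Add(new ProductRow(
                reader.GetInt32(0),
                reader.GetString(1),
                reader.GetString(2)));
        }
        return rows;
    }
}
```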
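
The read-queue-index pipeline from the second and third bullets might look like the sketch below, built on .NET's BlockingCollection<T> and Lucene.NET 4.8. ProductRow and BatchedReader come from the previous sketch; the key-range partitioning, field names, paths, and thread counts are assumptions for illustration:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class IndexPipeline
{
    public static void Run(string connectionString, string indexRoot,
                           int readerThreads = 4, int indexerThreads = 4)
    {
        // Bounded queue: readers block when indexers fall behind, so the
        // whole result set is never held in memory at once.
        using var queue = new BlockingCollection<ProductRow>(boundedCapacity: 50_000);

        // Producers: each reader walks its own key range in batches
        // (assumed partitioning scheme, e.g. 4 x 25 million ids).
        var producers = Enumerable.Range(0, readerThreads).Select(r => Task.Run(() =>
        {
            int lastId = r * 25_000_000;
            int upperBound = r == readerThreads - 1 ? int.MaxValue : (r + 1) * 25_000_000;
            bool done = false;
            while (!done)
            {
                var batch = BatchedReader.ReadBatch(connectionString, lastId, 10_000);
                if (batch.Count == 0) break;
                foreach (var row in batch)
                {
                    if (row.Id >= upperBound) { done = true; break; }
                    queue.Add(row);
                    lastId = row.Id;
                }
            }
        })).ToArray();

        // Consumers: each builds its own partial index so there is no
        // contention on a single IndexWriter.
        var consumers = Enumerable.Range(0, indexerThreads).Select(i => Task.Run(() =>
        {
            using var dir = FSDirectory.Open($"{indexRoot}/part-{i}");
            using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
            using var writer = new IndexWriter(dir,
                new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));

            foreach (var row in queue.GetConsumingEnumerable())
            {
                var doc = new Document
                {
                    new StringField("id", row.Id.ToString(), Field.Store.YES),
                    new TextField("name", row.Name, Field.Store.YES),
                    new TextField("description", row.Description, Field.Store.NO)
                };
                writer.AddDocument(doc);
            }
            writer.Commit();
        })).ToArray();

        Task.WaitAll(producers);
        queue.CompleteAdding();   // signals consumers to finish draining the queue
        Task.WaitAll(consumers);
    }
}
```

The bounded queue is what makes this "buffered": memory use stays flat regardless of table size, because readers simply wait whenever the indexing side falls behind.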
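
For the last bullet, a minimal merge sketch using IndexWriter.AddIndexes, with the partial-index paths carried over from the pipeline sketch above:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class IndexMerger
{
    public static void Merge(string indexRoot, int partCount)
    {
        using var mergedDir = FSDirectory.Open($"{indexRoot}/merged");
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var writer = new IndexWriter(mergedDir,
            new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));

        for (int i = 0; i < partCount; i++)
        {
            // AddIndexes copies the segments of each partial index into the
            // merged index without re-analyzing the documents.
            using var partDir = FSDirectory.Open($"{indexRoot}/part-{i}");
            writer.AddIndexes(partDir);
        }
        writer.Commit();
    }
}
```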

I have found the architecture above to be scalable, as you can run as many threads (read and build) as you have cores. It is also cloud scalable: you can use Azure Worker Roles and Queues to spread the work across many machines if you have a very large index.