Crawl error after fresh installation of Nutch and Solr


I have a problem after a fresh installation of Nutch 1.19 and Solr 8.11.2. The crawl process finishes with a NullPointerException and the following error message:

Error running: /opt/solr/apache-nutch-1.19/bin/nutch fetch -Dsolr.server.url=http//localhost:8983/solr/nutch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20230106121647 -threads 50 Failed with exit value 255.

Does anybody have an idea what causes this error?


There is 1 answer below.

Answer by Sebastian Nagel:

The error message indicates that the memory (Java heap) is not sufficient to spin up 50 fetcher threads. You could try the following:

  1. If you do not need the default of 50 fetcher threads, reduce the number by passing the option --num-threads <n> to bin/crawl (see the sketch after this list).
  2. The Java heap size can be set via the environment variable NUTCH_HEAPSIZE. The default of 4 GB should be sufficient even with 50 threads, unless you have very large documents (e.g. PDF files) to parse and index.
  3. There might be limits on your system which require you to use less memory or fewer threads.
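
For illustration, a minimal sketch combining both suggestions. The heap size of 8192 MB, the thread count of 10, the seed directory urls, the crawl directory crawl, and the number of rounds are assumptions for the example, not values taken from the question; adjust them to your installation:

  # give the Nutch scripts more Java heap (value in MB)
  export NUTCH_HEAPSIZE=8192

  # run the crawl with fewer fetcher threads and index into Solr
  /opt/solr/apache-nutch-1.19/bin/crawl \
    -i -D solr.server.url=http://localhost:8983/solr/nutch \
    --num-threads 10 \
    -s urls crawl 2

If the fetch step still fails with exit value 255, the stack trace in logs/hadoop.log usually shows whether the cause is an OutOfMemoryError or something else.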