What is the correct format for a SolrCloud URL in Nutch's index-writers.xml config?


I am trying to connect a Nutch crawler to an existing SolrCloud installation. I've already tested a plain HTTP connection and know it works, i.e., an entry in index-writers.xml that looks like this:

    <parameters>
      <param name="type" value="http"/>
      <param name="url" value="http://localhost:8983/solr/nutch"/>
      <param name="weight.field" value=""/>
      <param name="commitSize" value="1000"/>
      <param name="auth" value="false"/>
    </parameters>

Referencing the Nutch IndexWriters documentation, it looks like I should be able to change just a few values to get Nutch talking to SolrCloud: type -> cloud, url -> ZooKeeper connection string, and collection -> my collection name. Following that, I produced the config below, which does not work:

    <parameters>
      <param name="type" value="cloud"/>
      <param name="url" value="<some_ip>:2181,<some_ip>:2181,<some_ip>:2181/some_name"/>
      <param name="collection" value="nutch"/>
      <param name="weight.field" value=""/>
      <param name="commitSize" value="1000"/>
      <param name="auth" value="false"/>
    </parameters>

The value of url comes directly from my SolrCloud admin panel, which provides a ZooKeeper connection string. However, the Nutch docs for the url parameter state:

Defines the fully qualified URL of Solr into which data should be indexed. Multiple URL can be provided using comma as a delimiter. When the value of type property is cloud, the URL should not include any collections or cores; just the root Solr path.

I assume that multiple URLs are meant for SolrCloud, but I can't find any examples of what the url parameter should look like for a SolrCloud configuration. What is the proper format for a SolrCloud URL in my index-writers.xml config?
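
For comparison, a literal reading of the quoted documentation would give something like the sketch below: a comma-separated list of Solr root URLs (no core or collection in the path) instead of a ZooKeeper connection string, with the collection named separately. This is an untested guess, not a confirmed working config. The host names solr1.example.com and solr2.example.com are hypothetical placeholders, and whether the cloud type actually expects Solr node URLs rather than ZooKeeper hosts is an assumption that may depend on the Nutch version.

    <parameters>
      <param name="type" value="cloud"/>
      <!-- Assumption: root Solr paths of the cluster nodes, comma-delimited, without a collection -->
      <param name="url" value="http://solr1.example.com:8983/solr,http://solr2.example.com:8983/solr"/>
      <param name="collection" value="nutch"/>
      <param name="weight.field" value=""/>
      <param name="commitSize" value="1000"/>
      <param name="auth" value="false"/>
    </parameters>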
