I am trying to connect a Nutch crawler to an existing SolrCloud installation. I have already tested a normal http-type connection and know it works, i.e. with an entry in index-writers.xml that looks like this:
<parameters>
<param name="type" value="http"/>
<param name="url" value="http://localhost:8983/solr/nutch"/>
<param name="weight.field" value=""/>
<param name="commitSize" value="1000"/>
<param name="auth" value="false"/>
</parameters>
Referencing the Nutch IndexWriters documentation, it looks like I should be able to change just a few values to get Nutch talking to SolrCloud: type -> cloud, url -> ZooKeeper connection string, and collection -> my collection name. Based on that, I produced the following config, which does not work:
<parameters>
<param name="type" value="cloud"/>
<param name="url" value="<some_ip>:2181,<some_ip>:2181,<some_ip>:2181/some_name"/>
<param name="collection" value="nutch"/>
<param name="weight.field" value=""/>
<param name="commitSize" value="1000"/>
<param name="auth" value="false"/>
</parameters>
The value of url comes directly from my SolrCloud admin panel, which provides a ZooKeeper connection string. However, the Nutch docs for the url parameter state:
Defines the fully qualified URL of Solr into which data should be indexed. Multiple URL can be provided using comma as a delimiter. When the value of type property is cloud, the URL should not include any collections or cores; just the root Solr path.
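Read literally, that description seems to call for HTTP URLs of the Solr nodes rather than a ZooKeeper string. Below is a sketch of how I would interpret it. The host names solr-host-1 and solr-host-2 are hypothetical placeholders, and I have not verified that this is the format Nutch actually expects:

```xml
<parameters>
<param name="type" value="cloud"/>
<!-- Assumption: comma-separated root Solr paths, with no collection or core appended -->
<!-- solr-host-1 and solr-host-2 are made-up host names used only for illustration -->
<param name="url" value="http://solr-host-1:8983/solr,http://solr-host-2:8983/solr"/>
<param name="collection" value="nutch"/>
<param name="weight.field" value=""/>
<param name="commitSize" value="1000"/>
<param name="auth" value="false"/>
</parameters>
```

The only difference from my non-working config is the url value. The collection name stays in its own parameter, which matches the docs' note that the URL should not include a collection or core.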
I assume that support for multiple URLs is meant for SolrCloud, but I can't find any examples of what the url parameter should look like in a SolrCloud configuration. What is the proper format for a SolrCloud url in my index-writers.xml config?