Apache Mahout vs. Apache Spark in local mode with Nutch data

I already have a Nutch/Solr application running in single-node mode. I'm supposed to try integrating Mahout or Spark to achieve some kind of personalized results, but I'm still a long way from that point.

Given my lack of knowledge, time, and resources, is there a fast and effective way to use one of these tools with Nutch's crawled.db or Solr-indexed data to demonstrate personalization as a proof of concept?

I'm open to any idea.

Regards

rawkintrevo On

Since you are framing this as Spark vs. Mahout, I think you are thinking of the "old" MapReduce-based Mahout, which has been deprecated and moved to "community support".

I would recommend you use Mahout Samsara, which is a Spark library. In other words, my answer is that you should use Mahout and Spark together. For local mode, though, you can just use Mahout Vectors / Matrices.

The question is vague, but based on the tags, I think this tutorial might be a good place to start, as it uses Mahout and Solr for a recommendation engine.

http://mahout.apache.org/docs/latest/tutorials/cco-lastfm/
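To give a feel for the idea behind a co-occurrence recommender before setting up Mahout, here is a minimal sketch in plain Python. It is not Mahout's API and not the tutorial's code: it only illustrates item-to-item co-occurrence scored with a log-likelihood ratio (LLR). The `history` input, the `doc_*` ids, and the function names `build_indicators` and `recommend` are all hypothetical; in a real proof of concept the history would come from whatever user interaction data (clicks, views) you can pair with your Solr document ids.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table:
    # k11 = users with both items, k12 = only item A,
    # k21 = only item B, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(history, min_llr=0.0):
    # history: dict of user id -> set of item ids (hypothetical input).
    n_users = len(history)
    item_users = defaultdict(set)
    for user, items in history.items():
        for item in items:
            item_users[item].add(user)

    indicators = defaultdict(dict)
    for a, b in combinations(sorted(item_users), 2):
        k11 = len(item_users[a] & item_users[b])
        if k11 == 0:
            continue  # never seen together, so not a useful indicator
        k12 = len(item_users[a]) - k11
        k21 = len(item_users[b]) - k11
        k22 = n_users - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score > min_llr:
            indicators[a][b] = score
            indicators[b][a] = score
    return indicators


def recommend(user_items, indicators, top_n=3):
    # Sum the indicator scores of items the user has not seen yet.
    scores = defaultdict(float)
    for item in user_items:
        for other, s in indicators.get(item, {}).items():
            if other not in user_items:
                scores[other] += s
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy interaction data, made up for illustration only.
history = {
    "u1": {"doc_a", "doc_b"},
    "u2": {"doc_a", "doc_b"},
    "u3": {"doc_a", "doc_b", "doc_c"},
    "u4": {"doc_c", "doc_d"},
    "u5": {"doc_d"},
}

if __name__ == "__main__":
    print(recommend({"doc_a"}, build_indicators(history)))
```

In this toy data, a user who has only seen doc_a gets doc_b ranked first, because the two documents are always viewed together. One way to connect this to Solr, as an assumption on my part about your setup, is to store each document's strongest indicators in an extra field and query that field with the user's recent history.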

Disclaimer: I'm a PMC member of the Apache Mahout project.