I am evaluating Spark with the MarkLogic database. I have read a CSV file, and now I have a JavaRDD object that I have to load into the MarkLogic database.
    SparkConf conf = new SparkConf().setAppName("org.sparkexample.Dataload").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> data = sc.textFile("/root/ml/workArea/data.csv");
    SQLContext sqlContext = new SQLContext(sc);
    // Parse each CSV line into a Record POJO
    JavaRDD<Record> rdd_records = data.map(
        new Function<String, Record>() {
            public Record call(String line) throws Exception {
                String[] fields = line.split(",");
                Record sd = new Record(fields[0], fields[1], fields[2], fields[3], fields[4]);
                return sd;
            }
        });
I want to write this JavaRDD to the MarkLogic database.
Is there any Spark API available for faster writing to the MarkLogic database?
If we cannot write a JavaRDD directly to MarkLogic, what is the correct approach to achieve this?
Here is the code I am using to write the JavaRDD data to the MarkLogic database; let me know if it is the wrong way to do that.
    final DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8070, "MLTest");
    final XMLDocumentManager docMgr = client.newXMLDocumentManager();
    rdd_records.foreachPartition(new VoidFunction<Iterator<Record>>() {
        public void call(Iterator<Record> partitionOfRecords) throws Exception {
            while (partitionOfRecords.hasNext()) {
                Record record = partitionOfRecords.next();
                System.out.println("partitionOfRecords - " + record.toString());
                String docId = "/example/" + record.getID() + ".xml";
                JAXBContext context = JAXBContext.newInstance(Record.class);
                JAXBHandle<Record> handle = new JAXBHandle<Record>(context);
                handle.set(record);
                docMgr.writeAs(docId, handle);
            }
        }
    });
    client.release();
I have used the Java Client API to write the data, but I am getting the exception below even though the POJO class Record implements the Serializable interface. Please let me know what could be the reason and how to solve it.
    org.apache.spark.SparkException: Task not serializable
This is a modified example from the Spark Streaming guide; here you will have to implement the connection and writing logic specific to your database.
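A minimal sketch of that pattern in Java, assuming the same Record POJO and connection parameters as in the question; the key point is that the DatabaseClient is created inside foreachPartition, on the executors, so no non-serializable client object has to be captured in the driver-side closure:

    import java.util.Iterator;
    import javax.xml.bind.JAXBContext;
    import org.apache.spark.api.java.function.VoidFunction;
    import com.marklogic.client.DatabaseClient;
    import com.marklogic.client.DatabaseClientFactory;
    import com.marklogic.client.document.XMLDocumentManager;
    import com.marklogic.client.io.JAXBHandle;

    rdd_records.foreachPartition(new VoidFunction<Iterator<Record>>() {
        public void call(Iterator<Record> partitionOfRecords) throws Exception {
            // Create the client on the executor, once per partition, so that
            // nothing non-serializable is captured in the driver-side closure.
            DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8070, "MLTest");
            XMLDocumentManager docMgr = client.newXMLDocumentManager();
            // One JAXBContext per partition is enough; building it per record is expensive.
            JAXBContext context = JAXBContext.newInstance(Record.class);
            try {
                while (partitionOfRecords.hasNext()) {
                    Record record = partitionOfRecords.next();
                    String docId = "/example/" + record.getID() + ".xml";
                    JAXBHandle<Record> handle = new JAXBHandle<Record>(context);
                    handle.set(record);
                    docMgr.write(docId, handle);
                }
            } finally {
                // Release the client even if a write fails.
                client.release();
            }
        }
    });

Creating the client once per partition rather than once per record keeps the connection overhead low, which is the point of the connection-pool pattern in the streaming guide. Note that docMgr.write(...) is used here because it takes a write handle directly, whereas writeAs(...) expects a plain content object of a registered IO class.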