I am experimenting with the LibSVM format as a standard way to exchange label/feature data sets between Python and Java in a Spark project. However, I am confused by the multiple files named 'part-000*' that are created when I save the data (originally a Pandas DataFrame, converted to an RDD of LabeledPoints) with Spark's MLUtils.saveAsLibSVMFile().
Why is the data split across multiple files, and how can I save it to a single text file?
Or, alternatively, how can I read these multiple 'part-0000*' files?
AFAICS, loadLibSVMFile() in Spark's MLUtils requires a single file, which seems strange, since saveAsLibSVMFile() in the same class produces multiple files. Why this inconsistency?
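For reference, the only workaround I can think of so far is merging the part files myself, outside Spark. Below is a minimal sketch, assuming the part files sit in one local output directory and that each of them ends with a newline, so that plain concatenation gives a valid LibSVM text file. The function merge_part_files and its parameters are my own hypothetical names, not part of Spark.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate the part-* files in output_dir into a single text file.

    Assumption: output_dir is a local directory and merged_path lies outside it.
    Returns the number of part files that were merged.
    """
    # Sort so the partitions are appended in order (part-00000, part-00001, ...).
    # The pattern skips marker files such as _SUCCESS and hidden .crc files.
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Copy raw bytes; LibSVM is line-based, so no parsing is needed.
                shutil.copyfileobj(part, merged)
    return len(part_paths)
```

This feels like a hack, though, so I would prefer a way to do it within Spark itself.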