LibSVM: Understanding the data format

293 Views Asked by Martin Wunderlich At 27 September 2021 at 07:35

I am currently experimenting with the LibSVM format as a standardized format for exchanging label/feature data sets between Python and Java in a Spark project. However, I am a bit confused by the multiple files named 'part-000*' that are created when saving the data (originally a Pandas DataFrame, converted to an RDD of LabeledPoints) using Spark's MLUtils.saveAsLibSVMFile() from the mllib util module.

Why is the data split across multiple files and how can I save it to a single text file?
Or, alternatively, how can I read these multiple 'part-0000*' files?

AFAICS, the method MLUtils.loadLibSVMFile() seems to require a single file, which is strange, since saveAsLibSVMFile() in the same util module produces multiple files. Why this inconsistency?
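
For context, Spark normally writes one 'part-*' file per partition of the RDD, which is why the output is a directory of several files. As far as I know, calling coalesce(1) on the RDD before saving yields a single part file, and loadLibSVMFile() can be pointed at the output directory itself because it reads the path as text files; both points are worth checking against the Spark version in use. Since each part file is plain text with one "label index:value ..." record per line, the parts can also be concatenated outside Spark. The following is a minimal sketch in plain Python, not Spark's own method; the function names merge_part_files and parse_libsvm_line are made up for illustration.

```python
import glob
import os


def merge_part_files(output_dir, merged_path):
    """Concatenate the 'part-*' files in output_dir into one text file.

    Files are taken in name order, so part-00000 comes before part-00001.
    Marker files such as _SUCCESS do not match the pattern and are skipped.
    Returns the number of part files that were merged.
    """
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "w", encoding="utf-8") as merged:
        for part_path in part_paths:
            with open(part_path, "r", encoding="utf-8") as part:
                for line in part:
                    if line.strip():  # drop empty lines
                        merged.write(line.rstrip("\n") + "\n")
    return len(part_paths)


def parse_libsvm_line(line):
    """Parse one LibSVM record into (label, {feature_index: value}).

    Assumes the plain 'label index:value index:value ...' layout;
    indices are kept exactly as written in the file.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features
```

A call such as merge_part_files("my_output_dir", "data.libsvm") would then produce one file that any LibSVM reader can consume, where both paths are placeholders for your own locations.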
