I want to control and reduce the training time of my large linear model on a 100 GB dataset. Is there a way to regulate the degree of parallelism of the LinearSVC model in pyspark.ml? Is there an analogue of the DataFrame.repartition(numPartitions) method?
I expect to find a single parameter or method that would let the LinearSVC model occupy more cluster resources and finish training faster. Looking at the Spark job log, I found that at some internal steps of training my data is consistently repartitioned from 1,000-10,000 partitions down to 100. I believe there must be some way to control this parallelism. LinearSVC has two parameters, aggregationDepth and maxBlockSizeInMB, which I think could play a role, but the Spark documentation gives little detail on them.
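For context, here is roughly what my setup looks like (the input path, column names, and parameter values below are placeholders, not my real job):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LinearSVC

spark = SparkSession.builder.appName("linear-svc-parallelism").getOrCreate()

# Placeholder path; the DataFrame has "features" and "label" columns.
train_df = spark.read.parquet("/data/train.parquet")

# Explicitly repartitioning before fit() is the only knob I have found so far,
# but the job log still shows the data being repartitioned down to ~100
# partitions during training.
train_df = train_df.repartition(2000)

svc = LinearSVC(
    maxIter=100,
    regParam=0.01,
    aggregationDepth=2,    # depth of the treeAggregate used when aggregating gradients
    maxBlockSizeInMB=0.0,  # 0.0 means Spark picks the block size automatically
)
model = svc.fit(train_df)
```

Is tuning aggregationDepth / maxBlockSizeInMB (or something else entirely) the intended way to scale this training out?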