Is weightCol of Spark's RandomForestClassifier used directly in the impurity calculation?

40 Views Asked by Zhenyu Zhang At 25 August 2023 at 20:06

To my knowledge, in sklearn the sample weights are incorporated into the impurity formula. Take binary classification and Gini impurity as an example:

$$G = 1 - p_0^2 - p_1^2$$

With sample weights $w_i$, p_0 will be calculated as:

$$p_0 = \frac{\sum_{i : y_i = 0} w_i}{\sum_i w_i}$$
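As a minimal sketch of how weights can enter the class probabilities in this way (this is not sklearn's actual implementation, and `weighted_gini` is a hypothetical helper), each sample adds its weight, rather than a count of 1, to its class total:

```python
def weighted_gini(labels, weights):
    """Gini impurity where each sample contributes its weight to its class."""
    total = sum(weights)
    if total == 0:
        return 0.0
    # Accumulate the total weight seen for each class label.
    class_weight = {}
    for y, w in zip(labels, weights):
        class_weight[y] = class_weight.get(y, 0.0) + w
    # p_k = (weight of class k) / (total weight); Gini = 1 - sum_k p_k^2
    return 1.0 - sum((cw / total) ** 2 for cw in class_weight.values())


labels = [0, 0, 0, 1]
unweighted = weighted_gini(labels, [1, 1, 1, 1])  # p_0 = 0.75
upweighted = weighted_gini(labels, [1, 1, 1, 3])  # p_0 = 0.5
```

Under this scheme, raising the weight of the single positive example moves p_0 from 0.75 to 0.5, so the node's impurity itself changes.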

However, looking into the Spark ML source code, I found that the sample weights do not seem to be used when calculating the class probabilities. They appear to be used only after the split, to reweight the impurities of the left and right nodes when computing the total impurity. As a result, a highly weighted positive example would not increase the positive probability; it would only add to the total weight of a node. I'm not sure whether my observation is right, so I'm hoping an expert can clarify this.
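To make the distinction concrete, here is a small sketch of the two behaviours described above. It only models the question as stated and makes no claim about what Spark actually does; `gini`, `node_stats`, `split_impurity`, and the `weights_in_probability` flag are all hypothetical names.

```python
def gini(class_totals):
    """Gini impurity from a dict mapping class label -> total (count or weight)."""
    total = sum(class_totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_totals.values())


def node_stats(labels, weights):
    """Return (raw counts per class, summed weights per class) for one node."""
    counts, wsums = {}, {}
    for y, w in zip(labels, weights):
        counts[y] = counts.get(y, 0) + 1
        wsums[y] = wsums.get(y, 0.0) + w
    return counts, wsums


def split_impurity(left, right, weights_in_probability):
    """Weighted average impurity of two children, each given as (labels, weights).

    weights_in_probability=True : weights enter the class probabilities.
    weights_in_probability=False: probabilities use raw counts; weights are
                                  used only to combine the two children.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for labels, weights in (left, right):
        counts, wsums = node_stats(labels, weights)
        node_impurity = gini(wsums if weights_in_probability else counts)
        node_weight = sum(weights)
        weighted_sum += node_weight * node_impurity
        weight_total += node_weight
    return weighted_sum / weight_total


left = ([0, 0, 1], [1, 1, 5])   # one heavily weighted positive example
right = ([1, 1], [1, 1])
in_probability = split_impurity(left, right, weights_in_probability=True)
only_reweighting = split_impurity(left, right, weights_in_probability=False)
```

The two variants give different split scores for the same data, which is why it matters which one the library implements.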
