I have a basic nested CV setup: an outer loop over folds, with an inner model-tuning step inside each fold. My expectation is that each outer fold should draw a different random sample of hyperparameter values. However, in the example below, every fold ends up sampling exactly the same values.
Imports and make dataset:
from sklearn.model_selection import RandomizedSearchCV, KFold, cross_validate
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.base import clone
from scipy.stats import uniform
import numpy as np
X, y = make_classification(n_features=10, random_state=np.random.RandomState(0))
Nested CV loop:
#Used for tuning the random forest:
rf_tuner = RandomizedSearchCV(
    RandomForestClassifier(random_state=np.random.RandomState(0)),
    param_distributions=dict(min_samples_split=uniform(0.1, 0.9)),
    n_iter=5,
    cv=KFold(n_splits=2, shuffle=False),
    random_state=np.random.RandomState(0),
    n_jobs=1,
)
#Nested CV
for fold, (trn_idx, tst_idx) in enumerate(KFold(3).split(X, y), start=1):
    #'cloned' will now share the same RNG as 'rf_tuner'
    cloned = clone(rf_tuner)
    #This should be consuming the RNG of 'rf_tuner'
    cloned.fit(X[trn_idx], y[trn_idx])
    #Report the hyperparameter values sampled in this fold
    print(f"Fold {fold}/3:")
    print(cloned.cv_results_['params'])
    #<more code for nested CV, not shown>
Output:
Fold 1/3:
[{'min_samples_split': 0.593},
{'min_samples_split': 0.743},
{'min_samples_split': 0.642},
{'min_samples_split': 0.590},
{'min_samples_split': 0.481}]
Fold 2/3:
[{'min_samples_split': 0.593},
{'min_samples_split': 0.743},
{'min_samples_split': 0.642},
{'min_samples_split': 0.590},
{'min_samples_split': 0.481}]
Fold 3/3:
[{'min_samples_split': 0.593},
{'min_samples_split': 0.743},
{'min_samples_split': 0.642},
{'min_samples_split': 0.590},
{'min_samples_split': 0.481}]
I start by instantiating a RandomizedSearchCV around a RandomForestClassifier, setting the search's random_state= to a RandomState instance, np.random.RandomState(0).
On each pass of the outer loop, I clone() and fit() the search object. cloned should thus be using the same RNG as the original, mutating it on every pass, so each iteration ought to yield a different sampling of hyperparameter values. However, as shown above, the hyperparameters sampled on each pass are identical, which suggests that each iteration starts from the same unmodified RNG state rather than a mutated one.
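To make my expectation concrete (a toy illustration, not the search itself): a single shared RandomState instance is mutated by every draw, so repeated sampling from the same object yields fresh values each time:
#A shared, mutating RNG produces different draws on each call
rng = np.random.RandomState(0)
print(rng.uniform(size=3))  #[0.5488 0.7152 0.6028] (rounded)
print(rng.uniform(size=3))  #different values: the state has advanced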
The docs say that clones of estimators share the same random state instance:
b = clone(a) [...] calling a.fit will consume b's RNG, and calling b.fit will consume a's RNG, since they are the same
What explains the absence of randomisation between folds?
clone performs a deepcopy on each non-estimator parameter (source), so in the case of a RandomState the clones all end up holding different RandomState objects that start from the same state (in the sense of get_state()). Your example is therefore expected behaviour. I don't know offhand whether this used to behave differently, or whether the documentation has simply always been wrong on this point.
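You can see the deep copy directly. A minimal check, reusing the rf_tuner from your question:
#The clone holds a *distinct* RandomState object...
cloned = clone(rf_tuner)
print(cloned.random_state is rf_tuner.random_state)  #False
#...but its internal state (key array and position) is identical,
#so it will produce the same stream of draws:
orig_state = rf_tuner.random_state.get_state()
clone_state = cloned.random_state.get_state()
same = (np.array_equal(orig_state[1], clone_state[1])
        and orig_state[2] == clone_state[2])
print(same)  #True
If you want each outer fold to draw a different hyperparameter sample, one workaround (a sketch, not the only option) is to reseed each clone with a fresh integer seed drawn from a single master RNG, which keeps the whole procedure reproducible:
#Sketch: give each fold's search its own integer seed
master_rng = np.random.RandomState(0)
for trn_idx, tst_idx in KFold(3).split(X, y):
    cloned = clone(rf_tuner)
    cloned.set_params(random_state=master_rng.randint(np.iinfo(np.int32).max))
    cloned.fit(X[trn_idx], y[trn_idx])
Alternatively, after cloning you can call set_params(random_state=shared_rng) with one RandomState instance created outside the loop; the clone then holds the very same object, so each fit consumes it and every fold samples fresh values.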