Does calling fit_transform on the training set also fit the preprocessing on the validation set?

If the whole dataset is split only into a training set and a test set, doesn't a validation set that is later carved out of the training set also go through whatever preprocessing was fitted on the training set? My understanding is that, ideally, continuous features should be scaled like this:

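from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# standardization of continuous features (`numerical` lists the continuous columns)
num_ct = ColumnTransformer([('standardize', StandardScaler(), numerical)])
X_train = num_ct.fit_transform(X_train)  # fit on the training set only
X_val = num_ct.transform(X_val)          # reuse the training-set statistics
X_test = num_ct.transform(X_test)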
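
For reference, the pattern above can be sketched end to end on synthetic data. This is only an illustration under assumptions: the random data, the column indices in `numerical`, the split sizes, and the `_s` suffixes are all made up here and are not part of my actual project.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 200 rows, two continuous columns.
rng = np.random.default_rng(0)
X = rng.normal(loc=[40.0, 50_000.0], scale=[10.0, 8_000.0], size=(200, 2))
y = rng.integers(0, 2, size=200)
numerical = [0, 1]  # column indices of the continuous features

# Carve out the test set first, then split the remainder into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.2, random_state=0)

num_ct = ColumnTransformer([("standardize", StandardScaler(), numerical)])
X_train_s = num_ct.fit_transform(X_train)  # mean/std are estimated from training rows only
X_val_s = num_ct.transform(X_val)          # validation rows reuse those statistics
X_test_s = num_ct.transform(X_test)        # so do the test rows

# The fitted scaler holds exactly the training-set means.
scaler = num_ct.named_transformers_["standardize"]
print(scaler.mean_, X_train.mean(axis=0))
```

Here only the training rows are standardized to exactly zero mean; the validation and test rows end up close to zero mean but not exactly, because they never influenced the fitted statistics.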

But suppose I did:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=random_state)

# standardization of continuous features
num_ct = ColumnTransformer([('standardize', StandardScaler(), numerical)])
X_train = num_ct.fit_transform(X_train)
X_test = num_ct.transform(X_test)

for a neural network, and then used skorch like so:

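from skorch import NeuralNet
from skorch.dataset import ValidSplit

net = NeuralNet(
    module=MyDNN,
    ...,
    # train_split expects a callable; this holds out 20% of the data passed to fit()
    train_split=ValidSplit(0.2),
)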

Doesn't this mean that the 20% validation set was already included in the fit_transform I did earlier?
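
To make the concern concrete, here is a minimal sketch of what I think happens. It does not use skorch at all: the internal 80/20 hold-out is emulated with `train_test_split`, which is an assumption for illustration, and the data and the names `X_fit`, `X_valid`, and `clean_scaler` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training portion left after the train/test split.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(500, 3))

# Scaler fitted on ALL of X_train, as in the second snippet above.
scaler = StandardScaler().fit(X_train)

# Emulate a later 80/20 hold-out taken from the same X_train
# (assumption: a plain random split stands in for the library's internal split).
X_fit, X_valid = train_test_split(X_train, test_size=0.2, random_state=0)

# Scaler that only ever sees the rows the network would actually train on.
clean_scaler = StandardScaler().fit(X_fit)

# The first scaler saw the held-out rows too, so its statistics differ.
print("rows seen:", scaler.n_samples_seen_, "vs", clean_scaler.n_samples_seen_)
print("difference in means:", np.abs(scaler.mean_ - clean_scaler.mean_))
```

If this emulation is right, the scaler's statistics do depend on the rows that later become the validation set. One option that seems to avoid this is to create the validation split myself before scaling and hand it to the net with `skorch.helper.predefined_split`, but I have not verified that this is the recommended approach.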
